Saturday, July 29, 2017

Rubiks Cube Solver

Last Thanksgiving I started working on my CraneCuber robot.


By January it could solve 2x2x2, 3x3x3, 4x4x4 and 5x5x5 but I was unable to find solvers for anything larger. Only the solvers for 2x2x2 and 3x3x3 can run on something with as little horsepower as an EV3 (300Mhz with 64M of RAM). The 4x4x4 solver can run on a beaglebone black but it takes about 2 minutes to calculate the solution on that platform. The 5x5x5 solver (written by xyzzy, it is not publicly available) takes several gigs of RAM, I have to run it on my laptop and it takes about 50s to find a solution. The search for 6x6x6 and 7x7x7 solvers was a dead end. A few solvers have been written for these larger cubes but none of the code is open source and binaries have never been made available.

Goals

Building a robot that could solve up to a 5x5x5 was pretty cool and I am very appreciative of the solvers that I was able to find and use. The combination of not being able to run all solvers on the EV3 along with the lack of solvers for anything larger than 5x5x5 motivated me to write my own solver. My goals were:

  • open source...I hope that no one else has to reinvent this wheel again. It would be especially nice if I can get others to contribute to it because there are a lot of people out there who know far more than I do about solving Rubik's cubes
  • Run on something as small as an EV3
  • Produce a solution quickly....telling a room of 4th graders to chill for 2 minutes while the robot figures out the solution is a bit awkward
  • Produce a solution that takes some reasonable number of moves. A solver that gives you 400 moves to solve a 4x4x4 would seem kinda silly
  • Python Python Python

If you stop reading before you get to the end (which is very likely) but are interested in the code for the solver it is available at https://github.com/dwalton76/rubiks-cube-NxNxN-solver

Problems

I started writing a 4x4x4 solver by following this guide. I got it working but was disappointed that the solutions it produced where 200+ moves. I then moved on to solving 5x5x5 by following the same guide, I got it working but was again disappointed as my solutions were ~390 moves. The 4x4x4 solver that I used originally could typically solve a cube in about 50 moves and xyzzy's 5x5x5 solver took about 90 moves so my solutions were way way high in terms of move count. I was probably 4 to 6 weeks into this adventure before it dawned on me to actually talk to some people who have done this before and get some ideas on how one writes a solver that produces a solution with a super small number of moves. I was naive in thinking that an algorithm used by a human to solve a cube would do so in an impressively small number of moves.

I met two people online who have been a huge help. The first person was Hagin Onyango, Hagin introduced me to the concept of lookup tables.  What is a "lookup table" you ask? Let's walk through how to solve a 2x2x2 cube using a lookup table to nail down this key concept.

2x2x2 Lookup Table

Some basics

  • Rubik's cubes have 6 sides, they are labeled U L F R B D. 
  • Moves in a solution describe which side of the cube to turn and in which direction. U means turn side U one quarter turn clockwise. U' means turn side U one quarter turn counter clockwise. U2 means turn side U two quarter turns (direction does not matter). 
  • We'll assume a standard color scheme of white on U, green on L, red on F, blue on R, orange on B and yellow on D
  • The squares of each cube can be described via their color (red, green, blue, etc) or via the side that they belong to. I always put white on top (side U) so you can just label each white square as U, each yellow square as D, etc. It takes far fewer characters to describe the state of the cube using ULFRBD than it does using color names so from here on out we'll use the former. 
  • A cube can be described via a string of characters for each square. The string for a solved 2x2x2 would be UUUULLLLFFFFRRRRBBBBDDDD. 

Now pretend that you had a lot of time on your hands and about 3 million solved 2x2x2 cubes sitting around your house. You pick up the first cube and do a U turn which puts the cube in state UUUUFFLLRRFFBBRRLLBBDDDD. So now whenever you are walking down the street and find a cube that looks like UUUUFFLLRRFFBBRRLLBBDDDD you know that you can do a U' to put it back to solved...presto you have the first entry on your lookup table!! Now repeat this process for moves L, F, R, etc (starting with a solved cube each time) and whenever you find a cube state that isn't in your lookup table you add it along with the inverse of the steps you took to get the cube to that state...we inverse the steps so that you can get from the scrambled state back to the solved state.

Now we've done all single move sequences, time to move on to sequences that consist of two moves and then three moves and so on. Eventually we will have added all 3,674,160  possible states for a 2x2x2 cube, to our lookup table along with the steps to get from each state back to solved. In computer science terms we did a breadth first search to find all possible cube states.

That my friends is a very basic lookup table. A 2x2x2 solver can simply lookup the cube state in the lookup table to get the solution.

Binary Search 

There are 3.6 million entries in our 2x2x2 lookup table, if we were to compare the current cube state vs all 3.6 million entries until we found a match that would take a while. Worst case the entry we are looking for would be the very last one. This would be O(n) for all the computer science geeks reading this.

That sounds pretty crappy but alas there is a better way to find the entry we are looking for.  If we sort the entries in the lookup table then we can jump to middle of the list and compare that entry vs. the entry we are looking for. If the one we are looking for is "less" than the one in the middle of the file then we know that we need to be looking in the first half of the file...we just eliminated half the file in a single lookup. At this point we can cut the first half of the file in half and figure out if we should be looking in the first quarter of the file or the second quarter. We repeat this process until we find the entry we are looking for. This is called a Binary Search and it is very efficient, it is O(log n).

Many Rubik's cube solvers store their lookup tables in a compressed format in a file and load them into memory for fast searching. If you have plenty of memory this is fine but one of my goals was to run on an EV3 with 64M of RAM so I will be storing my looking tables in plain text files where the content is sorted so I can do a binary search. This doesn't take much memory at all because we are simply moving a filehandle around in a file and is fast enough for my needs.

3x3x3

Now we want to solve a 3x3x3 so we just do the same basic thing where we crank through increasingly longer sequences of moves until we have all cube states in our lookup table...that's all we need to do right?? WRONG!! Unfortunately there are 43,252,003,274,489,856,000 (43 quintillion) possible permutations that a 3x3x3 can be in so this is far too large for us to do via a single lookup table. Google threw 35 years of CPU time at solving all 43 quintillion states back in 2010.  If Larry Page wants to build a lookup table for 3x3x3 he has the resources to do so but the rest of us do not.  Google didn't actually build a lookup table (not to my knowledge at least), they just proved that all 43 quintillion 3x3x3 cube states could be solve in 20 moves or less.

Now the good news is the 3x3x3 was invented in 1974 and a lot of really smart people have spent the last 40 years figuring out how to solve it very efficiently. If you google "Rubik's cube kociemba solver" you will find all kinds of papers on how the best 3x3x3 solvers work.

Solving the 3x3x3 is a problem that has been solved and there are open source solvers out there, I use this one.

4x4x4

For all cubes larger than a 3x3x3 we will solve them by "reducing" them down to a 3x3x3 by

  • solving all of the centers
  • pairing all of the edges

At this point in the journey I felt like I understood lookup tables fairly well so I started building a lookup table that would solve the centers of a 4x4x4. I let my lookup table builder code run for a few days but it still wasn't finished and didn't appear to be anywhere close to finished (it was still adding new entries like crazy). I started wondering just how large of a lookup table this would be. Remember earlier I said that I met two people who were a huge help in teaching me how folks typically write solvers? This is where the second person, xyzzy, a math major who wrote the initial 5x5x5 solver I used, comes into the picture. xyzzy taught me all kinds of things and one of the big ones was how to calculate how many entries will be in a lookup table.

Assume you have 12 marbles that are each a different color, how many ways can you arrange those 12 marbles? The answer is 12! or 479,001,600 which is a pretty big number given only 12 marbles. Now what if you had 4 red, 4 green and 4 blue marbles, how many different ways could you arrange those? The formula for that is 12!/(4! * 4! * 4!) or 34,650.

4x4x4 Centers

A 4x4x4 cube has 6 sides with 4 centers per side so my lookup table was going to have 24!/(4! * 4! * 4! * 4! * 4! * 4!) or 3,246,670,537,109,999 entries!!! That is one great big number :( I knew how many entries I was adding to my table per hour so I did some calculations to figure out how long it was going to take...3700 years :( That is more than 4x longer than Yoda lived, that's a really long time. I promptly walked over to my computer and hit ctrl-C on my builder.

I needed to break down solving 4x4x4 centers into some smaller problems whose lookup tables are much much smaller. I tried a few different approaches and settled on:

  • stage the UD centers to either sides U or D. This table will have 24!/(8! * 16!) or 735,741 entries, that is much more manageable. To build the "stage UD centers" table we take a solved cube and
    • change all of the D centers to a U
    • change all of the LFRB centers to an "x". For me x means I don't care about that square. 
    • walk through increasingly longer sequences of moves until we have added all 735,741 possible states 
  • stage the LR centers to either sides L or R.  This it turn stages the FB centers to sides F or B. While staging the LR centers you must make sure to not not unstage the UD centers...more on this later. The "stage LR centers" table will have 16!/(8! * 8!) or 12,870 entries. 

Building the "stage LR centers" is a very similar process to what we did to build the "stage UD centers" table with the added restriction that we cannot undo the UD centers that we just staged. When building the lookup table we must avoid all moves that destroy the UD centers, those moves are Rw, Rw', Lw, Lw', Fw, Fw', Bw, and Bw, all other moves are fair game. The w in those moves means "wide", this is the notation used to describe turning more than one layer of a cube. So Rw means turn the two layers on side R one quarter turn clockwise. Later when we get to 6x6x6 cubes you'll see moves like 3Rw which means turn the three layers on side R one quarter turn clockwise.

We have now "staged" all centers, to solve them our lookup table will be (8!/(4! * 4!))^3 or 343,000. We must further restrict the moves that are allowed when building the table to solve the staged centers because we do not want to undo the "stage LR centers" work that we just did.  We add moves Uw, Uw', Dw, and Dw' to the list of moves we are no longer allowed to use. Adding more moves to the list of restricted moves is how a new lookup table avoids undoing the work done by previous lookup tables.

I should mention now that all of the code used to build these lookup tables is available at https://github.com/dwalton76/rubiks-cube-lookup-tables .  It can take quite a while to build them all though so I checked all of the tables in to the https://github.com/dwalton76/rubiks-cube-NxNxN-solver repository to save users the hassle of building them.

4x4x4 Centers Example

We start with the following scrambled 4x4x4 cube



We do not care about the corners or edges for now to lets x those out to make things a little easier to visualize


The first thing we want to do is put the UD centers on sides U or D. We take the current state of the centers and find the line in lookup-table-4x4x4-step01-UD-centers-stage.txt for the centers state (we used binary search) that tells us what moves to use to go from our current state to having all UD squares on sides U or D.  After performing those moves we have



UD centers are staged to sides U or D, now we need to do the same for the LR centers and FB centers.  When we stage the LR centers it will in turn stage the FB centers.  We lookup the current state of our centers in our lookup-table-4x4x4-step02-LR-centers-stage.txt table, perform the moves listed and now we have

We have now staged all of the centers so now we can lookup the current state of our centers in lookup-table-4x4x4-step03-ULFRBD-centers-solve.txt and perform those moves to solve the centers:



4x4x4 Pairing edges

Pairing edges is tricky, I have yet to find a good way to build a lookup table for doing so that doesn't destroy the centers we just solved. The way I pair the edges is to try to pair three on a Uw "slice forward", move those out of the way and pair three more on a Uw' "slice back". When we slice back we restore the centers. Hopefully we do this sequence twice to pair all 12 edges but it doesn't always work out that way. A good rule of thumb is that once you have solved the centers it takes 2 1/2 moves to pair an edge.

So in a nutshell lookup tables are not used for pairing edges, the solver pairs edges much like a human would do.

4x4x4 Edges Example

Continuing with our example, our cube now looks like



Once we have paired all of the edges we have


4x4x4 Reduced to 3x3x3

Now our 4x4x4 looks like this


It can been reduced to the following 3x3x3


We use the kociemba 3x3x3 solver to compute the solution for the 3x3x3 above which gives us the remaining moves needed to solve the 4x4x4.


This cube took 59 moves to solve

  • 17 steps to solve centers
  • 25 steps to group edges
  • 17 steps to solve 3x3x3


5x5x5 Centers

We want to use the same basic strategy for solving the centers of a 5x5x5 so let's calculate how large the lookup table will be for staging the UD centers. A 5x5x5 has 9 center squares per side but the one in the center doesn't move so we can ignore that one. The remaining 8 can be divided into two groups of four:

  • "T-centers" are the ones that are directly above, below, left and right of the center squares. They form a T shape thus the name
  • "X-centers" are the remaining four squares, they form a X shape thus the name

We divide them into groups because a T-center square and a X-center square can never trade places. That means that there will be (24!/(8! * 16!))^2 or ~540 billion entries in a table to stage all UD squares to sides U or D, that would be a huge table and would take a long time to build.

What we can do though is build part of that table and then use this cool algorithm called IDA* (Iterative Deepening A*) to find a sequence of moves that will take our scrambled cube to a cube state that is in the partial lookup table for UD staging.  More on IDA* in a minute.

Once UD centers are staged we need to stage LR centers. The lookup table for staging LR centers (which also stages FB centers) will be (16!/(8! * 8!))^2 or 165,636,900 entries. Solving all of our staged centers will be (8!/(4! * 4!))^6 or 117,649,000,000 which is too large so instead we will

  • solve UD centers which is (8!/(4! * 4!))^2 or 4,900
  • solve LFRB centers which is (8!/(4! * 4!))^4 or 24,010,000

Woohoo we have solved the centers of a 5x5x5!!

IDA*

So how does IDA* work? We started building the lookup table for staging the 5x5x5 UD centers but stopped long before we built all 540 billion entries, we only built the table 6 moves deep which resulted in 17 million entries.  Our goal is to find a sequence of moves that will take the current cube to a state that is one of the 17 million states in that lookup table. How can we do that? Well one way would be pure brute force, just start cranking through longer and longer sequences of moves until we happen to find a sequence that puts the cube in one of those 17 million states.  That would work but would be really really slow.

We need a way to explore longer and longer sequences of moves but without exploring ALL of them,. We need a way to prune off branches of moves that are taking the cube away from those 17 million states instead of towards them. IDA* gives us a way to do this.  With IDA initially you say "I am willing to explore branches of moves where the estimated cost to solve UD centers is 1 move". If you don't find a solution then you increase your threshold to 2 moves, then 3 moves, etc, etc until your threshold is high enough that you can find a solution.  That is the "Iterative" part of IDA.

Let's talk about the "estimated cost to solve UD centers" component since this is what we are comparing against this increasing threshold.  This estimated cost consist of two things

  • The number of moves we have made so far, basically this is how many levels deep we are on the current branch of moves.  This is easy to compute.
  • The estimated number of moves to reach the solved state, in this case the solved state is "UD centers on sides U or D".  This is less easy to compute....how the heck do we estimate how many moves it will take us to reach our goal?

What if we knew how many steps it would take to solve just the UD T-centers? What if we knew how many steps it would take to solve just the UD X-centers? Each of those lookup tables only has 735,471 entries so those are easy to build.  So for any cube state we can lookup just the UD T-centers and know how many moves it will take to solve them and the same for X-centers.  Whichever would take the most moves is our "estimate" for the number of moves to go from the current state to our goal state of UD centers on sides U or D.  The UD T-centers and UD X-centers tables are known as "prune tables" because they allow us to prune off branches of moves that are taking us in the wrong direction.

That is IDA* in a nutshell...hopefully that made a little bit of sense.

5x5x5 Edge Pairing

Just like with 4x4x4 edge pairing, I do not use lookup tables for 5x5x5 edge pairing, instead I pair edges the same way a human might do. I attempt to pair three edges on a 3Uw slice forward and three more on a 3Uw' slice back. The last two edges of a 5x5x5 can be a little ugly, luckily someone put together a cheat sheet for the various states of the last two edges


6x6x6 Centers

Staging the UD centers for a 6x6x6 has too many permutations for us to use IDA, there are (24!/(8! * 16!))^4 or 292,591,841,163,066,669,769,281 permutations. Our prune tables would be 24!/(8! * 16!) or 735,741 which divided by 292,591,841,163,066,669,769,281 is only 0.000000000000000002515. The prune tables need to be a much larger percentage of the total size of the table in order for IDA* to work in a reasonable amount of time.

Instead what we must do is reduce our 6x6x6 centers so they look like 5x5x5 centers so that we can use our 5x5x5 solver to stage our 6x6x6 centers. IDA and prune tables are used to reduce the 6x6x6 centers to look like 5x5x5 centers:



Where the following is the equivalent 5x5x5 cube, we use this cube with the 5x5x5 solver to find the moves to solve the centers of the 6x6x6.

 And now our 6x6x6 centers are solved


The practice of taking some subset of the current cube and producing a smaller cube from it so we can use the solver already written for the smaller cube is a strategy that is used a lot for the 6x6x6 and 7x7x7 solvers.

6x6x6 Edge Pairing

Very little new code is needed to pair the edges of a 6x6x6, we take the inner "orbit" of squares on each edge to create a 4x4x4 cube and use the 4x4x4 solver to generate the solution for pairing these edges.  We can then do the same basic thing for the outer "orbit" but this time we reduce to a 5x5x5 and use the 5x5x5 edge solver, we do this by treating each inner paired "orbit" as a single square.

7x7x7

Solving the centers and edges for a 7x7x7 is more of the same strategies discussed above. There is lots of IDA and lots of reducing parts of the cube down to some smaller cube to use the solver of the smaller cube.  There are many steps to solving the 7x7x7 and this blog post is already pretty long so I won't go in depth on 7x7x7.  If you are curious as to what the steps are take a look at the 7x7x7 code here.

NxNxN

In theory once you have the lookup tables needed to solve up to a 7x7x7 you do not need to build any more lookup tables to solve 8x8x8 and larger.  I have started working on making the solver NxNxN capable, meaning it can solve any size cube, but this work is not complete yet.

Final Stats

  • 2x2x2 average solution is 9 moves
  • 3x3x3 uses the kociemba solver, 20 moves on average
  • 4x4x4 average solution is 65 moves
  • 5x5x5 average solution is 119 moves
  • 6x6x6 average solution is 214 moves
  • 7x7x7 average solution is 304 moves

Conclusion

It ended up taking me about 5 months to write my solver. I had no idea it would take me that long when I started, I'm not sure I would have tackled it had I known what I was in for :)  I don't give up easily though so I kept chugging along and finally got it all working.

In the beginning I spent a lot of time learning what not to do, which I guess is normal when you are jumping into something you know nothing about. I also spent a lot of time experimenting with different lookup tables to solve various phases of the cube in different ways.  I wanted the lookup tables to be small enough to check into github, github restricts files to a max size of 100M and the repository has to be under 1G.  So a lot of work went into finding strategies that worked AND resulted in lookup tables that were under 100M. There are about 700M of gzipped lookup tables in the repository.

There is a thread at https://www.speedsolving.com/forum/threads/5x5x5-6x6x6-7x7x7-or-nxnxn-solvers.63592/ where I first posted looking for large cube solvers.  The thread shows my solver evolve over several months as I got more and more working.

I learned a ton working on this project, I had never heard of IDA* and it felt downright magical when I first got it going.  It was also interesting to see just how many people have worked on solving rubiks cubes over the years and all the cool strategies they have come up with to solve them in fewer and fewer moves.  This is especially true for 3x3x3 solvers, the algorithms for solving it have been getting better and better for several decades.








2 comments:

  1. This is really cool! Some very interesting mathematical theory in here. Initially I thought this was referring to a robotic apparatus that turned a physical cube to solve it, finally clicked when I saw all the ascii cubes. Rock on!

    ReplyDelete
    Replies
    1. Actually, it is both :) I did build a robot that physically turns a cube to solve it, that is what the video at the top is showing. The rest of the post is describing the solver that is used by the robot.

      Delete