Is there a statistician in the house?

I downloaded the Visual Studio 2008 Express Edition ISO from the Microsoft site and discovered that the specific features I was looking for aren’t supported in that edition. I guess there is (or rather was) a huge difference between ‘Orcas’ and the Express Editions (Earlier at xsamplex).

That said, this left me time to look through some code, and I discovered that somebody had uploaded a Peer-2-Peer version of Sudoku to Microsoft’s WPF site (see also this post, or rather, this photo). This reminded me that a couple of weeks ago I was asked to help with an algorithm that generates a completely filled-out Sudoku puzzle. I did that and moved on. The Peer-2-Peer version apparently has an algorithm that does the same, so when I opened that particular source file, I was shocked to find the following code (the lines I tagged are marked AH01):

for (int iRow = 1; iRow <= 9; iRow++)
{
    for (int iCol = 1; iCol <= 9; iCol++)
    {
        iNum = rnd.Next(1, 10);
        while ((IsInColumn(iNum, iCol) 
                  | IsInRow(iNum, iRow) | 
                   IsInThreeByThree(
                      iNum, 
                      GetThreeByThree(
                         GetThreeByThreeIndexFromRowCol(
                             iRow, iCol)))) 
                  == true)
        {
            // Generate a number to be put on the grid..
            iNum = rnd.Next(1, 10);
            iAttempts++;

            // AH01 -- OH NOES...
            if (iAttempts == 100)
            {
                bStuck = true;
                break;
            }
            // AH01 -- Ends.
        }
        if (bStuck)
            break;
        iAttempts = 0;
        arrPuzzle[iRow - 1, iCol - 1] = iNum;
        if (iRow == 9 && iCol == 9)
            bDone = true;
    }
    if (bStuck)
        break;
}

Let me go back a couple of steps. In Sudoku, a 9 by 9 grid is divided into nine 3×3 grids: each 3×3 grid MUST contain the numbers 1 through 9 exactly once. The same rule applies to each row and each column of the full grid. A fairly good description can be found at Wikipedia.

The core of filling out a 3×3 grid with 9 unique numbers is a probability problem that should have come up during math classes as the ‘coloured balls in a basket problem’. There are 10, no, let’s say 9 differently coloured balls in a basket. What are the chances of (blindly) picking out the red ball on the second turn?
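For the record, here is a quick worked version of that basket question. It assumes exactly one of the 9 balls is red and that a drawn ball is not put back; both are my assumptions, since the question above leaves them open.

```latex
% Miss the red ball on the first draw (8 of 9 balls), then hit it on the second (1 of the 8 left):
P(\text{red on the second draw}) = \frac{8}{9} \cdot \frac{1}{8} = \frac{1}{9}
```

Without replacement, every turn is equally likely to be the one that produces the red ball.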

Back to that source snippet: the author randomly draws a number between 1 and 9 and checks whether that number has already been used. If so, he keeps generating random numbers until he gets one that fits. Here’s where the problem starts: the more numbers you have already placed, the smaller the chance that your random generator picks one of the numbers that has not been drawn yet. Take a look at the example on the right (an animated GIF, opens in a separate window): if the stars are aligned right and your computer is in a bad mood, it may just take a while to randomly generate those last numbers. Certainly, you could throw an 8-core computer at it to speed up the random number generator, but the problem will still be there. The programmer knew this too, hence that ‘jailbreak condition’ (“After 100 attempts I think I should break it off, otherwise it looks like the program has crashed”).
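To put rough numbers on this, here is a back-of-the-envelope calculation. It is a simplification under my own assumptions: it looks at a single row, column or 3×3 grid in isolation, draws are uniform over 1 to 9, and a draw is rejected only because that number is already placed. In the full puzzle a cell can also end up with no valid number at all, which this does not cover.

```latex
% k = numbers already placed (0 <= k <= 8)
P(\text{usable number} \mid k \text{ placed}) = \frac{9-k}{9},
\qquad
E[\text{draws needed for the next number}] = \frac{9}{9-k}

% Expected total draws to place all nine numbers:
\sum_{k=0}^{8} \frac{9}{9-k} = 9\left(1 + \tfrac{1}{2} + \dots + \tfrac{1}{9}\right) \approx 25.5

% Chance that the last number (k = 8) is still missing after 100 draws:
\left(\tfrac{8}{9}\right)^{100} \approx 7.7 \times 10^{-6}
```

So on average the retry loop burns about 25 draws where 9 would do, and the last number alone takes 9 draws on average.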

The right way to do this is to actually draw the numbers from a list or an array, as shown in the animated GIF (click on the image on the left). This effectively means that the range you pass to the random generator’s Next routine has to shrink after each draw: after all, every time you draw a number, the number of available numbers decreases by one. There’s a minor catch here, but that’s up to you to figure out.
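A minimal sketch of that idea, to make it concrete. This is my own illustration, not code from the Peer-2-Peer sample: the class name DrawWithoutReplacement and the helper DrawNine are hypothetical, and it only covers drawing nine unique numbers, not the row, column and 3×3 checks of a full grid.

```csharp
using System;
using System.Collections.Generic;

public static class DrawWithoutReplacement
{
    // Hypothetical helper: returns the numbers 1..9 in random order
    // using exactly nine calls to the random generator, no retries.
    public static int[] DrawNine(Random rnd)
    {
        // The "basket": every number that has not been drawn yet.
        var pool = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        var drawn = new int[9];

        for (int i = 0; i < drawn.Length; i++)
        {
            // Next(n) returns a value in [0, n), so the range
            // shrinks together with the pool after every draw.
            int index = rnd.Next(pool.Count);
            drawn[i] = pool[index];
            pool.RemoveAt(index);
        }

        return drawn;
    }

    public static void Main()
    {
        int[] numbers = DrawNine(new Random());
        Console.WriteLine(string.Join(" ", numbers));
    }
}
```

Each run prints the nine numbers in a different order, and the loop always finishes after nine draws, so no attempt counter or bail-out is needed for this part.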

So: what’s the point of bringing this up? The Sudoku Peer-2-Peer application is a great showcase of the latest and greatest .NET technologies but fails horribly on the basics: using math to solve a complex problem. I do (for example) have more respect for this guy who threw in a genetic algorithm to solve the problem. That is overkill for sure, because it can take up to 10 minutes to fill out a complete grid, but at least he understands (and showcases) how genetic algorithms work.

Update (01/21/08): Related: Who killed the Software Engineer?