Skip to main content

Sudoku in Excel: Intersection methods




Excel is well known as an environment in which you can create models. Many modellers secure the integrity of their model by a combination of protecting and/or hiding sensitive cells. This is understandable when the model becomes the live or final version. Nevertheless, I have railed against using Excel in this way for 3 decades but have failed to convince colleagues.

Using Excel to model solution methods in Sudoku is intended to demonstrate why it is such an excellent environment for exposing the strengths (and weaknesses) of any particular solution method. I concentrate here on two very similar methods referred to as “Intersection”.

Background

I have published a number of blogs / articles covering the more straightforward elements of users interacting with their Sudoku puzzles. This article builds on the previous publications.

Title Source Date
Spatial Modelling Techniques in Microsoft Excel EuSpRIG 14th Annual Conference, Greenwich July 2013
Spreadsheet Control Strategies and Tactics Blogger: Learning after a stroke October 2016
Sudoku: A Game of Control and Error Management Blogger: Learning after a stroke May 2019
Sudoku: Options for layout and analysis in a Spreadsheet Blogger: Learning after a stroke May 2019
Sudoku: To automate ... or not to automate Blogger: Learning after a stroke June 2019
Sudoku: Solutions without User Input Blogger: Learning after a stroke June 2019
Sudoku: Naked Multiples in Excel Blogger: Learning after a stroke June 2019

Sudoku Intersection Method and Terminology

Unfortunately there is no universally acknowledged language for describing the technique. I have also seen it referred to as “omission”, “pointing”, “blocking” and “claiming”. In reality, my approach to the Excel model construction covers a family of intimately related techniques and I have called these “Intersection Methods”.

The Intersection Methods are not new. The model uses Excel to illuminate why they work in some circumstances and not in others. Below is a list of websites, bloggers etc. who offer more detailed insight of the Intersection Methods.

Source and Hyperlink Brief description Essential terminology
Mirrored from Sudopedia A part of the grid where 2 houses share one or more cells. Mostly used for intersections between a box and a row or column. In standard Sudoku, there are 27 box-row intersections and 27 box-column intersections, each containing 3 cells. Grid, houses, cells, box, row, column.
SudoduWiki.org If any one number occurs twice or three times in just one unit (any row, column or box) then we can remove that number from the intersection of another unit. Intersection removal, row, column, box, pointing pairs, pointing triples
Solving Sudoku (hints) Sometimes a candidate within a box is restricted to one row or column. Since one of these cells must contain that specific candidate, the candidate can safely be excluded from the remaining cells in that row or column outside of the box. Locked candidates, box, row, column
SudokuXP (paulspages) When a candidate number only appears in one row or column of a box, the box 'claims' that number within the entire row or column. Claim, box, row , column, candidate number
Learn Sudoku When pencil marks in a row or column are contained inside a single block, pencil marks elsewhere in the block can be removed. Omission, block, pencil marks, row, column.
Thonky.com There are two types of intersection removal: pointing pairs and box/line reduction. Intersection removal, pointing pairs and triples.
Sudoku Snake If the only instances of a candidate within a row or column are in the same box, you can eliminate candidates of that value from any other cell in the box. Claiming, candidate, row, column, box


I find the description from the SudokuWiki.org called ‘intersection removal’ the easiest to understand. Also, there is an implicit assumption in all these descriptions, that the candidate or Id has not been solved in either of the dimensions under investigation. Although, unstated it is crucial to the logic of the analysis.

It is clear from this list that we have no common language or terminology for describing this process.  I use the following terms. They match those used in the previous articles.
  • Grid – the whole puzzle comprising 81 cells
  • Cell – one of the 81 intersections between the 9 rows and 9 columns of the grid
  • Row – one of the 9 rows comprising 9 cells
  • Column – one of the 9 columns comprising 9 cells
  • Sub-Grid – one of the 9 groups of 9 cells comprising 3 rows and 3 columns such that they fit the grid without gaps or overlapping cells 
  • House - a generic name for one of the individual dimensions Row, Column,or Sub-Grid
  • Id – one of the number identities 1- 9
We can restate the Intersection Method in our defined terminology as “If a Column or Row intersects with a Sub-Grid, there will be three cells common to both Houses. If any single unsolved Id occurs twice of three times within these common cells and is absent from the 6 cells of a House (which are not common), then the Sudoku rules permit the removal of the Id from the equivalent 6 cells of the opposite House.”

Modelling the Intersection Methods in Excel

The descriptions of the ‘intersections methods’ in the previous section demonstrates how difficult it is to describe these methods succinctly, such that a modeller can create a series of formula based analyses within Excel that accurately mimics the success or failure of a particular case. Most of the descriptions employ specific examples. There are 4 general cases – see SudoduWiki.org – but is reasonable to separate out the 2 cases associated with rows from those associated with columns for the purposes of creating an Excel model.

The logic of the case statement depends initially on both (a) identifying the 3 cells that are common to the sub-grid and its other dimension (row or column) and (b) the Id under consideration must be available to at least two of these common cells.

In order to mimic these rules in excel we must identify the 3 common cells. The user decides which Id he wishes to consider and the Row (or Column) and associated Sub-Grid that is to be analysed.

Figure 1 - User Input

The relevant range is selected from the whole puzzle. In the context of the analysis above, we extract Rows 7 – 9, so that all the rows associated with Sub-Grid 8 are included. Clearly, this means we are extracting 27 cells but not all are concerned directly with the analysis. The relevant cells in the picture below are outlined in red.

Figure 2 - Detailed Extract from Sudoku puzzle
Figure 3 - Sudoku extract in columnar format
This spatial arrangement is very inconvenient for analysis in Excel. So we rearrange the SuDoku cells into a simpler format where each row represents a single cell and the 9 columns show the individual Ids that are available as solutions to the cell. The reformatted grid or range is called the ‘RowGridTbl’ and forms the starting point for the ‘Intersection Method’. 

The first 9 cells of the example are presented on the left and represent the first Row of the SuDoku extract above. The initial analysis identifies which of the cells contain the Id under review.

While the extract of the puzzle contains 27 cells, but only 15 of these are pertinent to the intersection method. This is managed through a series of lookup tables (in this example ‘RowColLUpTbl’ and ‘GridIndLUpTbl’). Each lookup table identifies where a member of the SuDoku House will appear within the columnar version of the SuDoku extract. The diagram to the left shows rows 4 – 6, 13 – 15 and 19 – 27 all outlined in red. They represent all the cells that we are interested in. They cover both the Row and the Sub-Grid. The cells outline in green are common to both Houses.

The visual analysis in the diagram above is repeated in terms that the spreadsheet can work with. The table below (highlighted in yellow) shows where the cells in Row 9 of the Sudoku puzzle will appear in the rearranged version of the puzzle. Similarly, the table also shows where the cells of the 8th Sub-Grid will appear (highlighted in green). It is clear that three of the cells from both the row and the sub-grid are common to both Houses (encircled in red).

Figure 4 - Cell selection from the Row and Sub-Grid

The next phase determines whether either House (excluding the cells common to both House) exhibits an absence of the Id in these 6 cells. The following region demonstrates this approach.

Figure 5 - Identifying the non-common cells from the Row and Sub-Grid
The region surrounded by a green rectangle represents the portion of the sub-grid that have no cells in common with the row and are labelled “V”. The column headed “G” indicates that none of these cells can be solved by Id 6. The equivalent region, for the rows, is surrounded by two yellow rectangles. There are 2 cells within this region that could, potentially, be solved by Id 6. They are marked with the value “Y” in column “R”. However, turning back to our rule for intersection method, we can assume this Id 6 must lie somewhere within the cells that are common between the 2 Houses. Accordingly, the potential for the Id 6 being as possible solution for the non-common cells can be eliminated. In summary, this is demonstrated by the totals for Eval Row = 4 and Eval Grid = 6.

Having determined that that some of the potential solutions in respect of Id 6 can be eliminated, we must pin point which those cells are, delete the Id 6 for those cells only and leave everything else, within the portion of the SuDoku puzzle that we are considering, unchanged. The diagram below, showing the dependencies of a single cell, illuminates the potential complexity of the problem.

Figure 6 - Sub-Grid dependencies diagram

The table to the right cell of the dependencies diagram has a similar set of dependencies, but has only 2 cells marked “D” for deletion. In theory all 6 of these non-common cells could have had Id 6 as an available solution. In practice though this is very rare.

The formula for the dependent cells in the diagram above controls this analysis and is worth considering in some detail. 

'=IF(AND(ISNUMBER(MyRef),INDEX(DelRow,MyRef,1)=0,ValidGCell=6,MyRef=IdVal),"D","")

The formula precedents provide the information needed to understand the methodology.

Figure 7 - Tracing the precedents of the deletion control table


The AND function requires that four formula elements are TRUE. These are

General Function Detail for Specific Cell Explanation
ISNUMBER(MyRef) ISNUMBER($CP45) Determines whether this cell forms part of the Row under investigation. Cell $CP45 will only have a numeric value if it is part of the row.
INDEX(DelRow,MyRef,1)=0 INDEX(DelRow,$CV45,1)=0 The cell address $CV45 determines the row number of the DelRow range. An INDEX value of zero in this range confirms that this cell does not form one of the Row cells common to the Sub-Grid.
ValidGCell=6l ValidGCell=6 Confirms that the equivalent Sub-Grid dimension has no Id’s of this value outside the common cells.
MyRef=IdVal Y45=IdVal Confirms that the cell under investigation contains an optional result of the Id that is being reviewed (IdVal).

If, and only if, all these elements are confirmed as TRUE, will the Id for that cell be marked for deletion. You will note from the cell dependencies diagram above, that only 2 or the 6 possible cells have a marking of “D” denoting that Id 6 should be removed as an optional solution.

Clearly, if the Id is absent from the 6 non-common Sub-Grid cells (thus permitting it removal from the equivalent Row sells – see table above), then the mirror image logic can also be true. If the Id is absent from the 6 non-common Rows cells is can be removed from the equivalent Sub-Grid cells. This spreadsheet allows for logical statements associated with the 6 non-common cells of either the Sub-grid or the Row to be true. It tests for both conditions, before eliminating the Id from the appropriate cells.

Worksheet Templates and their Structure

The Excel community normally regards Template as Workbook with predefined worksheet structures and formulae. A template may contain multiple connected worksheet. This approach cuts across the philosophy used by ACBA Electronic Working Papers (ACBA-EWP). ACBA-EWP projects (or workbooks) have indexed worksheets (called working papers). The indexing is based on a flexible structure that is defined by the user, on the fly, as he creates the project. Some of working papers may be based on predefined worksheet templates. 

Figure 8 - Managing the source values of Rows (or Columns) and Sub-Grids

The figure above shows the first 27 of the 81 positions of a Sudoku puzzle. They show also where the equivalent Sub-Grids will appear in respect of their associated Rows and Columns. 

Notice that the left hand group for Rows shows a non-repeating pattern in the column headed EqNo (equivalent number); the first of which is 1 2 3 10 11 12 19 20 21. The pattern for all 27 cells repeats twice more for the remaining 54 cells.

The equivalent right hand group for Columns shows a simpler repeating pattern of 1 thru 9 for all 27 cells. The patterns repeat for the values 10 – 18 and 19 – 27 for the remaining 2 sets of 27 cells.

Together, these tables allow us to match the Sub-Grids to their intersecting rows or columns, within a columnar format using formulae, as opposed to visually. 

Reinstating the Sudoku puzzle layout

Once the analysis is completed, we need to reverse the columnar layout created above. The reversal process for the Rows extract is shown below.

Figure 9 - Returning columnar table to their original Sudoku position for Rows

And this compares with the Columns extract.

Figure 10 - Returning columnar table to their original Sudoku position for Colmns

The tables of numbers below the Rows extract and to the right of the Columns extract, simply tell the columnar validation analysis where to return its results in relation to the traditional Sudoku puzzle.


Error Management in Excel

Sudoku is a game and making mistakes or misjudgments is part of the game. My philosophy in constructing the spreadsheet was not to prevent misjudgments, but, rather, to allow the user an escape route to correct himself in the event of an error. Even so some mistakes are so blatant that they should be avoided.

When calling an Intersection Method template, the cells for the user choices have bright green backgrounds and are subject to user validation: Id limited to 1 – 9, Row or Column limited to 1 – 9 and, initially, Sub-Grid limited to 1 – 9. However once the user has chosen his Row or Column, the template employs conditional validation to limit the Sub-Grid to those appropriate to his initial choice.

The diagram below shows that the Sub-Grid choices are limited to 1, 2 or 3 once Row number three has been selected.

Figure 11 - Limiting the choice of Sub-Grid based on the Row (or Column) selected

Sadly, this conditionality within the Excel environment appears to be one directional only. I have not found a way of selecting the Sub-Grid initially, and limiting the Row, without resorting to 2 entirely separate models.

In our definition of the Intersection Method, we included a condition “If any single unsolved Id occurs twice of three times within these common cells…” One could argue that the model should test for this condition before allowing the analysis to proceed. However, there is no such validation 
The Id had been resolved in one or both dimensions of the 12 non-common cells.process within the model. There are three scenarios where the condition might fail.
  1. The Id had been resolved in one or both dimensions of the 12 non-common cells.
  2. The Id had been resolved in one of the 3 common cells.
  3. An error has been generated such that one or other of the dimensions has had all reference to the Id eliminated.
In cases 1 and 2, the model will proceed without error but will not generate any changes to the state of play. Case 3 is fatal but will not necessarily show up until the user attempts to complete the puzzle. However, the user has access to sufficient data about the state of the puzzle to avoid making any of these errors. I took the view that this form of validation was superfluous. 

Discussion

The Intersection Methods are regarded by the experts in Sudoku as relatively straight forward. Even so, the prerequisite conditions for their effective use are rare. It is difficult to find sufficient examples for each version of the method to test the models comprehensively. The detail outlined above, however, suggests the Row based and Column based methods will deliver valid results.
As discussed, both methods delete Ids from either the non-common Row (or Column) cells or the non-common Sub-Grid cells. In the following example, the deletion of Id 8 takes place from the Row.

Figure 12 - Combing the results of the Row (or Column) analysis

Similarly, in the Column based test, the deletion (in this case for Id 1) takes place from the Sub-Grid.

Figure 13 - Combing the results of the Sub-Grid analysis
Both, the spreadsheets evaluate deletions from either the Sub-Grid (the left hand group of columns) or its complementary long dimension (Row or Column) (the adjacent group to the right). The logic of the Sudoku rules mean that deletion can only take place in one dimension or the other; not both. We needed a third frame that drew the results from either analysis into a single place. This is them used to amend the original state of the puzzle extract.

In the examples, that we tested, it was common for valid results not to deliver a final solution for the cell in question. Indeed, this is the case in both the examples exhibited above. In fact these exhibits form part of the complete solution to a single puzzle – note the similarities in figures for the first 3 cells in the right hand set of columns for each table.

It is possible to delete the Ids in more than one cell of the dimension. Figure 6 above showing the dependencies also shows an example of this. In this case one of the cells generated a final result.

I recognize that being given images of only part of a worksheet makes it difficult to follow the logic of the whole. The solution of a complete puzzle can be downloaded from SuDoku_GUA25012020_Expert.xlsx. Look for the worksheets with the names Id 8 in Row 3 & SubGrid 2 and Id 1 in Col 1 and SubGrid 1 in the index on the first sheet. The hyperlinks in the right hand column will take you to the worked examples. As a whole this workbook takes the user through a series of analyses that leads to a solution for the puzzle.

Spreadsheet Model Development

The majority of sophisticated/complex spreadsheets are created within industry, commerce or educational environments. Mostly, they are created by teams of developers that allow for the organised validation of the model, its relationship to users and its structure. The Sudoku Intersection Method model has none of these advantages. It was created by me alone and exhibits some of the classic errors that are criticized by spreadsheet professionals. I argue that these errors do not necessarily invalidate the model but that the imperfections of the model need to be exposed in order to gain a full understanding of its weaknesses and, by implication, its associated strengths.

I treat the creation of a spreadsheet in much the same way has a student might treat a science / applied maths experiment. You have an objective, a method or pathway, a testing process and, hopefully, a useful conclusion. While I knew the objective and had a tentative method or pathway, I couldn’t be sure that it would deliver the expected result. This lack of absolute clarity shows up in some of the in some of the model’s structures.

Redundant structures are the most obvious sign of changes in the method employed. The elements shaded pink in the illustration below are no longer used in the current analytical method.

Figure 14 - Retained but redundant structures

In many circumstances these would be removed as extraneous to the direct logic of the model. I have retained because they remind me of the processes I investigated before arriving at a suitable structure.

Excel is designed so that, when a constructor moves a section of his model, the formulae that refer to that structure in cell address format should adjust itself accordingly. This works for the most part but can fail in certain circumstances when $ fixed formulae are involved. The section that deals with matching cells that are common to both the long dimension and the Sub-Grid has this problem.

Figures 15 - Presentational inconsistencies


The first formula cell in the second column (headed DRC) has this problem - '=IFERROR(MATCH(C35,$E$35:$E$43,0),0). When I wanted to add a second analysis (VRC) moving the column risked losing the connection to the fixed range. Hence the second analysis is separated by the RC column. The equivalent analysis for the Sub-Grid (GI) does not exhibit this messy structure because of the space available. In a commercial situation, a structure like this would not acceptable but could be reasonably quickly improved by employing a named range.

Curiously, the DRC and VRC columns exhibit almost identical formulae. Using an alpha character in one case as opposed to a numeric offered certain advantages. The alpha approach delivers results to one cell. The numeric approach delivers, essentially the same result, to 81 separate cells. 

Conclusion

The “Intersection Methods” as realized in Excel’s two dimensional format deliver the results required of the Sudoku logic. The game’s quasi 3 dimensional environment was a challenge handled by rearranging the games 3 dimensions into 2. The step by step presentation the logical pathway allows the player to identify why his input either succeeds or fails.

The presentational structures of the spreadsheets show some of the mistakes that I made when building them. These are mistakes associated with trying to over simplify the problem. These have been left to remind me of the dangers of over simplification.

The “Intersection Methods” complement and add to methods described previously.

Comments

Popular posts from this blog

Walberswick – June 2023

 Patrick and I chose the hottest week of the year so far to stay at Esme’s caravan in Walberswick. I spent the first few days on my own to check the workings of the caravan. It is old and showing signs of wear and tear, but the absence of curtains in large south facing window was a real irritation. Bryony saved the day by bringing our large picnic table cloth, which we were able to pin over the window for shade. That turned out to be vital given the weather. I claim that it was all the fault of COVID 19 that I am now over weight and unfit. I used these first few days to identify how unfit I was. The answer turned out to be “very”. The walking of the shingle beach from the camp site to the Blythe estuary was a nightmare. In previous years, I would get up early and walk for an hour or so before considering the activity for the day. This was based on a circular walk up the river path and over the heathland that surrounds Walberswick. Morning Walk This took rather longer than I had

ACBA Mapping: Employing ‘UsedRange’ in Excel

When reviewing a worksheet the ACBA Mapping software uses VBA’s ‘UsedRange’ function to identify the full scope of the sheet. This function is excellent for identifying separate ranges employed in a worksheet, but has some drawbacks. One of these drawbacks has been identified while reviewing the Enron corpus of files ( https://figshare.com/articles/dataset/Enron_Spreadsheets_and_Emails/1221767 ) The Enron corpus of spreadsheets and other correspondence is huge. Felienne Hermans however has whittled the number of workbooks down to 16,189 unique items.   https://www.felienne.com/archives/3634   The workbook that clearly displayed the UsedRange issue is named ‘benjamin_rogers__938__historical outages.xlsx’. Clearly the files have been renamed to prevent duplicates. In this blog though I will refer to it as ‘Historical Outages’. In the original Mapping software, users were advised when the ‘UsedRange’ exceeded half a million cells. This allows the user to escape from the process and t

What is a Cell Map?

It is a simplified, colour coded version of an Excel Worksheet. The map shows the structure of the worksheet, its input cells, constants, primary formula and 'named' ranges. The map is designed for spreadsheet developers who work in isolation (i.e. not part of a team), but who, nevertheless, build complex spreadsheet systems. It highlights oddities and inconsistencies within a spreadsheet structure. Excel has its own built-in auditing and mapping tools. Excel's name manager and formula auditing tools These are useful for auditing one formula or one name, but they become overburdened with detail when trying to audit/review a whole worksheet. The ACBA Mapping   [A revised version of the software is now available. Stephen Allen 05 September 2023] utility generates a map of the original spreadsheet and three map keys to aid the interpretation of the detail. For example, I went on strict weight reducing diet last year where I measured the weight/volume of everythin