Sudoku Dataset

Sudoku dataset: 160 pictures of Sudoku ready for ML

The Sudoku dataset was created by Baptiste Wicht during his PhD on the use of deep learning to automatically extract features from images. It contains 160 pictures of Sudoku puzzles, taken from various newspapers with smartphone cameras.

The 160 pictures are divided into two sets: 120 training images and 40 test images.

The dataset contains more than images: it also includes metadata. Each image has a corresponding .dat file containing:

the brand and model of the phone used to take the picture
the textual representation of the sudoku grid

For example, an image in the dataset is associated with the following .dat file:

sonyEricsson w810i
1632x1224:24 JPG
0 4 2 0 0 0 0 0 5 
0 0 0 6 3 2 0 8 0 
0 8 0 0 4 0 2 0 0 
0 0 0 0 0 0 0 0 0 
7 1 5 0 6 8 3 4 0 
9 0 8 3 5 0 7 6 1 
0 9 1 0 0 6 0 0 0 
0 0 0 0 2 0 1 9 0 
0 0 6 1 0 0 0 5 0
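
To illustrate the layout of the example above, here is a minimal Python sketch that parses such a .dat file. It assumes the structure seen in this one sample: a first line with the phone brand and model, a second line that is kept verbatim because its meaning is not documented here, and nine lines of nine space-separated digits. Treating 0 as an empty cell is also an assumption, and the names SudokuMeta and parse_dat are hypothetical, not part of the dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SudokuMeta:
    phone: str              # first line: brand and model, kept verbatim
    image_info: str         # second line, kept verbatim (meaning assumed, not documented)
    grid: List[List[int]]   # 9x9 digits; 0 is assumed to mean an empty cell


def parse_dat(text: str) -> SudokuMeta:
    """Parse the text of one .dat file, assuming the layout of the sample."""
    # Drop blank lines and trailing whitespace (the sample has trailing spaces).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 11:
        raise ValueError(f"expected 11 non-empty lines, got {len(lines)}")

    phone, image_info = lines[0], lines[1]
    grid = [[int(tok) for tok in ln.split()] for ln in lines[2:]]

    if any(len(row) != 9 for row in grid):
        raise ValueError("every grid row must contain 9 values")
    if any(not 0 <= v <= 9 for row in grid for v in row):
        raise ValueError("grid values must be digits between 0 and 9")
    return SudokuMeta(phone, image_info, grid)


SAMPLE = """sonyEricsson w810i
1632x1224:24 JPG
0 4 2 0 0 0 0 0 5 
0 0 0 6 3 2 0 8 0 
0 8 0 0 4 0 2 0 0 
0 0 0 0 0 0 0 0 0 
7 1 5 0 6 8 3 4 0 
9 0 8 3 5 0 7 6 1 
0 9 1 0 0 6 0 0 0 
0 0 0 0 2 0 1 9 0 
0 0 6 1 0 0 0 5 0
"""

meta = parse_dat(SAMPLE)
print(meta.phone)
for row in meta.grid:
    print(row)
```

In practice, the text would be read from the .dat file that sits next to each image; the sample is inlined here only to keep the sketch self-contained.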

Downloads and resources

You can find the dataset directly on GitHub or download it via the following links:

Some results have already been obtained on this dataset. To find out more:

Camera-based Sudoku recognition with deep belief network: Baptiste Wicht and Jean Hennebert (EIA-FR, Switzerland), Hough transform and DBN: 12.5% error rate.
Sudoku Recognition with Deep Belief Network: blog post by Baptiste Wicht about the dataset and some results.