Deep Learning Feature Extraction for Image Processing

PhD Thesis – Baptiste Wicht: Deep Learning Features for Image Processing
Analysis of the potential of features automatically learned by Deep Learning models for image processing tasks.
Realization
  • HES-SO Fribourg
  • Baptiste Wicht
  • Prof. Jean Hennebert
Keywords
  • Machine learning
  • Deep Learning
  • Image Processing
  • Keyword Spotting
Our skills
Machine learning, Deep learning, RBM, Convolutional RBM
Valorisation
6 international peer-reviewed publications
Partners
  • DIVA – UNIFR
Funding
  • CTI/KTI
  • HES-SO

Typically, hand-crafted features are extracted from images for further processing tasks. These features are then passed to a Machine Learning algorithm to learn specific models. Such features are generally difficult to design and transfer poorly from one data set to another. This thesis investigates the use of machine learning approaches to learn these features instead of relying on handcrafted algorithms. It attempts to answer two questions: do such learned features outperform regular handcrafted features, and what are the prerequisites and difficulties of performing the learning? More specifically, the thesis investigates the use of Restricted Boltzmann Machines (RBMs) and Convolutional RBMs (CRBMs) to learn features from images. Two tasks were defined to develop the systems. To support the experiments, a complete machine learning framework was implemented on top of an optimized matrix computation backend.
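To make the underlying technique concrete, the sketch below trains a plain binary RBM with one step of Contrastive Divergence (CD-1) on flattened image patches, the kind of unsupervised feature learning used throughout the thesis. This is a minimal NumPy illustration, not the thesis framework itself (which was an optimized C++ implementation); the class name, layer sizes, learning rate, and placeholder data are all assumptions made for the example.

```python
import numpy as np

class RBM:
    """Minimal binary-binary Restricted Boltzmann Machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr
        self.rng = rng

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        """One Contrastive Divergence (CD-1) update on a mini-batch v0."""
        # Positive phase: hidden activations given the data.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(v0.dtype)
        # Negative phase: one Gibbs step (reconstruction).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # Gradient approximation and parameter update.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)
        return np.mean((v0 - pv1) ** 2)  # reconstruction error, for monitoring only

# Usage: learn features from flattened 28x28 binary patches (placeholder data).
if __name__ == "__main__":
    X = (np.random.rand(1024, 784) > 0.5).astype(np.float64)
    rbm = RBM(n_visible=784, n_hidden=256)
    for epoch in range(5):
        err = np.mean([rbm.cd1_step(X[i:i + 64]) for i in range(0, len(X), 64)])
        print(f"epoch {epoch}: reconstruction error {err:.4f}")
    features = rbm.hidden_probs(X)  # learned features for a downstream classifier
```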

Sudoku Recognition. This first experiment used a Deep Belief Network (DBN), trained in an unsupervised manner, to recognize Sudoku images taken from Swiss newspapers. The goal was to study the impact of unsupervised pretraining with RBMs, including an analysis of the ability to learn features from mixed printed and handwritten inputs. A rough sketch of such a pipeline is given below.
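As an illustration only, the snippet below stacks two scikit-learn BernoulliRBMs, pretrained greedily and without labels, in front of an SVM classifier. The actual system used a Convolutional DBN on digit cells extracted from the Sudoku grids; here the dense RBMs, layer sizes, and random placeholder data merely stand in for it.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Placeholder data: flattened, binarized digit cells (e.g. 32x32 -> 1024 pixels)
# extracted from the Sudoku grids; y holds the digit labels 0-9.
rng = np.random.default_rng(0)
X = (rng.random((2000, 1024)) > 0.5).astype(np.float64)
y = rng.integers(0, 10, size=2000)

# Greedy layer-wise unsupervised pretraining of a two-layer RBM stack,
# followed by a supervised SVM on the top-level features.
pipeline = Pipeline([
    ("rbm1", BernoulliRBM(n_components=512, learning_rate=0.05, n_iter=10, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=10, random_state=0)),
    ("svm", SVC(kernel="rbf", C=10.0)),
])
pipeline.fit(X, y)
print("training accuracy:", pipeline.score(X, y))
```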

Handwritten Keyword Spotting. A complete model for keyword spotting was designed. It uses features extracted with a CRBM trained in a purely unsupervised manner. These features are then passed to a Dynamic Time Warping (DTW) algorithm or to a Hidden Markov Model (HMM) to perform keyword spotting. The learned features were shown to significantly outperform state-of-the-art handcrafted features in most configurations. The template-based variant is sketched below.
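In the sketch, sliding-window feature sequences (one vector per image column, as a CRBM feature extractor would produce for a word image) are compared with Dynamic Time Warping, and candidate word images are ranked by their distance to the query template. The feature dimensionality, normalization, and data are placeholders, not the configuration used in the thesis.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW distance between feature sequences a (m, d) and b (n, d),
    using Euclidean frame distances and length normalization."""
    m, n = len(a), len(b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[m, n] / (m + n)

# Placeholder "learned" sliding-window features: 64-dimensional vector per column.
rng = np.random.default_rng(0)
query = rng.random((80, 64))                                        # keyword template
candidates = [rng.random((rng.integers(60, 120), 64)) for _ in range(5)]

# Rank candidate word images by DTW distance to the query template.
scores = sorted(enumerate(dtw_distance(query, c) for c in candidates),
                key=lambda t: t[1])
print("best match: word", scores[0][0], "score", scores[0][1])
```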

Publications

  • [PDF] [DOI] N. R. Howe, A. Fischer, and B. Wicht, “Inkball Models as Features for Handwriting Recognition,” in 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, pp. 96-101.
    [Bibtex]
    @INPROCEEDINGS{2016howeicfhr,
    author={N. R. Howe and A. Fischer and B. Wicht},
    booktitle={2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)},
    title={Inkball Models as Features for Handwriting Recognition},
    year={2016},
    pages={96-101},
    abstract={Inkball models provide a tool for matching and comparison of spatially structured markings such as handwritten characters and words. Hidden Markov models offer a framework for decoding a stream of text in terms of the most likely sequence of causal states. Prior work with HMM has relied on observation of features that are correlated with underlying characters, without modeling them directly. This paper proposes to use the results of inkball-based character matching as a feature set input directly to the HMM. Experiments indicate that this technique outperforms other tested methods at handwritten word recognition on a common benchmark when applied without normalization or text deslanting.},
    keywords={Computational modeling;Handwriting recognition;Hidden Markov models;Mathematical model;Prototypes;Skeleton;Two dimensional displays;Handwriting recognition;Hidden Markov models;Image processing;Pattern recognition},
    doi={10.1109/ICFHR.2016.0030},
    ISSN={2167-6445},
    month={Oct},}
  • B. Wicht, A. Fischer, and J. Hennebert, “Deep Learning Features for Handwritten Keyword Spotting,” in 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 3423-3428.
    [Bibtex]
    @conference{wicht:icpr2016,
    author = "Baptiste Wicht and Andreas Fischer and Jean Hennebert",
    abstract = "Deep learning had a significant impact on diverse pattern recognition tasks in the recent past. In this paper, we investigate its potential for keyword spotting in handwritten documents by designing a novel feature extraction system based on Convolutional Deep Belief Networks. Sliding window features are learned from word images in an unsupervised manner. The proposed features are evaluated both for template-based word spotting with Dynamic Time Warping and for learning-based word spotting with Hidden Markov Models. In an experimental evaluation on three benchmark data sets with historical and modern handwriting, it is shown that the proposed learned features outperform three standard sets of handcrafted features.",
    booktitle = "23rd International Conference on Pattern Recognition (ICPR)",
    editor = "IEEE",
    keywords = "Handwriting Recognition, Deep learning, Artificial neural networks, keyword spotting",
    month = "December",
    pages = "3423-3428",
    title = "{D}eep {L}earning {F}eatures for {H}andwritten {K}eyword {S}potting",
    url = "http://www.hennebert.org/download/publications/icpr-2016-deep-learning-features-for-handwritten-keyword-spotting.pdf",
    year = "2016",
    }
  • [DOI] B. Wicht, A. Fischer, and J. Hennebert, “On CPU Performance Optimization of Restricted Boltzmann Machine and Convolutional RBM,” in Artificial Neural Networks in Pattern Recognition: 7th IAPR TC3 Workshop, ANNPR 2016, Ulm, Germany, September 28–30, 2016, Proceedings, F. Schwenker, H. M. Abbas, N. El Gayar, and E. Trentin, Eds., Cham: Springer International Publishing, 2016, p. 163–174.
    [Bibtex]
    @inbook{wicht:2016annpr,
    author = "Baptiste Wicht and Andreas Fischer and Jean Hennebert",
    address = "Cham",
    booktitle = "Artificial Neural Networks in Pattern Recognition: 7th IAPR TC3 Workshop, ANNPR 2016, Ulm, Germany, September 28--30, 2016, Proceedings",
    doi = "10.1007/978-3-319-46182-3_14",
    editor = "Schwenker, Friedhelm
    and Abbas, M. Hazem
    and El Gayar, Neamat
    and Trentin, Edmondo",
    isbn = "978-3-319-46182-3",
    pages = "163--174",
    publisher = "Springer International Publishing",
    title = "{O}n {CPU} {P}erformance {O}ptimization of {R}estricted {B}oltzmann {M}achine and {C}onvolutional {RBM}",
    url = "http://dx.doi.org/10.1007/978-3-319-46182-3_14",
    year = "2016",
    }
  • [DOI] B. Wicht, A. Fischer, and J. Hennebert, “Keyword Spotting with Convolutional Deep Belief Networks and Dynamic Time Warping,” in Artificial Neural Networks and Machine Learning – ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II, A. E. P. Villa, P. Masulli, and A. J. Pons Rivero, Eds., Cham: Springer International Publishing, 2016, p. 113–120.
    [Bibtex]
    @Inbook{wicht:2016icann,
    author="Wicht, Baptiste
    and Fischer, Andreas
    and Hennebert, Jean",
    editor="Villa, Alessandro E.P.
    and Masulli, Paolo
    and Pons Rivero, Antonio Javier",
    title="Keyword Spotting with Convolutional Deep Belief Networks and Dynamic Time Warping",
    bookTitle="Artificial Neural Networks and Machine Learning -- ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II",
    year="2016",
    publisher="Springer International Publishing",
    address="Cham",
    pages="113--120",
    isbn="978-3-319-44781-0",
    doi="10.1007/978-3-319-44781-0_14",
    url="http://dx.doi.org/10.1007/978-3-319-44781-0_14"
    }
  • [PDF] [DOI] B. Wicht and J. Hennebert, “Mixed handwritten and printed digit recognition in Sudoku with Convolutional Deep Belief Network,” in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 861-865.
    [Bibtex]
    @INPROCEEDINGS{wicht:icdar2015,
    author={B. Wicht and J. Hennebert},
    booktitle={2015 13th International Conference on Document Analysis and Recognition (ICDAR)},
    title={Mixed handwritten and printed digit recognition in Sudoku with Convolutional Deep Belief Network},
    year={2015},
    pages={861-865},
    abstract={In this paper, we propose a method to recognize Sudoku puzzles containing both handwritten and printed digits from images taken with a mobile camera. The grid and the digits are detected using various image processing techniques including Hough Transform and Contour Detection. A Convolutional Deep Belief Network is then used to extract high-level features from raw pixels. The features are finally classified using a Support Vector Machine. One of the scientific question addressed here is about the capability of the Deep Belief Network to learn extracting features on mixed inputs, printed and handwritten. The system is thoroughly tested on a set of 200 Sudoku images captured with smartphone cameras under varying conditions, e.g. distortion and shadows. The system shows promising results with 92% of the cells correctly classified. When cell detection errors are not taken into account, the cell recognition accuracy increases to 97.7%. Interestingly, the Deep Belief Network is able to handle the complex conditions often present on images taken with phone cameras and the complexity of mixed printed and handwritten digits.},
    keywords={Hough transforms;belief networks;handwriting recognition;image sensors;mobile computing;support vector machines;Hough Transform;Sudoku;Sudoku images;Sudoku puzzles;contour detection;convolutional deep belief network;handwritten digits;image processing techniques;mixed handwritten recognition;printed digit recognition;printed digits;smartphone cameras;support vector machine;Camera-based OCR;Convolution;Convolutional Deep Belief Network;Text Detection;Text Recognition},
    doi={10.1109/ICDAR.2015.7333884},
    month={Aug},
    Pdf = {http://www.hennebert.org/download/publications/icdar-2015-mixed-handwritten-and-printed-digit-recognition-in-sudoku-with-convolutional-deep-belief-network.pdf},
    }
  • [PDF] [DOI] B. Wicht and J. Hennebert, “Camera-based Sudoku Recognition with Deep Belief Network,” in 2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR 2014), 2014, pp. 83-88.
    [Bibtex]
    @conference{wicht2014:socpar,
    Abstract = {In this paper, we propose a method to detect and recognize a Sudoku puzzle on images taken from a mobile camera. The lines of the grid are detected with a Hough transform. The grid is then recomposed from the lines. The digits position are extracted from the grid and finally, each character is recognized using a Deep Belief Network (DBN). To test our implementation, we collected and made public a dataset of Sudoku images coming from cell phones. Our method proved successful on our dataset, achieving 87.5% of correct detection on the testing set. Only 0.37% of the cells were incorrectly guessed. The algorithm is capable of handling some alterations of the images, often present on phone-based images, such as distortion, perspective, shadows, illumination gradients or scaling. On average, our solution is able to produce a result from a Sudoku in less than 100ms.},
    Author = {Baptiste Wicht and Jean Hennebert},
    Booktitle = {2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR 2014)},
    Doi = {10.1109/SOCPAR.2014.7007986},
    Isbn = {9781479959358},
    Keywords = {Machine Learning, DBN, Deep Belief Network, Image Recognition, Text Detection, Text Recognition},
    Pages = {83-88},
    Publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
    Title = {{C}amera-based {S}udoku {R}ecognition with {D}eep {B}elief {N}etwork},
    Pdf = {http://www.hennebert.org/download/publications/socpar-2014-camera-based-sudoku-recognition-with-deep-belief-network.pdf},
    Year = {2014},
    }