Detection and recognition of artificial text in Arabic news videos

PhD thesis by Oussama Zayene

Realization

ENISo
Oussama Zayene
Prof. Najoua Essoukri Ben Amara
HEIA-FR
Prof. Jean Hennebert
UNIFR
Prof. Rolf Ingold

Keywords

AcTiV dataset
Arabic Video Text Detection
SWT
Auto-Encoders
Arabic Video Text Recognition
MDRNN
CTC layer
OCR
MDRNN-LSTM

Our skills

Video Optical Character Recognition, Stroke Width Transform algorithm, Convolutional Auto-Encoders, Support Vector Machines, Multidimensional Recurrent Neural Networks, Connectionist Temporal Classification.

Valorisation

More than 10 international peer-reviewed publications

Partners

National Engineering School of Sousse
University of Fribourg

Funding

LATIS, National Engineering School of Sousse, iCoSys, DIVA group UNIFR

TV news is an important source of information for most people. It allows a better understanding of the social and political events punctuating our everyday life. Today, large amounts of digital news video can be stored thanks to the availability of low-cost mass storage technology. As video archives grow rapidly, manual video annotation becomes impractical, and the need for efficient indexing and retrieval systems is evident. Text displayed in news video is one of the most important sources of high-level information about video content, and it can serve as a powerful semantic clue for automatic broadcast annotation. Nevertheless, extracting text from videos is a non-trivial task because of challenges such as the complexity of backgrounds and the variability of text regions in scale, font, color and position. Over the past two decades, interest in this area of research has led to a plethora of text detection and recognition methods. So far, these methods have focused on only a few languages and scripts, such as Latin and Chinese. For a language like Arabic, which is used by more than one billion people around the world, the literature is limited to very few studies.

This thesis aims to contribute to the current research in the field of Video Optical Character Recognition (OCR) by developing novel approaches that automatically detect and recognize embedded Arabic text in news videos.

We introduce a two-stage method for Arabic text detection in video frames. The first stage is the connected-component (CC) based detection part: text candidates are first extracted with the Stroke Width Transform (SWT) algorithm, then filtered with a set of heuristic rules, and finally grouped with a proposed text-line formation technique. The second stage is the machine-learning verification part, which uses Convolutional Auto-Encoders (CAE) and Support Vector Machines (SVM) for text/non-text classification.
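
The filtering and grouping steps of the first stage can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Component` structure, the two filtering rules (stroke-width variation and aspect ratio), the greedy grouping, and all thresholds are assumptions chosen for illustration, and the stroke widths are assumed to come from an SWT map computed elsewhere.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Component:
    """A connected component (hypothetical structure, for illustration only)."""
    x: int               # left edge of the bounding box
    y: int               # top edge of the bounding box
    w: int               # box width, assumed > 0
    h: int               # box height, assumed > 0
    stroke_widths: list  # stroke widths of its pixels, taken from an SWT map


def is_text_candidate(c, max_sw_cv=0.5, max_aspect=10.0):
    """Heuristic filter: text strokes tend to have a nearly constant width.

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    sw_mean = mean(c.stroke_widths)
    if sw_mean == 0:
        return False
    variation = pstdev(c.stroke_widths) / sw_mean  # coefficient of variation
    aspect = max(c.w / c.h, c.h / c.w)
    return variation <= max_sw_cv and aspect <= max_aspect


def group_into_lines(components, max_gap=20, min_v_overlap=0.5):
    """Greedily group horizontally aligned components into text lines."""
    lines = []
    for c in sorted(components, key=lambda comp: comp.x):
        for line in lines:
            last = line[-1]
            gap = c.x - (last.x + last.w)
            overlap = min(c.y + c.h, last.y + last.h) - max(c.y, last.y)
            if gap <= max_gap and overlap >= min_v_overlap * min(c.h, last.h):
                line.append(c)
                break
        else:
            # No existing line accepts this component: start a new line.
            lines.append([c])
    return lines


# Toy data: two neighbouring glyphs, one noisy blob, one distant glyph.
components = [
    Component(0, 10, 10, 20, [3, 3, 4, 3]),
    Component(15, 12, 10, 18, [3, 4, 4, 3]),
    Component(30, 10, 40, 20, [1, 9, 2, 12]),  # irregular strokes: rejected
    Component(200, 100, 10, 20, [3, 3, 3]),
]
candidates = [c for c in components if is_text_candidate(c)]
text_lines = group_into_lines(candidates)
```

In a full pipeline, each resulting text-line candidate would then go to the second-stage verifier (CAE features with an SVM classifier in the thesis), which this sketch does not cover.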

For text recognition, we adopt a segmentation-free methodology using Multidimensional Recurrent Neural Networks (MDRNN) coupled with a Connectionist Temporal Classification (CTC) decoding layer. The system also includes a new preprocessing step and a compact representation of character models. In this thesis, we aim to stand out from the dominant methodology, which relies on hand-crafted features, by using deep learning methods, namely CAE and MDRNN, to produce features automatically.
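
To illustrate how a CTC layer makes recognition segmentation-free, the following minimal Python sketch shows greedy (best-path) CTC decoding. It picks the most probable label per frame, merges consecutive repeats, and drops the blank label. This is a generic textbook procedure, not necessarily the decoder used in the thesis; the label names and the per-frame probabilities are made-up assumptions.

```python
def ctc_greedy_decode(frame_probs, labels, blank=0):
    """Best-path CTC decoding.

    frame_probs: one probability list per frame, indexed by label id.
    labels: label names indexed by label id; labels[blank] is the CTC blank.
    """
    # Step 1: most probable label id at each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    # Step 2: merge consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for label_id in best_path:
        if label_id != previous and label_id != blank:
            decoded.append(labels[label_id])
        previous = label_id
    return decoded


# Hypothetical label set (transliterated character names) and network outputs.
labels = ["<blank>", "beh", "alef"]
frame_probs = [
    [0.10, 0.80, 0.10],  # beh
    [0.20, 0.70, 0.10],  # beh again (repeat, merged)
    [0.10, 0.20, 0.70],  # alef
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # beh
]
decoded = ctc_greedy_decode(frame_probs, labels)
```

Because the network emits one label distribution per frame and the decoder collapses them, no explicit character segmentation of the text-line image is needed.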

Initially, no dataset of artificially embedded text in Arabic news videos was publicly available, so creating one was a necessity. The proposed dataset, AcTiV, contains 189 video clips recorded from a DBS system. These clips serve as raw material for 4,063 text frames for detection tasks and 10,415 cropped text-line images for recognition purposes. AcTiV is freely available to the scientific community. The dataset was also used as a benchmark for two international competitions held in conjunction with the ICPR 2016 and ICDAR 2017 conferences.

Publications

O. Zayene, S. M. Touj, J. Hennebert, R. Ingold, and N. E. B. Amara, “Multi-dimensional long short-term memory networks for artificial Arabic text recognition in news video,” IET Computer Vision, vol. 12, iss. 25, pp. 710–719, 2018.

@article{ietzayene2018,
title={Multi-dimensional long short-term memory networks for artificial Arabic text recognition in news video},
author={Zayene, Oussama and Touj, Sameh Masmoudi and Hennebert, Jean and Ingold, Rolf and Amara, Najoua Essoukri Ben},
journal={IET Computer Vision},
volume={12},
number={25},
pages={710--719},
year={2018},
publisher={IET}
}

O. Zayene, S. M. Touj, J. Hennebert, R. Ingold, and N. E. B. Amara, “Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames,” Journal of Imaging, vol. 4, iss. 2, p. 32, 2018.

@article{jimagingzayene2018,
title={Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames},
author={Zayene, Oussama and Touj, Sameh Masmoudi and Hennebert, Jean and Ingold, Rolf and Amara, Najoua Essoukri Ben},
journal={Journal of Imaging},
volume={4},
number={2},
pages={32},
year={2018},
publisher={Multidisciplinary Digital Publishing Institute}
}

O. Zayene, J. Hennebert, R. Ingold, and N. E. B. Amara, “ICDAR2017 Competition on Arabic Text Detection and Recognition in Multi-resolution Video Frames,” in 14th International Conference on Document Analysis and Recognition (ICDAR), 2017, pp. 1460–1465.

@inproceedings{zayene2017icdar,
title={ICDAR2017 Competition on Arabic Text Detection and Recognition in Multi-resolution Video Frames},
author={Zayene, Oussama and Hennebert, Jean and Ingold, Rolf and Amara, Najoua Essoukri Ben},
booktitle={14th International Conference on Document Analysis and Recognition (ICDAR)},
pages={1460--1465},
year={2017},
organization={IEEE}
}

O. Zayene, N. Hajjej, S. M. Touj, S. Ben Mansour, J. Hennebert, R. Ingold, and N. E. B. Amara, “ICPR2016 Contest on Arabic Text Detection and Recognition in Video Frames - AcTiVComp,” in 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 187–191.

@inproceedings{oussama2016icpr,
title={ICPR2016 Contest on Arabic Text Detection and Recognition in Video Frames - AcTiVComp},
author={Zayene, Oussama and Hajjej, Nadia and Touj, Sameh Masmoudi and Ben Mansour, Soumaya and Hennebert, Jean and Ingold, Rolf and Amara, Najoua Essoukri Ben},
booktitle={23rd International Conference on Pattern Recognition (ICPR)},
pages={187--191},
year={2016},
organization={IEEE}
}

O. Zayene, M. Seuret, S. M. Touj, J. Hennebert, R. Ingold, and N. E. B. Amara, “Text Detection in Arabic News Video Based on SWT Operator and Convolutional Auto-Encoders,” in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016, pp. 13–18.

@inproceedings{oussama2016das,
author={O. {Zayene} and M. {Seuret} and S. M. {Touj} and J. {Hennebert} and R. {Ingold} and N. E. B. {Amara}},
booktitle={2016 12th IAPR Workshop on Document Analysis Systems (DAS)},
title={Text Detection in Arabic News Video Based on SWT Operator and Convolutional Auto-Encoders},
year={2016},
pages={13--18},
keywords={image coding;image filtering;natural language processing;text detection;transforms;unsupervised learning;video signal processing;visual databases;text specificities;antialiasing artifacts;horizontally aligned artificial text detection;Arabic news video;stroke width transform algorithm;SWT algorithm;convolutional autoencoder;text candidate components;geometric constraints;stroke width information;CAE;unsupervised feature learning method;textline candidates;Arabic-text-in-video database;AcTiV-DB;evaluation protocols;TV channels;compression artifacts;Feature extraction;Computer aided engineering;Image edge detection;Learning systems;Training;Filtering algorithms;Support vector machines;Arabic text detection;SWT operator;CAE;AcTiV-DB},
doi={10.1109/DAS.2016.80},
month={April}
}

O. Zayene, S. M. Touj, J. Hennebert, R. Ingold, and N. E. Ben Amara, “Data, protocol and algorithms for performance evaluation of text detection in Arabic news video,” in 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2016, pp. 258–263.

@inproceedings{oussama2016atsip,
author={O. {Zayene} and S. M. {Touj} and J. {Hennebert} and R. {Ingold} and N. E. {Ben Amara}},
booktitle={2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)},
title={Data, protocol and algorithms for performance evaluation of text detection in Arabic news video},
year={2016},
pages={258--263},
abstract={Benchmark datasets and their corresponding evaluation protocols are commonly used by the computer vision community, in a variety of application domains, to assess the performance of existing systems. Even though text detection and recognition in video has seen much progress in recent years, relatively little work has been done to propose standardized annotations and evaluation protocols especially for Arabic Video-OCR systems. In this paper, we present a framework for evaluating text detection in videos. Additionally, dataset, ground-truth annotations and evaluation protocols, are provided for Arabic text detection. Moreover, two published text detection algorithms are tested on a part of the AcTiV database and evaluated using a set of the proposed evaluation protocols.},
keywords={computer vision;natural language processing;optical character recognition;performance evaluation;text detection;video signal processing;performance evaluation;text detection;Arabic news video;computer vision;Arabic video-OCR system;Protocols;Databases;Image edge detection;Optical character recognition software;Detection algorithms;Detectors;XML;text detection;Evaluation Protocol;AcTiV database;Arabic Video-OCR},
doi={10.1109/ATSIP.2016.7523079},
month={March}
}

O. Zayene, J. Hennebert, S. M. Touj, R. Ingold, and N. E. B. Amara, “A dataset for Arabic text detection, tracking and recognition in news videos-AcTiV,” in 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, pp. 996–1000.

@inproceedings{zayene2015:icdar,
title={A dataset for Arabic text detection, tracking and recognition in news videos-AcTiV},
author={Zayene, Oussama and Hennebert, Jean and Touj, Sameh Masmoudi and Ingold, Rolf and Amara, Najoua Essoukri Ben},
booktitle={13th International Conference on Document Analysis and Recognition (ICDAR)},
pages={996--1000},
year={2015},
organization={IEEE},
pdf={http://www.hennebert.org/download/publications/icdar-2015-a-dataset-for-arabic-text-detection-tracking-and-recognition-in-news-videos-activ.pdf},
}

O. Zayene, S. M. Touj, J. Hennebert, R. Ingold, and N. E. B. Amara, “Semi-automatic news video annotation framework for Arabic text,” in 4th International Conference on Image Processing Theory, Tools and Applications (IPTA), 2014, pp. 1–6.

@inproceedings{zayene2014:ipta,
title={Semi-automatic news video annotation framework for Arabic text},
author={Zayene, Oussama and Touj, Sameh Masmoudi and Hennebert, Jean and Ingold, Rolf and Amara, Najoua Essoukri Ben},
booktitle={4th International Conference on Image Processing Theory, Tools and Applications (IPTA)},
pages={1--6},
year={2014},
organization={IEEE},
pdf={http://www.hennebert.org/download/publications/ipta-2014-semi-automatic-news-video-annotation-framework-for-arabic-text.pdf},
}

F. Slimane, O. Zayene, S. Kanoun, A. M. Alimi, J. Hennebert, and R. Ingold, “New Features for Complex Arabic Fonts in Cascading Recognition System,” in Proc. of 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 2012, pp. 738–741.

@conference{fouad2012:icpr,
author = "Fouad Slimane and Oussema Zayene and Slim Kanoun and Adel M. Alimi and Jean Hennebert and Rolf Ingold",
abstract = "We propose in this work an approach for automatic recognition of printed Arabic text in open vocabulary mode and ultra low resolution (72 dpi). This system is based on Hidden Markov Models using the HTK toolkit. The novelty of our work is in the analysis of three complex fonts presenting strong ligatures: DiwaniLetter, DecoTypeNaskh and DecoTypeThuluth. We propose a feature extraction based on statistical and structural primitives allowing a robust description of the different morphological variability of the considered fonts. The system is benchmarked on the Arabic Printed Text Image (APTI) database.",
address = "Tsukuba, Japan",
booktitle = "Proc. of 21st International Conference on Pattern Recognition (ICPR 2012)",
isbn = "978-1-4673-2216-4",
issn = "1051-4651",
keywords = "Character and Text Recognition, Handwriting Recognition, Performance Evaluation, Machine Learning",
month = "November",
pages = "738--741",
publisher = "IEEE",
title = "{N}ew {F}eatures for {C}omplex {A}rabic {F}onts in {C}ascading {R}ecognition {S}ystem",
Pdf = "http://www.hennebert.org/download/publications/icpr-2012-new-features-for-complex-arabic-fonts-in-cascading-recognition-system.pdf",
year = "2012",
}