Conference papers

Handwriting Recognition with Multigrams

Wassim Swaileh 1, Thierry Paquet 1, Yann Soullard 1, Pierrick Tranouez 1
1 DocApp - LITIS - Equipe Apprentissage
LITIS - Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes
Abstract : We introduce a novel handwriting recognition approach based on sub-lexical units known as multigrams of characters, which are variable-length character sequences. A hidden semi-Markov model is used to model multigram occurrences within the target language corpus. Decoding the training language corpus with this model provides an optimized multigram lexicon of reduced size with a high coverage rate of out-of-vocabulary (OOV) words compared to the traditional word modeling approach. The handwriting recognition system is composed of two components: the optical model and the statistical n-gram language model of multigrams. The two models are combined during the recognition process using a decoding technique based on Weighted Finite State Transducers (WFST). We evaluate the approach on two Latin-script language datasets (the French RIMES and English IAM datasets) and show that it outperforms word and character language models for high OOV rates, and that it performs similarly to these traditional models for low OOV rates, with the advantage of reduced complexity.
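To illustrate what decoding a word into multigrams can look like, here is a minimal sketch. It is not the authors' trained hidden semi-Markov model: it assumes a simplified setting in which each multigram has an independent log-probability, and it searches for the most probable segmentation by dynamic programming over variable-length segments. The function name `segment`, the `max_len` parameter, and the toy lexicon are all hypothetical and serve only as an example.

```python
import math


def segment(word, logp, max_len=4):
    """Return the most probable split of `word` into multigrams.

    Assumption for this sketch: multigrams are independent, so the score of
    a segmentation is the sum of the log-probabilities of its units.
    `logp` maps each known multigram to its log-probability.
    Returns None when no segmentation exists with the given lexicon.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best score for word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        # Try every allowed length for the unit ending at position `end`.
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            unit = word[start:end]
            if unit in logp and best[start] + logp[unit] > best[end]:
                best[end] = best[start] + logp[unit]
                back[end] = start
    if best[n] == -math.inf:
        return None
    # Follow the back-pointers from the end of the word to recover the units.
    units = []
    end = n
    while end > 0:
        start = back[end]
        units.append(word[start:end])
        end = start
    return units[::-1]


# Toy lexicon (made-up probabilities): three multi-character units plus
# low-probability single characters as a fallback.
toy = {"hand": 0.1, "wri": 0.05, "ting": 0.1}
toy.update({c: 0.01 for c in "handwriting"})
toy_logp = {unit: math.log(p) for unit, p in toy.items()}

result = segment("handwriting", toy_logp)
```

With this toy lexicon, `result` is the segmentation into "hand", "wri" and "ting". The single-character fallback entries show, in a simplified way, why a small multigram lexicon can still cover words absent from a word-level lexicon.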
Contributor : Thierry Paquet
Submitted on : Thursday, March 21, 2019 - 3:45:45 PM
Last modification on : Tuesday, April 9, 2019 - 4:46:45 PM



Wassim Swaileh, Thierry Paquet, Yann Soullard, Pierrick Tranouez. Handwriting Recognition with Multigrams. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov 2017, Kyoto, Japan. pp.137-142, ⟨10.1109/ICDAR.2017.31⟩. ⟨hal-02075753⟩


