Conference papers

Unconstrained Bengali handwriting recognition with recurrent models

Abstract : This paper presents a pioneering attempt at developing a recurrent neural network based connectionist system for unconstrained Bengali offline handwriting recognition. The major challenge in configuring such a classification system for a complex script like Bengali is to define the character classes effectively. A novel way of defining character classes is introduced that makes the recognition problem suitable for a recurrent model. Indeed, the system has to deal with more than nine hundred character classes whose occurrence probabilities are highly skewed in the language. An off-the-shelf BLSTM-CTC recognizer is used. An open-source dataset is developed for unconstrained Bengali offline handwriting recognition. The dataset contains 2,338 handwritten text lines consisting of about 21,000 words. Experiments show that, with the new definition of character classes, the BLSTM-CTC recognizer achieves strong performance on unconstrained Bengali offline handwriting recognition. The character-level recognition accuracy is 75.40% without any post-processing of the BLSTM-CTC output. Among the 24.60% character-level errors, the substitution, deletion and insertion errors are 18.91%, 4.69% and 0.98%, respectively.
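The split of character-level errors into substitutions, deletions and insertions can be illustrated with a minimal sketch. It assumes, since the abstract does not say how the figures were computed, that errors are counted along a minimum edit-distance alignment between the reference and recognized label sequences, and that accuracy is one minus the total error rate relative to the reference length. The function names `edit_counts` and `char_accuracy` and the integer labels are hypothetical stand-ins for the paper's character classes.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) along one
    minimum edit-distance alignment of two label sequences."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference label is deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis label is inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (total, subs, dels, ins)          # match, no cost
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Keep the cheapest option; ties are broken in the order listed.
            cost[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    return cost[n][m][1:]


def char_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (S + D + I) / len(reference)."""
    subs, dels, ins = edit_counts(reference, hypothesis)
    return 1.0 - (subs + dels + ins) / len(reference)


# Hypothetical label sequences: integers stand in for character classes.
reference = [1, 2, 3, 4, 5, 6]
hypothesis = [1, 9, 3, 5, 6, 7]  # 2 -> 9 substituted, 4 deleted, 7 inserted
print(edit_counts(reference, hypothesis))
print(char_accuracy(reference, hypothesis))
```

Because insertions count as errors against a fixed reference length, an accuracy defined this way can in principle fall below zero for very noisy output.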

https://hal-normandie-univ.archives-ouvertes.fr/hal-02087603
Contributor : Thierry Paquet
Submitted on : Tuesday, April 2, 2019 - 11:28:31 AM
Last modification on : Friday, July 19, 2019 - 2:38:26 PM

Citation

Utpal Garain, Luc Mioulet, B. Chaudhuri, Clément Chatelain, Thierry Paquet. Unconstrained Bengali handwriting recognition with recurrent models. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Tunis, Tunisia. pp.1056-1060, ⟨10.1109/ICDAR.2015.7333923⟩. ⟨hal-02087603⟩
