Unconstrained Bengali handwriting recognition with recurrent models - Archive ouverte HAL
Conference Papers Year : 2015

Unconstrained Bengali handwriting recognition with recurrent models

Abstract

This paper presents a pioneering attempt at developing a recurrent-neural-network-based connectionist system for unconstrained Bengali offline handwriting recognition. The major challenge in configuring such a classification system for a complex script like Bengali is defining the character classes effectively. A novel way of defining character classes is introduced, making the recognition problem suitable for a recurrent model. Indeed, the system has to deal with more than nine hundred character classes whose occurrence probabilities are highly skewed in the language. An off-the-shelf BLSTM-CTC recognizer is used. An open-source dataset is developed for unconstrained Bengali offline handwriting recognition. The dataset contains 2,338 handwritten text lines consisting of about 21,000 words. Experiments show that, with the new definition of character classes, the BLSTM-CTC provides an impressive performance for unconstrained Bengali offline handwriting recognition. The character-level recognition accuracy is 75.40% without any post-processing of the BLSTM-CTC output. Among the 24.60% character-level errors, the substitution, deletion, and insertion errors are 18.91%, 4.69%, and 0.98%, respectively.
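The abstract reports character-level accuracy together with a breakdown into substitution, deletion, and insertion errors. The sketch below illustrates one common way such a breakdown can be obtained: a minimum edit-distance alignment between the reference label sequence and the recognizer output. It is an assumption for illustration, not the authors' evaluation code. The function names `error_breakdown` and `character_accuracy` are hypothetical, and the inputs are assumed to be sequences of character-class labels (plain strings are used here only for brevity).

```python
def error_breakdown(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-cost
    alignment of `hypothesis` against `reference`.

    Both arguments are sequences of class labels (assumed format).
    """
    n, m = len(reference), len(hypothesis)
    # table[i][j] = (total_edits, subs, dels, ins) for aligning
    # reference[:i] with hypothesis[:j].
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)      # only insertions

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = table[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (t, s, d, k)              # match, no cost
            else:
                diag = (t + 1, s + 1, d, k)      # substitution
            t, s, d, k = table[i - 1][j]
            up = (t + 1, s, d + 1, k)            # deletion
            t, s, d, k = table[i][j - 1]
            left = (t + 1, s, d, k + 1)          # insertion
            # Tuples compare by total edits first; remaining fields
            # only make tie-breaking deterministic.
            table[i][j] = min(diag, up, left)

    _, subs, dels, ins = table[n][m]
    return subs, dels, ins


def character_accuracy(reference, hypothesis):
    """Accuracy as 1 - (S + D + I) / len(reference)."""
    subs, dels, ins = error_breakdown(reference, hypothesis)
    return 1.0 - (subs + dels + ins) / len(reference)
```

Under this definition the accuracy and the total error rate sum to 100%, which matches the way the abstract pairs 75.40% accuracy with 24.60% errors. When several alignments share the same minimum cost, the split between the three error types depends on the tie-breaking rule, so published breakdowns may differ slightly between evaluation tools.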

Dates and versions

hal-02087603, version 1 (02-04-2019)

Identifiers

HAL Id : hal-02087603
DOI : 10.1109/ICDAR.2015.7333923

Cite

Utpal Garain, Luc Mioulet, B. Chaudhuri, Clément Chatelain, Thierry Paquet. Unconstrained Bengali handwriting recognition with recurrent models. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Tunis, Tunisia. pp.1056-1060, ⟨10.1109/ICDAR.2015.7333923⟩. ⟨hal-02087603⟩