
Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition

Bruno Stuner, Clément Chatelain 1, Thierry Paquet 1
1 DocApp - LITIS - Equipe Apprentissage
LITIS - Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes
Abstract : Deep learning approaches now provide state-of-the-art performance in many computer vision tasks, such as handwriting recognition. However, the huge number of parameters of these models requires large annotated training datasets, which are difficult to obtain. Training neural networks with unlabeled data is one of the key problems to achieving significant progress in deep learning. In this article, we explore a new semi-supervised training strategy to train long short-term memory (LSTM) recurrent neural networks for isolated handwritten word recognition. Our self-training strategy relies on iteratively training a bidirectional LSTM recurrent neural network (BLSTM) using both labeled and unlabeled data. At each iteration, the current trained network labels the unlabeled data and submits them to a very efficient "lexicon verification" rule. Verified unlabeled data are added to the labeled dataset at the end of each iteration. This verification stage has very low sensitivity to the lexicon size, and full word coverage of the dataset is not necessary for the semi-supervised method to be efficient. The strategy enables self-training with a single BLSTM and shows promising results on the Rimes dataset.
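The iterative strategy sketched in the abstract (train, label, verify against a lexicon, grow the labeled set) can be summarized in a short, hedged pseudocode-style sketch. All names here (`train_network`, `predict`, `lexicon_verification`) are hypothetical placeholders for the paper's BLSTM training and decoding steps, not an actual implementation of the authors' system:

```python
def lexicon_verification(prediction, lexicon):
    """Hypothetical verification rule: accept the network's output
    only if it exactly matches an entry of the lexicon."""
    return prediction in lexicon

def self_training(train_network, predict, labeled, unlabeled, lexicon,
                  max_iter=10):
    """Sketch of an iterative self-training loop, assuming `train_network`
    trains a model on (sample, label) pairs and `predict` decodes one
    unlabeled sample with the current model."""
    model = train_network(labeled)
    for _ in range(max_iter):
        newly_verified, remaining = [], []
        for sample in unlabeled:
            pred = predict(model, sample)
            if lexicon_verification(pred, lexicon):
                # verified sample joins the labeled set with its
                # network-assigned label
                newly_verified.append((sample, pred))
            else:
                remaining.append(sample)
        if not newly_verified:
            break  # no new verified data: stop iterating
        labeled = labeled + newly_verified
        unlabeled = remaining
        model = train_network(labeled)  # retrain on the enlarged set
    return model, labeled
```

Only samples whose recognized word passes the verification rule are promoted to the labeled set, which is what keeps label noise low even when the lexicon covers only part of the vocabulary.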
Document type :
Conference papers
Contributor : Thierry Paquet
Submitted on : Thursday, March 21, 2019 - 3:47:09 PM
Last modification on : Wednesday, March 2, 2022 - 10:10:10 AM



Bruno Stuner, Clément Chatelain, Thierry Paquet. Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov 2017, Kyoto, Japan. pp.633-638, ⟨10.1109/ICDAR.2017.109⟩. ⟨hal-02075755⟩
