Conference paper · Year: 2017

Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition

Bruno Stuner
Clément Chatelain
Thierry Paquet

Abstract

Deep learning approaches now provide state-of-the-art performance in many computer vision tasks such as handwriting recognition. However, the huge number of parameters of these models requires big annotated training datasets, which are difficult to obtain. Training neural networks with unlabeled data is one of the key problems to achieve significant progress in deep learning. In this article, we explore a new semi-supervised training strategy to train long short-term memory (LSTM) recurrent neural networks for isolated handwritten word recognition. Our self-training strategy relies on iteratively training a Bidirectional LSTM recurrent neural network (BLSTM) using both labeled and unlabeled data. At each iteration, the currently trained network labels the unlabeled data and submits them to a very efficient "lexicon verification" rule. Verified unlabeled data are added to the labeled dataset at the end of each iteration. This verification stage has very low sensitivity to the lexicon size, and full word coverage of the dataset is not necessary for the semi-supervised method to be efficient. The strategy enables self-training with a single BLSTM and shows promising results on the Rimes dataset.
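The self-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `train`, `predict`, and `lexicon_verify` names are our own assumptions, and the recognizer is abstracted behind callables so the structure of the iteration (pseudo-label, verify against the lexicon, grow the labeled set) stands out.

```python
def lexicon_verify(prediction, lexicon):
    """The paper's verification rule, in spirit: accept a pseudo-label
    only when the recognizer's output is an exact match for a word in
    the lexicon. (Exact-match acceptance is our simplifying assumption.)"""
    return prediction in lexicon

def self_train(labeled, unlabeled, lexicon, train, predict, max_iters=10):
    """Iteratively grow the labeled set with lexicon-verified pseudo-labels.

    labeled:   list of (image, transcription) pairs
    unlabeled: list of images without transcriptions
    train:     callable(labeled) -> model         (e.g. a BLSTM trainer)
    predict:   callable(model, image) -> string   (word hypothesis)
    """
    model = train(labeled)
    for _ in range(max_iters):
        verified, remaining = [], []
        for image in unlabeled:
            word = predict(model, image)
            if lexicon_verify(word, lexicon):
                verified.append((image, word))   # trusted pseudo-label
            else:
                remaining.append(image)          # keep for later rounds
        if not verified:                         # nothing new was accepted
            break
        labeled = labeled + verified
        unlabeled = remaining
        model = train(labeled)                   # retrain on the grown set
    return model, labeled
```

Because only verified samples are promoted, a misrecognized word that happens to fall outside the lexicon is simply deferred to a later round rather than polluting the training set, which is what makes the loop robust without requiring the lexicon to cover every word in the dataset.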
No file deposited

Dates and versions

hal-02075755 , version 1 (21-03-2019)

Identifiers

Cite

Bruno Stuner, Clément Chatelain, Thierry Paquet. Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov 2017, Kyoto, Japan. pp.633-638, ⟨10.1109/ICDAR.2017.109⟩. ⟨hal-02075755⟩