Conference paper · Year: 2017

Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition

Bruno Stuner
Clément Chatelain
Thierry Paquet

Abstract

Deep learning approaches now provide state-of-the-art performance in many computer vision tasks such as handwriting recognition. However, the huge number of parameters of these models requires big annotated training datasets, which are difficult to obtain. Training neural networks with unlabeled data is one of the key problems to achieve significant progress in deep learning. In this article, we explore a new semi-supervised training strategy to train long short-term memory (LSTM) recurrent neural networks for isolated handwritten word recognition. Our self-training strategy relies on iteratively training a Bidirectional LSTM recurrent neural network (BLSTM) using both labeled and unlabeled data. At each iteration, the currently trained network labels the unlabeled data and submits them to a very efficient "lexicon verification" rule. Verified unlabeled data are added to the labeled dataset at the end of each iteration. This verification stage has very low sensitivity to the lexicon size, and full word coverage of the dataset is not necessary for the semi-supervised method to be efficient. The strategy enables self-training with a single BLSTM and shows promising results on the Rimes dataset.
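The self-training loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `train`, `predict`, and `lexicon_verify` names are our own assumptions, and the recognizer is abstracted behind callables so the structure of the iteration (pseudo-label, verify against the lexicon, grow the labeled set) stands out.

```python
def lexicon_verify(prediction, lexicon):
    """The paper's verification rule, in spirit: accept a pseudo-label
    only when the recognizer's output is an exact match for a word in
    the lexicon. (Exact-match acceptance is our simplifying assumption.)"""
    return prediction in lexicon

def self_train(labeled, unlabeled, lexicon, train, predict, max_iters=10):
    """Iteratively grow the labeled set with lexicon-verified pseudo-labels.

    labeled:   list of (image, transcription) pairs
    unlabeled: list of images without transcriptions
    train:     callable(labeled) -> model         (e.g. a BLSTM trainer)
    predict:   callable(model, image) -> string   (word hypothesis)
    """
    model = train(labeled)
    for _ in range(max_iters):
        verified, remaining = [], []
        for image in unlabeled:
            word = predict(model, image)
            if lexicon_verify(word, lexicon):
                verified.append((image, word))   # trusted pseudo-label
            else:
                remaining.append(image)          # keep for later rounds
        if not verified:                         # nothing new was accepted
            break
        labeled = labeled + verified
        unlabeled = remaining
        model = train(labeled)                   # retrain on the grown set
    return model, labeled
```

Because only verified samples are promoted, a misrecognized word that happens to fall outside the lexicon is simply deferred to a later round rather than polluting the training set, which is what makes the loop robust without requiring the lexicon to cover every word in the dataset.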
No file deposited

Dates and versions

hal-02075755 , version 1 (21-03-2019)

Identifiers

Cite

Bruno Stuner, Clément Chatelain, Thierry Paquet. Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Nov 2017, Kyoto, Japan. pp.633-638, ⟨10.1109/ICDAR.2017.109⟩. ⟨hal-02075755⟩