Language identification from handwritten documents - Normandie Université
Conference paper, Year: 2015

Language identification from handwritten documents

Abstract

This paper presents a novel approach to language identification in handwritten documents. The approach is based on script identification followed by character recognition. BLSTM-CTC based handwriting recognizers are used, and their OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks have been explored, and the effects of handwriting recognition and text length on language detection have been studied. It is observed that, with some empirical restrictions, it is possible to achieve more than 80% language detection accuracy, and that practical systems can be designed on the basis of the current research.
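The abstract does not say which statistical language identifier is used. As an illustration only, the last stage of such a pipeline (recognized text in, language label out) could be sketched with a character trigram model and add-one smoothing. The class name NgramLanguageIdentifier, the helper char_ngrams, the choice of trigrams, and the toy training sentences are all assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a space-padded string."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical character n-gram classifier with add-one smoothing.

    This stands in for the "statistical language identifier" of the
    abstract; the actual model used in the paper is not specified there.
    """

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # every n-gram seen in any language

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = sum(self.counts[language].values())
        self.vocab.update(grams)

    def score(self, language, text):
        """Log-likelihood of the text under one language's n-gram model."""
        counts = self.counts[language]
        # The extra +1 reserves probability mass for unseen n-grams.
        denom = self.totals[language] + len(self.vocab) + 1
        return sum(
            math.log((counts[gram] + 1) / denom)
            for gram in char_ngrams(text, self.n)
        )

    def identify(self, text):
        """Return the trained language with the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.score(lang, text))


# Toy training data (assumed for illustration, far too small for real use).
identifier = NgramLanguageIdentifier(n=3)
identifier.train("en", "the quick brown fox jumps over the lazy dog "
                       "and the cat sat on the mat with this and that")
identifier.train("fr", "le renard brun saute par dessus le chien paresseux "
                       "et le chat est sur le tapis avec ceci et cela")

# In a full pipeline this string would be the handwriting recognizer output.
print(identifier.identify("the dog and the cat"))
```

A short input yields only a few n-grams, so every recognition error removes a larger share of the evidence. This is one plausible reason why text length matters for detection accuracy in a setup of this kind.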
File not deposited

Dates and versions

hal-02087612, version 1 (02-04-2019)

Cite

Luc Mioulet, Utpal Garain, Clément Chatelain, Philippine Barlas, Thierry Paquet. Language identification from handwritten documents. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Tunis, Tunisia. pp.676-680, ⟨10.1109/ICDAR.2015.7333847⟩. ⟨hal-02087612⟩