Language identification from handwritten documents

Abstract

This paper presents a novel approach to language identification in handwritten documents. The approach is based on script identification followed by character recognition: BLSTM-CTC based handwriting recognizers are used, and the OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks are explored, and the effects of handwriting recognition and text length on language detection are studied. With some empirical restrictions, more than 80% language detection accuracy can be achieved, and practical systems can be designed on the basis of this research.
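
The last stage of the pipeline, in which recognized text is passed to a statistical language identifier, can be illustrated with a minimal sketch. The abstract does not say which identifier is used, so the character-trigram model with add-one smoothing below is an assumption, and the function names (`char_ngrams`, `train`, `identify`) and the toy training sentences are hypothetical. Plain strings stand in for the OCR output.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count table per language from {language: [texts]}."""
    return {
        lang: Counter(g for text in texts for g in char_ngrams(text, n))
        for lang, texts in samples.items()
    }


def identify(text, models, n=3):
    """Return the language whose smoothed n-gram model gives the highest log-likelihood."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    grams = char_ngrams(text, n)

    def score(counts):
        total = sum(counts.values())
        # Add-one smoothing keeps unseen n-grams from zeroing the score.
        return sum(math.log((counts[g] + 1) / (total + vocab_size)) for g in grams)

    return max(models, key=lambda lang: score(models[lang]))


# Toy training data (hypothetical; a real system would use large corpora).
SAMPLES = {
    "english": [
        "the quick brown fox jumps over the lazy dog",
        "this is a handwritten letter to the editor",
    ],
    "french": [
        "le renard brun saute par dessus le chien paresseux",
        "ceci est une lettre manuscrite pour le journal",
    ],
}
MODELS = train(SAMPLES)

if __name__ == "__main__":
    print(identify("the letter is on the table", MODELS))
    print(identify("le chien est dans le jardin", MODELS))
```

Because each n-gram contributes one term to the score, short inputs give the model little evidence to work with, which is consistent with the abstract's interest in how text length affects detection.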

Dates and versions

hal-02087612, version 1 (02-04-2019)

Cite

Luc Mioulet, Utpal Garain, Clément Chatelain, Philippine Barlas, Thierry Paquet. Language identification from handwritten documents. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Tunis, Tunisia. pp.676-680, ⟨10.1109/ICDAR.2015.7333847⟩. ⟨hal-02087612⟩