Conference papers

Language identification from handwritten documents

Abstract : This paper presents a novel approach to language identification in handwritten documents. The approach is based on script identification followed by character recognition: BLSTM-CTC based handwriting recognizers are used, and the OCR output is fed to a statistical language identifier that detects the language of the input handwritten document. Documents in two scripts (Latin and Bengali) and four languages (English, French, Bengali and Assamese) are considered for evaluation. Several alternative frameworks have been explored, and the effects of handwriting recognition and text length on language detection have been studied. It is observed that, with some empirical restrictions, it is possible to achieve more than 80% language detection accuracy, and that practical systems can be designed on the basis of the current research.
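The last stage of the pipeline described in the abstract (recognized text passed to a statistical language identifier, with candidates narrowed by the detected script) can be illustrated with a minimal sketch. The abstract does not say which statistical model the authors use; the character-trigram scorer with add-one smoothing below is an assumption chosen for illustration, and the names `TRAINING_TEXT`, `SCRIPT_LANGUAGES`, `train`, and `identify_language` are hypothetical. The tiny training strings are toy data, not the corpora used in the paper.

```python
import math
from collections import Counter

# Toy training text (hypothetical); a real system would use large corpora.
TRAINING_TEXT = {
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "sat on the mat with the other animals",
    "french": "le renard brun saute par dessus le chien paresseux et le "
              "chat est sur le tapis avec les autres animaux",
}

# Hypothetical mapping from an identified script to its candidate languages.
SCRIPT_LANGUAGES = {"latin": ["english", "french"]}


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpora, n=3):
    """Build one n-gram count table per language."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in corpora.items()}


def log_likelihood(text, counts, vocab_size, n=3):
    """Add-one smoothed log-likelihood of the text under one language model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in char_ngrams(text, n)
    )


def identify_language(text, script, models, n=3):
    """Pick the best-scoring language among those written in the given script."""
    candidates = SCRIPT_LANGUAGES[script]
    vocab = set()
    for lang in candidates:
        vocab.update(models[lang])
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(
        candidates,
        key=lambda lang: log_likelihood(text, models[lang], vocab_size, n),
    )


MODELS = train(TRAINING_TEXT)
print(identify_language("the dog and the cat", "latin", MODELS))
print(identify_language("le chien et le chat", "latin", MODELS))
```

In this sketch the `script` argument stands in for the output of the script identification step, and `text` stands in for the recognizer output. Shorter or noisier inputs give fewer reliable n-grams, which is one plausible reason text length and recognition quality would affect detection accuracy.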
Document type :
Conference papers
https://hal-normandie-univ.archives-ouvertes.fr/hal-02087612
Contributor : Thierry Paquet
Submitted on : Tuesday, April 2, 2019 - 11:31:08 AM
Last modification on : Monday, July 22, 2019 - 2:41:53 PM

Citation

Luc Mioulet, Utpal Garain, Clément Chatelain, Philippine Barlas, Thierry Paquet. Language identification from handwritten documents. 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Aug 2015, Tunis, Tunisia. pp.676-680, ⟨10.1109/ICDAR.2015.7333847⟩. ⟨hal-02087612⟩
