A unified multilingual handwriting recognition system using multigrams sub-lexical units - Normandie Université
Journal article, Pattern Recognition Letters, Year: 2019

A unified multilingual handwriting recognition system using multigrams sub-lexical units

Wassim Swaileh
Yann Soullard
Thierry Paquet

Abstract

We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rely on specialized models, each trained on a single language, one of which is selected at test time. While some recognition systems are based on a unified optical model, building a unified language model remains a major issue, as traditional language models are generally trained on corpora with a large word lexicon per language. Here, we propose a solution based on language models built from sub-lexical units called multigrams. Using multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes it possible to design an end-to-end unified multilingual recognition system in which a single optical model and a single language model are both trained on all the languages. We discuss the impact of language unification on each model and show that our system reaches the performance of state-of-the-art methods with a strong reduction in complexity.
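The lexicon-size argument can be illustrated with a toy sketch. The abstract does not say how the multigram inventory is learned or how words are segmented, so everything below is an assumption made for illustration: the word list `WORDS`, the hand-picked unit inventory `UNITS`, and the greedy longest-match rule in `segment` are all hypothetical and are not the authors' method.

```python
# Toy illustration: a shared inventory of sub-lexical units can cover a word
# lexicon with fewer distinct entries than the word lexicon itself.
# WORDS, UNITS and the greedy segmentation rule are illustrative assumptions.

WORDS = ["nation", "national", "station", "relation",
         "relational", "ration", "rational", "real"]

# Hypothetical hand-picked inventory of variable-length character sequences.
UNITS = {"na", "tion", "al", "sta", "re", "la", "ra"}


def segment(word, units):
    """Split `word` into units by greedy longest match, left to right.

    A single character is always accepted as a fallback, so every word
    can be segmented even if the inventory does not cover it.
    """
    max_len = max(len(u) for u in units)
    pieces, i = [], 0
    while i < len(word):
        # Try the longest candidate first, down to a single character.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in units:
                pieces.append(piece)
                i += n
                break
    return pieces


segmented = {w: segment(w, UNITS) for w in WORDS}

# The language model vocabulary is the set of units actually used.
unit_lexicon = {piece for pieces in segmented.values() for piece in pieces}

for word, pieces in segmented.items():
    print(word, "->", " + ".join(pieces))
print("word lexicon size:", len(set(WORDS)))
print("unit lexicon size:", len(unit_lexicon))
```

On a lexicon this small the saving is marginal. The point of the sketch is only that the units are reused across words, so the unit lexicon can grow more slowly than the word lexicon as words (and languages) are added.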
Main file
S0167865518303271.pdf (682.69 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02075654, version 1 (22-10-2021)

License

Attribution - NonCommercial

Cite

Wassim Swaileh, Yann Soullard, Thierry Paquet. A unified multilingual handwriting recognition system using multigrams sub-lexical units. Pattern Recognition Letters, 2019, 121, pp.68-76. ⟨10.1016/j.patrec.2018.07.027⟩. ⟨hal-02075654⟩