Writing Type and Language Identification in Heterogeneous and Complex Documents
Abstract
This paper presents a system for automatically recognizing both the writing type and the language of text regions in heterogeneous and complex documents. The system can process documents that mix printed and handwritten text written in several languages (French, English and Arabic). We divide the problem into two sub-tasks: writing type identification and language identification. The writing type identification method is based on the analysis of connected components, while the language identification approach combines the analysis of connected components with the analysis of character distributions. We present the results obtained by the system during the second competition round of the MAURDOR campaign and show that its performance compares favorably with that of the other participants' systems.
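To make the idea of analysing character distributions concrete, the following is a minimal sketch and not the system described in this paper. It assumes the text of a region is already available as a string, builds a relative-frequency profile of its alphabetic characters, and selects the language whose reference profile has the highest cosine similarity. The sample sentences, the function names (`char_profile`, `cosine`, `identify_language`) and the choice of cosine similarity are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt

# Tiny hypothetical reference samples; a realistic profile would be
# estimated from a large corpus for each language.
SAMPLES = {
    "french": "le système reconnaît la langue et le type d'écriture des zones de texte",
    "english": "the system recognizes the language and the writing type of text regions",
    "arabic": "يتعرف النظام على لغة ونوع الكتابة في مناطق النص",
}


def char_profile(text):
    """Return the relative frequency of each alphabetic character in text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(c, 0.0) for c, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# One reference profile per language (assumption: one sample each).
PROFILES = {lang: char_profile(sample) for lang, sample in SAMPLES.items()}


def identify_language(text):
    """Pick the language whose reference profile is closest to the text."""
    profile = char_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

In this sketch Arabic separates from French and English trivially, because the scripts share no letters. Distinguishing French from English relies on differences in letter frequencies and would need much larger reference samples, or character n-grams, to be reliable on short regions.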