Conference papers

Daniel@FinTOC-2021: Taking Advantage of Images and Vectorial Shapes in Native PDF Document Analysis

Emmanuel Giguet 1, Gaël Lejeune 2, 3
1 Equipe SAFE - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
3 STIH-LC - Équipe Linguistique computationnelle
STIH - Sens, Texte, Informatique, Histoire
Abstract : In this paper, we present our contribution to the FinTOC-2021 Shared Task "Financial Document Structure Extraction". We participated in the tracks dedicated to English and French document processing. Our results for title detection and TOC generation demonstrate good precision. We address the problem in a fairly unusual but ambitious way, which consists in simultaneously considering the text content, vectorial shapes, and images embedded in the native PDF document, and in structuring the document in its entirety.

https://hal.archives-ouvertes.fr/hal-03744586
Contributor : Giguet Emmanuel
Submitted on : Wednesday, August 3, 2022 - 9:56:00 AM
Last modification on : Saturday, August 6, 2022 - 3:50:15 AM

File

Fintoc-2021.fnp-1.13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03744586, version 1

Citation

Emmanuel Giguet, Gaël Lejeune. Daniel@FinTOC-2021: Taking Advantage of Images and Vectorial Shapes in Native PDF Document Analysis. 3rd Financial Narrative Processing Workshop, Sep 2021, Lancaster, United Kingdom. pp.70-74. ⟨hal-03744586⟩
