Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023)

dc.contributor.authorKang, J.
dc.contributor.authorPatel, M.M.
dc.contributor.authorAgrawal, A.
dc.contributor.authorSimhadri, S.
dc.contributor.authorSrinivasa, R.
dc.contributor.authorBellato, S.
dc.contributor.authorAnand Kumar, M.
dc.contributor.authorTsang, N.D.
dc.contributor.authorEl-Haj, M.
dc.date.accessioned2026-02-06T06:34:36Z
dc.date.issued2023
dc.description.abstractIn this paper, we present a detailed overview of FinTOC, the Financial Table Of Content extraction shared task series, held over five years from 2019 to 2023. The paper serves as a retrospective analysis of key developments in financial document structure extraction. The FinTOC series, hosted within the Financial Narrative Processing (FNP) workshop, has been instrumental in shaping Natural Language Processing (NLP) research in the financial domain. Our analysis examines the diverse methodologies proposed by participants across all editions, shedding light on the strategies they used to tackle the challenge of extracting structured information from financial documents. We trace the evolution of techniques from traditional rule-based approaches to deep learning models, illustrating the rapid pace of NLP advancement. We also investigate the multilingual datasets introduced by the organizers, highlighting the importance of cross-lingual analysis in financial document processing. Finally, we examine how participants augmented the training data with external sources, reflecting the collaborative spirit of the NLP community in enhancing the quality and size of the shared training dataset. © 2023 IEEE.
dc.identifier.citationProceedings - 2023 IEEE International Conference on Big Data, BigData 2023, 2023, pp. 2839-2844
dc.identifier.urihttps://doi.org/10.1109/BigData59044.2023.10386125
dc.identifier.urihttps://idr.nitk.ac.in/handle/123456789/29330
dc.publisherInstitute of Electrical and Electronics Engineers Inc.
dc.subjectdocument layout analysis
dc.subjectfinancial data processing
dc.subjectmachine learning
dc.subjectPDF document processing
dc.titleAdvancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023)