Faculty Publications


Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023)
(Institute of Electrical and Electronics Engineers Inc., 2023) Kang, J.; Patel, M.M.; Agrawal, A.; Simhadri, S.; Srinivasa, R.; Bellato, S.; Anand Kumar, M.; Tsang, N.D.; El-Haj, M.
In this comprehensive paper, we present a detailed overview of the Financial Table Of Content extraction shared task series, FinTOC, conducted over a span of five years from 2019 to 2023. This paper serves as a retrospective analysis of the key developments in the field of financial document structure extraction. The FinTOC series, hosted within the framework of the Financial Narrative Processing (FNP) workshop, has been instrumental in shaping the landscape of Natural Language Processing (NLP) in the financial domain. Our analysis delves into the diverse methodologies proposed by participants across all editions, shedding light on the innovative strategies employed to tackle the intricate challenge of extracting structured information from financial documents. We explore the evolution of techniques, from traditional rule-based approaches to cutting-edge deep learning models, showcasing the dynamic nature of NLP advancements. Furthermore, our study investigates the introduction of multilingual datasets by the organizers, highlighting the importance of cross-lingual analysis in financial document processing. We also examine the contributions made by participants in augmenting the training data with external sources, showcasing the collaborative spirit of the NLP community in enhancing the quality and size of the shared training dataset. © 2023 IEEE.
HALE Lab NITK at Touché 2024: A Hybrid Approach for Identifying Political Ideology and Power in Multilingual Parliamentary Speeches
(CEUR-WS, 2024) Simhadri, S.; Patel, M.M.; Sowmya Kamath, S.
This article presents an approach for determining the political views and stances of speakers in parliamentary discussions, including whether they support or oppose the government. The work was carried out as part of Touché 2024 Task 2, “Ideology and Power Identification in Parliamentary Debates”. Two systems were developed: the first employs traditional machine learning methods with TF-IDF embeddings, while the second utilizes advanced NLP techniques with the LASER encoder for multilingual embeddings. Both systems incorporate standard preprocessing techniques and integrate a variety of models, after which a voting classifier combines the predictions from both approaches. Experiments revealed that this comprehensive framework effectively addresses the complexities and nuances of political discourse, providing valuable insights into speakers' ideologies and governing statuses within parliamentary debates. © 2024 Copyright for this paper by its authors.
AI-Powered Cryptanalysis: Identifying Encryption Algorithms and Recovering Plaintext
(Institute of Electrical and Electronics Engineers Inc., 2025) Simhadri, S.; Raghavendra; Purushothama, B.R.
With encryption becoming more prevalent for securing digital correspondence, analyzing ciphertext without the decryption key has become one of the biggest problems in cybersecurity and cryptanalysis. The task involves two fundamental problems: classifying ciphertext by the encryption scheme used, and reconstructing plaintext from encrypted sequences using deep learning. Classical approaches to cryptanalysis often rely on brute force or on a mathematical weakness in the algorithm itself, but with the advent of neural networks, cryptanalysts are able to discover structural patterns in the encrypted data. This paper deploys bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent unit (BiGRU) neural networks to classify ciphertext produced by the Advanced Encryption Standard (AES), Triple Data Encryption Standard (3DES), Blowfish, and Twofish encryption schemes into their respective categories. The BiLSTM model classified the ciphertext with an accuracy of 87.91%, a 1.07% relative improvement over the BiGRU model, which classified the dataset with 86.98% accuracy. The second part of the research used a sequence-to-sequence long short-term memory (LSTM) model to reconstruct the original text from ciphertexts encrypted under the Data Encryption Standard (DES) and Twofish; the plaintext was drawn from the Internet Movie Database (IMDB) dataset. Reconstruction of DES-encrypted ciphertext was accurate, achieving an F1-score of 0.868, which supports the view that certain encryption schemes may retain exploitable patterns on which deep learning models can be trained. In contrast, reconstruction of Twofish-encrypted ciphertext reached an F1-score of only 0.750, a relative decrease of 13.6%, owing to heavier diffusion, which provides additional resistance. These findings demonstrate the efficacy of neural models in detecting and exploiting structural weaknesses in legacy encryption systems and call for encryption algorithms that reduce the features recoverable by deep learning attacks. The study provides a first step for future work on artificial-intelligence-driven tools that assist in forensic cryptography, automated vulnerability assessment, and secure system design. © 2025 IEEE.
