LeDoFAN: enhancing lengthy document fake news identification leveraging large language models and explainable window-based transformers with n-gram expulsion
Date
2025
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Springer Science and Business Media Deutschland GmbH
Abstract
Social media has become many people's primary source of information, and reliance on news disseminated through social media and news channels continues to grow. The alarming concern is that as the volume of information increases, so does the spread of fake news and misinformation through social media. Fake news items are typically only a few lines long; documents and articles, however, contain far more text, and a model must be trained appropriately to handle that length. In this work, we develop a transformer-based model that identifies and classifies fake news in articles collected from social media websites and news pages. We introduce a novel window method for handling lengthy documents and an N-gram expulsion method for managing similar words when classifying an article as fake or real news. We achieved a state-of-the-art F1-score of 0.3492 on test data with the window-based N-gram expulsion method, and an F1-score improvement of 2.1% on long documents alone. We also explored large language models (LLMs): TinyLlama achieved an F1-score of only 0.2098, while LLaMA-based summarization of the document combined with N-gram expulsion achieved an F1-score of 0.3402. We further examined the results using Explainable Artificial Intelligence (XAI) to understand the reasoning behind the proposed model's predictions. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
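The abstract does not give implementation details, but its two core ideas — sliding overlapping windows over a long document so it fits a transformer's input limit, and "expelling" repeated n-grams so similar spans are not counted twice — can be sketched roughly as follows. The function names, window size, stride, and the exact expulsion rule (drop every repeat of an already-seen n-gram) are assumptions for illustration, not the authors' code:

```python
def expel_ngrams(tokens, n=3):
    """Drop repeated occurrences of any n-gram, keeping the first.
    A hypothetical reading of the paper's 'N-gram expulsion' step."""
    seen = set()
    out = []
    i = 0
    while i < len(tokens):
        gram = tuple(tokens[i:i + n])
        if len(gram) == n and gram in seen:
            i += n  # skip the duplicate n-gram entirely
        else:
            if len(gram) == n:
                seen.add(gram)
            out.append(tokens[i])
            i += 1
    return out


def window_chunks(tokens, window=512, stride=256):
    """Split a long token sequence into overlapping fixed-size windows,
    so each chunk fits a transformer's maximum input length."""
    for start in range(0, max(len(tokens) - window + 1, 1), stride):
        yield tokens[start:start + window]
```

In a pipeline like the one described, each window would be encoded by the transformer and the per-window predictions aggregated into a document-level fake/real label; expulsion can be applied before windowing to shorten the document.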
Description
Keywords
BERT, Explainable AI, F1 scores, Fake news, Language model, Misinformation, N-grams, Social media, Transformer, Window-based
Citation
International Journal of Machine Learning and Cybernetics, vol. 16, no. 9, 2025, pp. 6577–6596
