DBNLP: detecting bias in natural language processing system for India-centric languages

dc.contributor.author: Keerthan Kumar, K.K.
dc.contributor.author: Mendke, S.
dc.contributor.author: Parihar, R.
dc.contributor.author: Mayya, S.
dc.contributor.author: Venkatesh, S.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-03T13:19:47Z
dc.date.issued: 2025
dc.description.abstract: Natural language processing (NLP) is attracting widespread interest and advancing rapidly owing to its compelling applications. NLP models are deployed in real-world settings such as search engines, language translation, sentiment analysis, chatbots such as ChatGPT, and auto-completion. These models are trained on vast corpora of online data, which expose them to harmful biases and stereotypes about various communities. The models learn these biases and make harmful, undesirable predictions about particular genders, religions, races, and professions. Biases in NLP systems can perpetuate societal biases and discrimination, leading to unfair and unequal treatment of individuals or groups. Identifying these biases is a crucial first step toward mitigating them. Most prior work in this area has been Western-centric and focused on the English language, which makes it difficult to apply to Indian models and languages. In this work, we propose Detecting Bias in Natural Language Processing System for India-Centric Languages (DBNLP), which identifies biases relevant to the Indian context in text-based language models, particularly for English and Hindi. DBNLP presents three bias-identification techniques: (1) a Context Association Test (CAT), (2) a template-based perturbation technique for various co-domain associations, and (3) a co-occurrence count-based corpus analysis technique (illustrative sketches of techniques (2) and (3) follow this record). Further, this work shows how India-centric models such as IndicBERT and MuRIL, and datasets such as IndicCorp, are biased toward various demographic categories. Detecting bias in natural language processing systems for India-centric languages is essential to creating fair, diverse, and inclusive models that benefit society. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2025.
dc.identifier.citation: International Journal of Information Technology (Singapore), 2025, 17(6), pp. 3291-3306
dc.identifier.issn: 2511-2104
dc.identifier.uri: https://doi.org/10.1007/s41870-025-02437-9
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/20217
dc.publisher: Springer Science and Business Media B.V.
dc.subject: Diverse and Inclusive models
dc.subject: Harmful biases and stereotypes
dc.subject: India-centric models
dc.subject: Indian context
dc.subject: Natural language processing
dc.title: DBNLP: detecting bias in natural language processing system for India-centric languages
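
The abstract enumerates three bias-identification techniques. To make the second concrete, the following minimal Python sketch probes a template-based perturbation: only the demographic term in a fixed template is swapped, and the fill-mask predictions of MuRIL (one of the India-centric models named in the abstract) are compared across groups. The template, the group list, and the use of the Hugging Face fill-mask pipeline are illustrative assumptions, not the authors' DBNLP implementation.

# Hypothetical template-based perturbation probe (technique 2).
# Not the DBNLP code: the template, groups, and scoring are assumptions.
from transformers import pipeline

# MuRIL is a BERT-style India-centric masked language model.
fill = pipeline("fill-mask", model="google/muril-base-cased")

template = "The {group} person works as a [MASK]."
groups = ["Hindu", "Muslim", "Sikh", "Christian"]  # illustrative categories

for group in groups:
    # Perturb only the demographic slot and inspect the top completions;
    # large divergences across groups indicate a learned association.
    preds = fill(template.format(group=group), top_k=3)
    print(group, [(p["token_str"], round(p["score"], 4)) for p in preds])

For the third technique, a minimal co-occurrence sketch over a pre-tokenized corpus follows; the target and attribute word lists are hypothetical placeholders rather than the paper's IndicCorp setup.

# Hypothetical co-occurrence count-based corpus analysis (technique 3).
from collections import Counter

def cooccurrence_counts(tokens, targets, attributes, window=5):
    # Count (target, attribute) pairs that appear within `window` tokens
    # of each other; skewed counts across targets hint at corpus bias.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in attributes:
                counts[(tok, tokens[j])] += 1
    return counts

# Toy usage with made-up word lists (not the paper's lexicons).
corpus = "the nurse said she was tired while the doctor said he was busy".split()
print(cooccurrence_counts(corpus, targets={"she", "he"},
                          attributes={"nurse", "doctor"}, window=3))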
