Human-in-the-Loop Data Analytics for Classifying Fatal Mining Accident Causes Using Natural Language Processing and Machine Learning Techniques

Sharma, A.; Kumar, A.; Vardhan, H.; Mangalpady, A.; Mandal, B.B.; Senapati, A.; Akhil, A.; Saini, S.

dc.contributor.author	Sharma, A.
dc.contributor.author	Kumar, A.
dc.contributor.author	Vardhan, H.
dc.contributor.author	Mangalpady, A.
dc.contributor.author	Mandal, B.B.
dc.contributor.author	Senapati, A.
dc.contributor.author	Akhil, A.
dc.contributor.author	Saini, S.
dc.date.accessioned	2026-02-03T13:19:04Z
dc.date.issued	2025
dc.description.abstract	Mining remains one of the most hazardous industries globally, marked by frequent fatalities resulting from complex operational risks. While accident investigation reports hold valuable insights for improving safety practices, the manual coding of fatality narratives remains labor-intensive, inconsistent, and impractical for large datasets. Although natural language processing (NLP) and machine learning (ML) techniques have gained traction for automating the analysis of safety narratives in other high-risk industries, their application to mining accident data, particularly within the Indian context, remains limited. Addressing this gap, the present study proposes an ML framework for the semi-automated classification of fatal accident causes from unstructured text narratives reported by the Directorate General of Mines Safety (DGMS) between 2016 and 2022. A total of 401 fatal accident descriptions were pre-processed and vectorized using Bag-of-Words, TF-IDF, and Word2Vec techniques, followed by model evaluation across multiple algorithms. A semi-automated classification scheme was developed to balance efficiency with expert oversight, where high-confidence predictions were assigned automatically and uncertain cases were flagged for manual review. Logistic regression combined with TF-IDF unigram features achieved the highest performance, with an F1 score of 0.78 and an accuracy of 0.81. Overall, the developed framework successfully auto-coded 68.75% of cases with 94% accuracy, 0.93 recall, and 0.91 precision. Word cloud visualizations were also employed to capture dominant words associated with different cause categories. The proposed framework offers a practical and operationally feasible solution for assigning fatality causes in the mining sector, contributing to active safety management, surveillance, and policy formulation. © Society for Mining, Metallurgy & Exploration Inc. 2025.
dc.identifier.citation	Mining, Metallurgy and Exploration, 2025, 42, 6, pp. 4155-4167
dc.identifier.issn	25243462
dc.identifier.uri	https://doi.org/10.1007/s42461-025-01351-9
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/19936
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.subject	Accident prevention
dc.subject	Accidents
dc.subject	Classification (of information)
dc.subject	Codes (symbols)
dc.subject	Large datasets
dc.subject	Learning algorithms
dc.subject	Learning systems
dc.subject	Logistic regression
dc.subject	Machine learning
dc.subject	Mine safety
dc.subject	Mining
dc.subject	Natural language processing systems
dc.subject	Occupational diseases
dc.subject	Occupational risks
dc.subject	Text processing
dc.subject	Accident narrative classification
dc.subject	Automated coding
dc.subject	HILDA
dc.subject	Language processing
dc.subject	Machine-learning
dc.subject	Mining accident
dc.subject	Natural language processing
dc.subject	Natural languages
dc.subject	Occupational health and safety
dc.subject	Semi-automated coding
dc.subject	Automation
dc.subject	Risk assessment
dc.title	Human-in-the-Loop Data Analytics for Classifying Fatal Mining Accident Causes Using Natural Language Processing and Machine Learning Techniques
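
Illustrative note: the abstract describes a semi-automated scheme in which high-confidence predictions are assigned automatically and uncertain cases are flagged for manual review. The following is a minimal sketch of that routing step only, not the authors' implementation. It assumes a classifier has already produced per-class probabilities for each narrative. The function name route_predictions, the 0.7 threshold, the cause category names, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def route_predictions(
    class_probs: List[Dict[str, float]],
    threshold: float = 0.7,  # assumed cut-off; the abstract does not state one
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split cases into auto-coded (index, label) pairs and indices flagged for review."""
    auto_coded: List[Tuple[int, str]] = []
    flagged: List[int] = []
    for i, probs in enumerate(class_probs):
        # Take the most probable class and its probability as the confidence.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            auto_coded.append((i, label))
        else:
            flagged.append(i)
    return auto_coded, flagged


# Hypothetical classifier outputs for four narratives (made-up categories and values).
example = [
    {"fall of roof": 0.92, "haulage": 0.05, "machinery": 0.03},
    {"fall of roof": 0.40, "haulage": 0.35, "machinery": 0.25},
    {"fall of roof": 0.10, "haulage": 0.80, "machinery": 0.10},
    {"fall of roof": 0.30, "haulage": 0.15, "machinery": 0.55},
]

auto_coded, flagged = route_predictions(example, threshold=0.7)
coverage = len(auto_coded) / len(example)  # share of cases coded without review
print(auto_coded, flagged, coverage)
```

Raising the threshold would typically send more cases to manual review in exchange for higher accuracy on the auto-coded share; the trade-off reported in the abstract (68.75% of cases auto-coded) reflects one such operating point chosen by the authors.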

Collections

Journal Articles