Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data

dc.contributor.author: Prabhakar, A.
dc.contributor.author: Shidharth, S.
dc.contributor.author: S. Krishnan, G.S.
dc.contributor.author: Kamath S., S.
dc.date.accessioned: 2026-02-06T06:36:10Z
dc.date.issued: 2021
dc.description.abstract: Diagnostic coding is a process by which written, verbal, and other patient-case-related documentation is used to enable disease prediction, accurate documentation, and insurance settlements. It is a predominantly manual process even in countries that have successfully adopted Electronic Health Record (EHR) systems. The problem is exacerbated in developing countries, where adoption of EHR systems is still not on par with Western counterparts. EHRs contain a wealth of patient information embedded in numerical, text, and image formats. A disease prediction model that exploits all this information, enabling accurate and faster diagnosis, would be quite beneficial. We address this challenging task by proposing mixed ensemble models consisting of boosting and deep learning architectures for diagnostic code group prediction. The models are trained on a dataset created by integrating features from structured (lab test reports) as well as unstructured (clinical text) data. We analyze the proposed model’s performance on MIMIC-III, an open dataset of clinical data, using standard multi-label metrics. Empirical evaluations underscored the significant performance of our approach for this task compared to state-of-the-art works that rely on a single data source. Our novelty lies in effectively integrating relevant information from both data sources, thereby ensuring larger ICD-9 code coverage, handling the inherent class imbalance, and adopting a novel approach to form the ensemble models. © 2021, Springer Nature Switzerland AG.
dc.identifier.citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, Vol. 13147 LNCS, p. 197-210
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://doi.org/10.1007/978-3-030-93620-4_15
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/30305
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.subject: Clinical decision support systems
dc.subject: Disease prediction
dc.subject: Healthcare informatics
dc.title: Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data
