Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Feature analysis for mispronounced phonemes in the case of alvoelar approximant (/r/) substituted with voiced dental consonant (/∂/)
(Institute of Electrical and Electronics Engineers Inc., 2015) Ramteke, P.B.; Koolagudi, S.G.; Prabhakar, A.
Mispronunciation is commonly observed in children aged 2 to 8 years. Some of the common mispronunciations are stopping, fronting, backing and affrication. These are known as phonological processes. Identifying these processes is crucial for studying the vocal tract development pattern and treating phonological disorders in children. To identify the phonological processes, the features that clearly discriminate a correctly pronounced phoneme from the corresponding mispronounced phoneme have to be compared. This paper focuses on the analysis of the mispronounced alveolar approximant (/r/) substituted with the voiced fricative consonant (/∂/). In this work, spectral and pitch-related features are analyzed using scatter plots and histograms. The analysis shows that the energy feature plotted against the 2nd and 4th cepstral coefficients achieves 75% and 65% discrimination, respectively. © 2015 IEEE.

On-the-Fly Encryption Security in Remote Storage
(Institute of Electrical and Electronics Engineers Inc., 2016) Prabhakar, A.; Savin, P.S.; Chandrasekaran, K.
The development of distributed storage, cell phones, and removable hard drives has expanded the versatility of information usage. On the other hand, a few issues emerge: how to figure out whether information is so delicate that it should not leave a client's gadget, and how to secure it from unauthorized access.
Information Leakage Prevention applications perform this task, commonly by diverting possibly vulnerable documents to a protected distant repository, examining them, and then making a final copy to remote storage if the output meets the constraints defined in the enforced policy. The main issue with local repository isolation is the extra work of writing the document twice in series, once to the local repository and then to the distant repository. This paper introduces an alternative technique for Information Leakage Prevention that uses a transient cryptographic key. With a transient key, encoded information can be securely checked on the distant repository and safely deleted if it fails the policy during scanning. This direct procedure is more efficient and incurs less time delay than local repository isolation. © 2015 IEEE.

Diagnostic Code Group Prediction by Integrating Structured and Unstructured Clinical Data
(Springer Science and Business Media Deutschland GmbH, 2021) Prabhakar, A.; Shidharth, S.; S. Krishnan, G.S.; Kamath S․, S.
Diagnostic coding is a process by which written, verbal and other patient-case related documentation are used for enabling disease prediction, accurate documentation, and insurance settlements. It is a prevalently manual process even in countries that have successfully adopted Electronic Health Record (EHR) systems. The problem is exacerbated in developing countries where widespread adoption of EHR systems is still not at par with Western counterparts. EHRs contain a wealth of patient information embedded in numerical, text, and image formats. A disease prediction model that exploits all this information, enabling accurate and faster diagnosis, would be quite beneficial.
We address this challenging task by proposing mixed ensemble models consisting of boosting and deep learning architectures for the task of diagnostic code group prediction. The models are trained on a dataset created by integrating features from structured (lab test reports) as well as unstructured (clinical text) data. We analyze the proposed model’s performance on MIMIC-III, an open dataset of clinical data using standard multi-label metrics. Empirical evaluations underscored the significant performance of our approach for this task, compared to state-of-the-art works which rely on a single data source. Our novelty lies in effectively integrating relevant information from both data sources thereby ensuring larger ICD-9 code coverage, handling the inherent class imbalance, and adopting a novel approach to form the ensemble models. © 2021, Springer Nature Switzerland AG.

Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping
(Institute of Electrical and Electronics Engineers Inc., 2022) Prabhakar, A.; Shidharth, S.; Kamath S․, S.
The availability of huge volume and variety of healthcare data provides a wide scope for designing cutting-edge clinical decision support systems (CDSS) that can improve the quality of patient care. Identifying patients suffering from certain conditions/symptoms, commonly referred to as phenotyping, is a fundamental problem that can be addressed using the rich health-related data collected for generation of Electronic Health Records (EHRs). Phenotyping forms the foundation for translational research, effectiveness studies, and is used for analyzing population health using regularly collected EHR data. Also, determining if a patient has a particular medical condition is crucial for secondary analysis, such as in critical care situations to predict potential drug interactions and adverse events.
In this paper, we consider all categories of unstructured clinical notes of patients, typically stored as part of EHRs in the raw form. The standard MIMIC-III dataset is considered for benchmark experiments for patient phenotyping. Experiments revealed that our proposed models outperformed state-of-the-art works built on vanilla BERT and ClinicalBERT models on the patient cohort considered, measured in terms of standard multi-label classification metrics like AUROC score (improvement by 6%), F1-score (by 4%), and Hamming Loss (by 17%) when we considered only patient discharge summaries and radiology notes. Further experiments with other note categories showed that using discharge summaries and physician notes yields significant improvements on the entire dataset, giving an AUROC score of 0.8, an F1 score of 0.72, and a Hamming loss of 0.09. © 2022 IEEE.

Commonsense and Named Entity Aware Knowledge Grounded Dialogue Generation
(Association for Computational Linguistics (ACL), 2022) Varshney, D.; Prabhakar, A.; Ekbal, A.
Grounding dialogue on external knowledge and interpreting linguistic patterns in dialogue history context, such as ellipsis, anaphora, and co-references, is critical for dialogue comprehension and generation. In this paper, we present a novel open-domain dialogue generation model which effectively utilizes the large-scale commonsense and named entity based knowledge in addition to the unstructured topic-specific knowledge associated with each utterance. We enhance the commonsense knowledge with named entity-aware structures using co-references. Our proposed model utilizes a multi-hop attention layer to preserve the most accurate and critical parts of the dialogue history and the associated knowledge.
In addition, we employ a Commonsense and Named Entity Enhanced Attention Module, which starts with the extracted triples from various sources and gradually finds the relevant supporting set of triples using multi-hop attention with the query vector obtained from the interactive dialogue-knowledge module. Empirical results on two benchmark datasets demonstrate that our model significantly outperforms the state-of-the-art methods in terms of both automatic evaluation metrics and human judgment. Our code is publicly available at https://github.com/deekshaVarshney/CNTF and https://www.iitp.ac.in/-ai-nlp-ml/resources/codes/CNTF.zip. © 2022 Association for Computational Linguistics.

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages (Student Abstract)
(Association for the Advancement of Artificial Intelligence, 2022) Prabhakar, A.; Majumder, G.S.; Anand, A.
Developing Named Entity Recognition (NER) systems for Indian languages has been a long-standing challenge, mainly owing to the requirement of a large amount of annotated clean training instances. This paper proposes an end-to-end framework for NER for Indian languages in a low-resource setting by exploiting parallel corpora of English and Indian languages and an English NER dataset. The proposed framework includes an annotation projection method that combines word alignment score and NER tag prediction confidence score on source language (English) data to generate weakly labeled data in a target Indian language. We employ a variant of the Teacher-Student model and optimize it jointly on the pseudo labels of the Teacher model and predictions on the generated weakly labeled data. We also present manually annotated test sets for three Indian languages: Hindi, Bengali, and Gujarati. We evaluate the performance of the proposed framework on the test sets of the three Indian languages. Empirical results show a minimum 10% performance improvement compared to the zero-shot transfer learning model on all languages.
This indicates that weakly labeled data generated using the proposed annotation projection method in target Indian languages can complement well-annotated source language data to enhance performance. Our code is publicly available at https://github.com/aksh555/CL-NERIL. © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
