Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
15 results
Search Results
Item A personalized recommender system using Machine Learning based Sentiment Analysis over social data (Institute of Electrical and Electronics Engineers Inc., 2016) Ashok, M.; Rajanna, S.; Joshi, P.V.; Kamath S., S.S.
Social media platforms are already an indispensable part of our daily lives. Their constant growth has produced superfluous, heterogeneous data that can be overwhelming in volume and velocity, limiting the availability of relevant information when a particular query is to be served. Hence, the need for a personalized, fine-grained, user-preference-oriented framework to resolve this problem and enhance user experience is increasingly felt. In this paper, we propose such a social framework, which extracts users' reviews and comments about restaurants and points of interest such as events and locations, to personalize and rank suggestions based on user preferences. Machine Learning and Sentiment Analysis based techniques are used to further optimize search query results. This provides the user with quicker and more relevant data, avoiding irrelevant results and providing much-needed personalization. © 2016 IEEE.

Item TAGS: Towards Automated Classification of Unstructured Clinical Nursing Notes (Springer Verlag, 2019) Gangavarapu, T.; Jayasimha, A.; S. Krishnan, G.S.; Kamath S., S.K.
Accurate risk management and disease prediction are vital in intensive care units to channel prompt care to patients in critical condition and to aid medical personnel in effective decision making. Clinical nursing notes document subjective assessments and crucial information about a patient's state, most of which is lost when transcribed into Electronic Medical Records (EMRs). The Clinical Decision Support Systems (CDSSs) in the existing body of literature are heavily dependent on the structured nature of EMRs. Moreover, works that aim at benchmarking deep learning models are limited.
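The recommender-system abstract above pairs sentiment analysis with ranking. A minimal sketch of that idea follows, with an invented sentiment lexicon and invented reviews standing in for the paper's trained ML classifiers and social data:

```python
# Toy illustration: score each review against a small sentiment lexicon,
# then rank places by the mean sentiment of their reviews.
# LEXICON and the sample reviews are made up for this example.
LEXICON = {"great": 2, "good": 1, "tasty": 1, "slow": -1, "bad": -2, "dirty": -2}

def sentiment(review: str) -> int:
    """Sum lexicon scores of the words in a review (0 for unknown words)."""
    return sum(LEXICON.get(w, 0) for w in review.lower().split())

def rank_places(reviews_by_place: dict) -> list:
    """Order places by mean review sentiment, best first."""
    mean = {p: sum(map(sentiment, rs)) / len(rs) for p, rs in reviews_by_place.items()}
    return sorted(mean, key=mean.get, reverse=True)

places = {
    "Cafe A": ["great tasty food", "good service"],
    "Cafe B": ["slow and bad service", "dirty tables"],
}
print(rank_places(places))  # "Cafe A" ranks first
```

A real system would replace the fixed lexicon with a learned sentiment model, but the rank-by-aggregated-sentiment step stays the same.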
In this paper, we aim at leveraging the underutilized treasure trove of patient-specific information present in unstructured clinical nursing notes towards the development of CDSSs. We present a fuzzy token-based similarity approach to aggregate the voluminous clinical documentation of a patient. To structure the free text in the unstructured notes, vector space and coherence-based topic modeling approaches that capture syntactic and latent semantic information are presented. Furthermore, we utilize the predictive capabilities of deep neural architectures for disease prediction as ICD-9 code group prediction. Experimental validation revealed that the proposed Term weighting of nursing notes AGgregated using Similarity (TAGS) model outperformed the state-of-the-art model by 5% in AUPRC and 1.55% in AUROC. © 2019, Springer Nature Switzerland AG.

Item Deep neural learning for automated diagnostic code group prediction using unstructured nursing notes (Association for Computing Machinery, 2020) Jayasimha, A.; Gangavarapu, T.; Kamath S., S.; S. Krishnan, G.S.
Disease prediction, a central problem in clinical care and management, has gained much significance over the last decade. Nursing notes documented by caregivers contain valuable information concerning a patient's state, which can aid in the development of intelligent clinical prediction systems. Moreover, due to the limited adoption of structured electronic health records in developing countries, the need for disease prediction from such clinical text has garnered substantial interest from the research community. The availability of large, publicly available databases such as MIMIC-III, and advancements in machine and deep learning models with high predictive capabilities, have further facilitated research in this direction. In this work, we model the latent knowledge embedded in unstructured clinical nursing notes to address the clinical task of disease prediction as a multi-label classification of ICD-9 code groups.
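The fuzzy token-based similarity that the TAGS abstract above uses to aggregate a patient's notes can be sketched roughly as below. Here `difflib`'s ratio stands in for the fuzzy token match, and the 0.8 threshold is an assumption for illustration, not the paper's setting:

```python
# Sketch of fuzzy token-set similarity between two free-text notes:
# count how many tokens of the shorter note fuzzily match some token
# of the longer one, and normalize.
from difflib import SequenceMatcher

def fuzzy_token_similarity(a: str, b: str, threshold: float = 0.8) -> float:
    """Fraction of tokens in the shorter note that have a fuzzy match
    (SequenceMatcher ratio >= threshold) in the other note."""
    ta, tb = a.lower().split(), b.lower().split()
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    if not short:
        return 0.0
    hits = sum(
        any(SequenceMatcher(None, s, t).ratio() >= threshold for t in long_)
        for s in short
    )
    return hits / len(short)

# Only "stable" matches fuzzily here, so 1 of 3 tokens -> ~0.33
print(fuzzy_token_similarity("patient stable overnight", "pt stable over night"))
```

Notes of the same patient whose pairwise similarity exceeds a chosen threshold could then be merged into one aggregated document before topic modeling.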
We present EnTAGS, which facilitates aggregation of the data in the clinical nursing notes of a patient by modeling them independently of one another. To handle the sparsity and high dimensionality of clinical nursing notes effectively, the proposed EnTAGS is built on topics extracted using non-negative matrix factorization. Furthermore, we explore the applicability of deep learning models for the clinical task of disease prediction, and assess the reliability of the proposed models using standard evaluation metrics. Our experimental evaluation revealed that the proposed approach consistently exceeded the state-of-the-art prediction model by 1.87% in accuracy, 12.68% in AUPRC, and 11.64% in MCC score. © 2020 Association for Computing Machinery.

Item Detecting Semantic Similarity of Documents Using Natural Language Processing (Elsevier B.V., 2021) Agarwala, S.; Anagawadi, A.; Reddy Guddeti, R.M.
The similarity of documents in natural languages can be judged by how similar the embeddings corresponding to their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches using the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtaining embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed for the same purpose. The explored models are tested on the SICK dataset, and the correlation between the ground-truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman, and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V.
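The simplest of the "bag-of-words approaches" the document-similarity abstract above mentions can be sketched with word counts and cosine similarity; a pre-trained encoder such as the Universal Sentence Encoder would replace the counting step with a dense sentence vector, but the comparison itself is the same:

```python
# Bag-of-words document similarity: represent each text by its word
# counts and compare the count vectors with cosine similarity.
from collections import Counter
from math import sqrt

def bow(text: str) -> Counter:
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(bow("the cat sat"), bow("the cat ran")))  # 2/3 ~ 0.67
```

This lexical baseline scores "the cat sat" and "the cat ran" as similar because they share words; semantic models are needed to also match paraphrases with no word overlap.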
Item Natural Language Inference: Detecting Contradiction and Entailment in Multilingual Text (Springer Science and Business Media Deutschland GmbH, 2021) Sree Harsha, S.; Krishna Swaroop, K.; Chandavarkar, B.R.
Natural Language Inference (NLI) is the task of characterising the inferential relationship between a natural language premise and a natural language hypothesis. The premise and the hypothesis can be related in three distinct ways: the hypothesis may be a logical conclusion that follows from the premise (entailment), the hypothesis may be false given the premise (contradiction), or the hypothesis and the premise may be unrelated (neutral). A robust and reliable NLI system serves as a suitable evaluation measure for true natural language understanding and enables its use in several modern application scenarios. We propose a novel technique for the NLI task by leveraging the recently proposed Bidirectional Encoder Representations from Transformers (BERT). We utilize a robustly optimized variant of BERT, integrate a contextualized definition embedding mechanism, and incorporate global average pooling into our proposed NLI system. We use several benchmark datasets, including one containing premise-hypothesis pairs from 15 different languages, to systematically evaluate the performance of our model, and show that it yields superior results. © 2021, Springer Nature Switzerland AG.

Item Conversational Hate-Offensive detection in Code-Mixed Hindi-English Tweets (CEUR-WS, 2021) Rajalakshmi, R.; Srivarshan, S.; Mattins, F.; Kaarthik, E.; Seshadri, P.; Anand Kumar, M.
Hate speech on social media has increased with the growing use of online forums for sharing opinions. In particular, people prefer expressing their views in their native language when posting such objectionable content on social media platforms.
Building an automated system to identify such hate and offensive tweets in many regional languages is challenging due to their rich linguistic nature. Recently, the problem has become even more complicated due to the use of multilingual and code-mixed tweets. Code-mixed data involves the mixing of two languages at the granular level; a word that is not part of either language may be found in the data. To address these challenges in Hindi-English tweets, we propose an efficient method combining IndicBERT with an effective ensemble-based method. We applied different methodologies to accurately classify whether a given tweet in the code-mixed Hinglish dataset constitutes hate speech. Three different models, namely IndicBERT, XLM-RoBERTa, and Masked LM, were used to embed the tweet data. Various classification methods, such as Logistic Regression, Support Vector Machine, ensembling, and neural network based methods, were then applied to perform classification. Extensive experiments on the dataset showed that embedding the code-mixed data with IndicBERT and ensembling was the best method, resulting in a macro F1-score of 62.53%. This work was submitted by team TNLP to the shared task of the HASOC 2021 [1] [2] Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages competition. © 2021 Copyright for this paper by its authors.

Item Generating Short Video Description using Deep-LSTM and Attention Mechanism (Institute of Electrical and Electronics Engineers Inc., 2021) Yadav, N.; Naik, D.
In modern times, an extensive amount of video data is produced, because most people carry video-capturing devices such as mobile phones and cameras. Video comprises photographic, textual, and auditory data.
Our aim is to investigate and recognize the visual features of a video and generate a caption, so that users can obtain the gist of the video in an instant. Many techniques capture the static content of a frame, but for video captioning, dynamic information is more important than static information. In this work, we introduce an encoder-decoder architecture using Deep Long Short-Term Memory (Deep-LSTM) and Bahdanau attention. In the encoder, the Convolutional Neural Network (CNN) VGG16 and Deep-LSTM are used to extract information from frames, and Deep-LSTM combined with the attention mechanism describes the actions performed in the video. We evaluated the performance of our model on the MSVD dataset, which shows significant improvement compared to other video captioning models. © 2021 IEEE.

Item Early detection of depression using BERT and DeBERTa (CEUR-WS, 2022) Devaguptam, S.; Kogatam, T.; Kotian, N.; Anand Kumar, A.M.
In today's world, social media usage has become one of the most fundamental human activities. According to a report by Oberlo, 3.2 billion people are currently on social media, comprising 42% of the world's population. People usually post about their daily lifestyle, special occasions, views on ongoing issues, and their networks on social media platforms. People also share things on social media that they would otherwise not share with others. Social media helps us stay connected, keep informed, and mobilise on social issues. Given the surge in suicide attempts, social media can act as a lifesaver by detecting and tracing users who are on the verge of depression and self-harm. Natural language processing methods, aided by deep learning, are helping to solve language- and text-related real-world problems such as sentiment analysis, translation of text into different languages, and depression detection.
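The Bahdanau (additive) attention that the video-captioning abstract above combines with Deep-LSTM computes, at each decoding step, a softmax-weighted sum of the encoder's per-frame states. A toy sketch with made-up shapes and random weights (in a real model W1, W2, and v are learned layers):

```python
# Toy Bahdanau attention step over a sequence of encoder states.
# All dimensions and parameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, enc_dim, dec_dim, att_dim = 5, 8, 6, 4   # frames, state sizes (toy)

enc_states = rng.normal(size=(T, enc_dim))  # one encoder state per frame
dec_state = rng.normal(size=(dec_dim,))     # current decoder hidden state
W1 = rng.normal(size=(enc_dim, att_dim))
W2 = rng.normal(size=(dec_dim, att_dim))
v = rng.normal(size=(att_dim,))

# Additive score for each frame t: score_t = v . tanh(W1 h_t + W2 s)
scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()                    # softmax over the T frames
context = weights @ enc_states              # attention-weighted context vector

print(weights.sum(), context.shape)         # weights sum to 1; shape (8,)
```

The decoder consumes the context vector alongside its own state, letting each generated word attend to the most relevant frames.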
Many transformer-based models, such as BERT (Bidirectional Encoder Representations from Transformers), are put to use to solve NLP problems; these models learn to attend to (weight) different features differently. In this paper, a supervised machine learning algorithm with a transfer learning approach is used to detect self-harm tendencies in social media users at the earliest. © 2022 Copyright for this paper by its authors.

Item Fake News Detection in Hindi Using Embedding Techniques (Institute of Electrical and Electronics Engineers Inc., 2022) Shailendra, P.; Rashmi, M.; Ramu, S.; Guddeti, R.M.R.
Internet users have been increasing rapidly in recent years, especially in India, and nearly everything now operates online. Sharing information has also become simple and easy due to the internet and social media. Almost everyone now shares news in the community without even considering the source of the information. As a result, there is the issue of disseminating false, misleading, or fabricated data. Detecting fake news is a challenging task because it is presented in a form that looks like authentic information, and the problem becomes more challenging for local languages. This paper discusses several deep learning models that utilize LSTM, BiLSTM, CNN+LSTM, and CNN+BiLSTM. On the Hostility detection dataset in Hindi, these models use Word2Vec, IndicNLP fastText, and Facebook's fastText embeddings for fake news detection. The proposed CNN+BiLSTM model with Facebook's fastText embedding achieved an F1-score of 75%, outperforming the baseline model. Additionally, the BiLSTM using Facebook's fastText outperforms the CNN+BiLSTM using Facebook's fastText on the F1-score.
© 2022 IEEE.

Item Misinformation Detection Through Authentication of Content Creators (Springer, 2023) KSudhama, K.; Siddamsetti, S.G.; G, P.; Chandavarkar, B.R.
Recent technological advancements have made content modification and recreation easy and practically undetectable without suitable verification techniques. Users can alter social media data with photo, video, and text editing tools and share the updated content in a different context. As a result, online social media platforms are well suited for distributing fake news and misinformation. Misinformation can take several forms, involving one or more types of multimedia, such as text, photos, and videos. Modified content provides fake evidence to the user, leading to various misconceptions. Fake news generally has eye-catching headlines, called click-baits, that attract readers; the content of these click-baits often differs from what the headlines suggest. There are also many fake websites whose addresses are slightly modified versions of those of popular news agencies, and users are easily fooled into opening them because the address seems legitimate. These issues indicate the importance of distinguishing legitimate content creators from non-legitimate ones. This chapter focuses on authenticating legitimate content creators, verified by a trusted entity, using certificates and blockchain technology. It also checks their content for fakeness using natural language processing and image processing techniques. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
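A minimal flavour of the blockchain idea in the final chapter abstract: each published record carries a hash linking it to the previous one, so any later modification of the content is detectable. This toy chain (invented names, no real certificates or signatures) only illustrates the tamper-evidence property:

```python
# Toy hash chain over published content records. Each record's hash
# commits to the previous hash, the creator, and the content, so
# altering earlier content invalidates every later link.
import hashlib

def record_hash(prev_hash: str, creator: str, content: str) -> str:
    """SHA-256 of the previous hash chained with creator and content."""
    data = f"{prev_hash}|{creator}|{content}".encode()
    return hashlib.sha256(data).hexdigest()

chain = ["0" * 64]  # genesis placeholder
for creator, content in [("verified_news", "original article"),
                         ("verified_news", "follow-up article")]:
    chain.append(record_hash(chain[-1], creator, content))

# Tampering with the first article yields a hash that no longer
# matches the recorded chain:
tampered = record_hash(chain[0], "verified_news", "edited article")
print(tampered != chain[1])  # True
```

In the chapter's setting, the creator identity would additionally be bound to a certificate issued by the trusted entity rather than a bare string.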
