Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

  • Item
    Hybrid text feature modeling for disease group prediction using unstructured physician notes
    (Springer Science and Business Media Deutschland GmbH, 2020) S. Krishnan, G.S.; Kamath S․, S.
    Existing Clinical Decision Support Systems (CDSSs) largely depend on the availability of structured patient data and Electronic Health Records (EHRs) to aid caregivers. However, in case of hospitals in developing countries, structured patient data formats are not widely adopted, where medical professionals still rely on clinical notes in the form of unstructured text. Such unstructured clinical notes recorded by medical personnel can also be a potential source of rich patient-specific information which can be leveraged to build CDSSs, even for hospitals in developing countries. If such unstructured clinical text can be used, the manual and time-consuming process of EHR generation will no longer be required, with huge person-hours and cost savings. In this article, we propose a generic ICD9 disease group prediction CDSS built on unstructured physician notes modeled using hybrid word embeddings. These word embeddings are used to train a deep neural network for effectively predicting ICD9 disease groups. Experimental evaluation showed that the proposed approach outperformed the state-of-the-art disease group prediction model built on structured EHRs by 15% in terms of AUROC and 40% in terms of AUPRC, thus proving our hypothesis and eliminating dependency on availability of structured patient data. © Springer Nature Switzerland AG 2020.
  • Item
    Machine Learning Techniques for the Investigation of Phishing Websites
    (Springer Science and Business Media Deutschland GmbH, 2021) Ajaykumar, K.B.; Rudra, B.
    Phishing is often used to gain a foothold in corporate or governmental networks as part of a larger attack, such as an advanced persistent threat (APT) event. An organization that succumbs to such an attack typically sustains severe financial losses in addition to declining market share, reputation, and customer trust. Depending on its scope, a phishing attempt may escalate into a security incident from which a business will have a difficult time recovering. To detect this kind of attack, we attempted to build a machine learning model that tells the user whether a website is suspicious or legitimate. Phishing websites contain various indicators in their content and in web browser-based information. The purpose of this study is to perform different machine learning-based classification using the 30 features of the Phishing Websites Data in the UC Irvine Machine Learning Repository database. For the assessment of results, random forest (RF) was compared with other machine learning methods such as linear regression (LR), support vector machine (SVM), Naive Bayes (NB), gradient boosting classifier (GBM), and artificial neural network (ANN), and was found to have the highest accuracy of 97.39. © 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Data-Driven Stillbirth Prediction and Analysis of Risk Factors in Pregnancy
    (Springer Science and Business Media Deutschland GmbH, 2021) Unnikrishnan, A.; Chandrasekaran, K.; Shukla, A.
    One of the main issues in developing countries is the lack of policies for ensuring good public health conditions in rural areas. Maternal and child health care is one such area that has not improved in developing countries. Although child health has improved noticeably over the years, infant or under-5-mortality has not become any better. There remain major knowledge gaps in our understanding of how factors such as prenatal care, antenatal care, social and economic backgrounds, living conditions and lifestyle of pregnant women and their family members affect the pregnancy outcomes. Understanding such factors that affect the poor pregnancy outcome helps in formulating plans to prevent such issues and to treat them effectively. Determining health policies will be easier from a deeper analysis of such factors involved. This paper discusses some of the key machine learning techniques to predict the pregnancy outcome as a stillbirth or not and analyze some of the factors that majorly cause stillbirth. © 2021, Springer Nature Singapore Pte Ltd.
  • Item
    Intrusion Detection Techniques for Detection of Cyber Attacks
    (Springer Science and Business Media Deutschland GmbH, 2021) Ahmed, S.S.; Kankar, M.; Rudra, B.
    An intrusion detection system (IDS) is a software application that monitors system or network activities and flags any suspicious activity. The rapid growth and widespread use of the Internet raise concerns about how to communicate and store digital information securely. Nowadays, attackers use a variety of attacks to obtain private data. Most IDS techniques, algorithms, and methods help to detect these attacks. The central aim of the project is to present an overall study of intrusion detection mechanisms, the various types of attacks, the tools and techniques, and the challenges. We used various machine learning algorithms, measured performance metrics such as accuracy, recall, and F-measure, and compared them with existing work. We obtained good results that can help to detect cyber attacks being performed in the network. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    Predicting Vaccine Hesitancy and Vaccine Sentiment Using Topic Modeling and Evolutionary Optimization
    (Springer Science and Business Media Deutschland GmbH, 2021) S. Krishnan, G.S.; Kamath S․, S.; Sugumaran, V.
    The ongoing COVID-19 pandemic has posed serious threats to the world population, affecting over 219 countries with a staggering impact of over 162 million cases and 3.36 million casualties. With the availability of multiple vaccines across the globe, framing vaccination policies for effectively inoculating a country’s population against such diseases is currently a crucial task for public health agencies. Social network users post their views and opinions on vaccines publicly and these posts can be put to good use in identifying vaccine hesitancy. In this paper, a vaccine hesitancy identification approach is proposed, built on novel text feature modeling based on evolutionary computation and topic modeling. The proposed approach was experimentally validated on two standard tweet datasets – the flu vaccine dataset and UK COVID-19 vaccine tweets. On the first dataset, the proposed approach outperformed the state-of-the-art in terms of standard metrics. The proposed model was also evaluated on the UKCOVID dataset and the results are presented in this paper, as our work is the first to benchmark a vaccine hesitancy model on this dataset. © 2021, Springer Nature Switzerland AG.
  • Item
    Towards a Federated Learning Approach for NLP Applications
    (Springer Science and Business Media Deutschland GmbH, 2021) Prabhu, O.S.; Gupta, P.K.; Shashank, P.; Chandrasekaran, K.; Divakarla, D.
    Traditional machine learning involves the collection of training data to a centralized location. This collected data is prone to misuse and data breach. Federated learning is a promising solution for reducing the possibility of misusing sensitive user data in machine learning systems. In recent years, there has been an increase in the adoption of federated learning in healthcare applications. On the other hand, personal data such as text messages and emails also contain highly sensitive data, typically used in natural language processing (NLP) applications. In this paper, we investigate the adoption of federated learning approach in the domain of NLP requiring sensitive data. For this purpose, we have developed a federated learning infrastructure that performs training on remote devices without the need to share data. We demonstrate the usability of this infrastructure for NLP by focusing on sentiment analysis. The results show that the federated learning approach trained a model with comparable test accuracy to the centralized approach. Therefore, federated learning is a viable alternative for developing NLP models to preserve the privacy of data. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
  • Item
    COVID-19 detection from spectral features on the DiCOVA dataset
    (International Speech Communication Association, 2021) Ritwik, K.V.S.; Kalluri, S.B.; Vijayasenan, D.
    In this paper we investigate the cues of COVID-19 on sustained phonation of Vowel-/i/, deep breathing and number counting data of the DiCOVA dataset. We use an ensemble of classifiers trained on different features, namely, super-vectors, formants, harmonics and MFCC features. We fit a two-class Weighted SVM classifier to separate the COVID-19 audio from Non-COVID-19 audio. Weighted penalties help mitigate the challenge of class imbalance in the dataset. The results are reported on the stationary (breathing, Vowel-/i/) and nonstationary (counting) data using individual features and combinations of features on each type of utterance. We find that the Formant information plays a crucial role in classification. The proposed system resulted in an AUC score of 0.734 for cross validation, and 0.717 for the evaluation dataset. © 2021 ISCA.
  • Item
    Prevention of webshell attack using machine learning techniques
    (Grenze Scientific Society, 2021) Satish, Y.C.; Naik, P.M.; Rudra, B.
    A webshell is a web vulnerability and security threat that attackers can use to access and control a user's system or a server. They may also use the compromised system as a command-and-control device to attack other systems. It is difficult to monitor and identify such threats because attackers keep changing their methods and technologies. However, webshells can be detected with better accuracy using machine learning techniques, provided that more samples are available. In this project, we present a PHP-based webshell detection model. We used different ML algorithms: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM) and K-Nearest Neighbour (KNN). In addition to the standard statistical features of the PHP files, we also added an opcode sequence from the PHP files, consisting of the TF-IDF vector and the hash vector. We trained the different machine learning models (SVM, RF, LR, KNN) on these features. Among these models, Random Forest gave the best results, with an accuracy of 96.45% and a false-positive rate of 3.5%, which compares well with several popular detection techniques. © Grenze Scientific Society, 2021.
  • Item
    Data Processing in IoT, Sensor to Cloud: Survey
    (Institute of Electrical and Electronics Engineers Inc., 2021) Sandeep, M.; Chandavarkar, B.R.
    IoT connects Things over the Internet and realizes the environment through smart things to create a responsive space. Many surveys predict that the number of IoT devices will grow to around 50 billion, an average of 7 devices per person. IoT has shown a promising future with applications such as smart cities, connected factories, buildings, roadways, smart health and many more. To make the promise a reality, IoT has to overcome many hurdles such as scalability, connectivity, architecture, big data, analysis, security, and privacy. In this literature survey, an attempt has been made to identify the current challenges faced by IoT implementation and possible solutions, future opportunities, and research openings. Further, the processing of sensed data at the IoT device, the edge/fog layer, and the cloud is discussed in detail. © 2021 IEEE.
  • Item
    Hate Speech and Offensive Content Identification in Hindi and Marathi Language Tweets using Ensemble Techniques
    (CEUR-WS, 2021) Rajalakshmi, R.; Mattins, F.; Srivarshan, S.; Reddy, L.P.; Anand Kumar, M.
    Hate Speech is described as any form of speech in which speakers attempt to ridicule, humiliate, or inculcate hatred in someone else’s minds based on characteristics such as religion, the colour of skin, race, or sexual preference. In recent years, social networking sites have been a major source of excessive amounts of hate speech. If unaddressed, these might cause anxiety and despair in the affected individuals or groups. As a result, the above-mentioned social networks utilize an assortment of algorithms to identify such hate speech. Detecting Hate Speech in English texts has been one of the hottest topics in recent years, with multiple types of research being published. However, in regional and indigenous languages, hate speech detection is a recent area with not much research being conducted. It is difficult to perform hate speech detection using data in regional languages due to a lack of large enough training data and a lack of resources about that domain. The HASOC [1] 2021 Hate Speech Detection Task solves one of the problems. It provides a dataset containing Tweet data in English, Hindi [2] and Marathi [3] languages. There were two subtasks as part of the main task. The subtask was to classify the hate speech and offensive texts in the Hindi and Marathi tweet dataset as Hate Speech (HATE), Offensive (OFFN) or Profane (PRF). This work compares the performance of different models on both subtasks and provides a conclusion on the best performing model. The Random Forest Classifier reports the most remarkable accuracy on the first subtask with a macro F1 score of 75.19% and 73.12% on the Marathi and Hindi tweet datasets. The XGBoost algorithm is the best performing algorithm on the second subtask with a 46.5% macro F1 score. Overall any of these models can get satisfactory results when dealing with hate speech detection in regional language. 
    This work has been submitted to the FIRE2021 shared task, Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC-2021) by team DLRG. © 2021 Copyright for this paper by its authors.