Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
11 results
Search Results
Item
NITK_NLP at CheckThat! 2021: Ensemble transformer model for fake news classification (CEUR-WS, 2021)
LekshmiAmmal, R.L.; Anand Kumar, M.
Social media has become an inevitable part of our lives, as we depend primarily on it for most of the news around us. However, the amount of false information propagated through it is much higher than the amount of genuine information, making it a peril to society. In this paper, we propose a model for fake news classification as part of the CLEF 2021 CheckThat! Lab shared task, which comprised Multi-class Fake News Detection and Topical Domain Classification of News Articles. We used an ensemble of pre-trained transformer-based models, which helped us achieve 4th and 1st positions on the leaderboards of the two tasks. We achieved an F1-score of 0.4483 against a top score of 0.8376 in one task and a score of 0.8813 in the other. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Item
A more generalizable DNN based Automatic Segmentation of Brain Tumors from Multimodal low-resolution 2D MRI (Institute of Electrical and Electronics Engineers Inc., 2021)
Bhaskaracharya, B.; Nair, R.P.; Prakashini, K.; Girish Menon, R.; Litvak, P.; Mandava, P.; Vijayasenan, D.; Sumam David, S.
In the field of neuro-oncology, there is a need for improved diagnosis and prognosis of brain tumors. Brain tumor segmentation is important for treatment planning and for assessing treatment outcomes. Manual segmentation of brain tumors is tedious, time-consuming, and subjective. In this work, efficient encoder-decoder based architectures were implemented for automatic segmentation of brain tumors from low-resolution 2D images. An ensemble of multiple architectures (EMMA) improves the performance of brain tumor segmentation. Furthermore, the computational requirements of the proposed models are lower than those of BraTS-challenge methods.
The average F1-scores on the BraTS-challenge validation dataset for Tumor Core (TC), Whole Tumor (WT), and Enhancing Tumor (ET) are 0.82, 0.87, and 0.78, respectively. The average F1-scores on the KMC-Manipal dataset for TC, WT, and ET are 0.74, 0.82, and 0.68, respectively. © 2021 IEEE.

Item
Machine Learning-Based Malware Detection and Classification in Encrypted TLS Traffic (Springer Science and Business Media Deutschland GmbH, 2023)
Kashyap, H.; Pais, A.R.; Kondaiah, C.
Malware has become a significant threat to Internet users in the modern digital era. It spreads quickly and poses a significant threat to cyber security. As a result, network security measures play an important role in countering these cyber threats. Existing malware detection techniques are unable to detect it effectively. This paper proposes a novel ensemble machine learning (ML)-based technique for detecting malware in Transport Layer Security (TLS)-encrypted traffic without decryption. The features are extracted from TLS traffic. Based on the extracted features, malware detection is performed using ensemble ML algorithms. The benign and malware file datasets are created using features extracted from TLS traffic. According to the experimental results, the 65 newly extracted features perform well in detecting malware from encrypted traffic. The proposed method achieves an accuracy of 99.85% for random forest and 97.43% for multiclass classification for identifying malware families. The ensemble model achieved an accuracy of 99.74% for binary classification and 97.45% for multiclass classification.
© 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item
Essential Climate Variables for Accurate Climate Change Impact Studies on Hydrological Regime: A Comprehensive Review (Springer Science and Business Media Deutschland GmbH, 2025)
Avinash, R.; Dwarakish, G.S.
Essential Climate Variables (ECVs) are a set of key climate variables identified by the Global Climate Observing System (GCOS) for understanding the climate system. Understanding the interactions between ECVs is critical for accurately modelling and predicting the impacts of climate change. In addition to the direct impacts of changes in ECVs, there are also complex feedback mechanisms that can either amplify or dampen the effects of climate forcings at a larger scale. While researchers may use slightly different sets of ECVs, some significant climate variables are sometimes neglected, which can lead to underestimating or overestimating the impacts and can introduce significant uncertainties in the results. This review paper aims to provide an overview of the climate system, ECVs, climate feedbacks, climate forcings, and methodologies for identifying the significant ECVs for climate change impact studies on hydrological regimes, and to understand the uncertainties introduced by neglecting a few ECVs. A comprehensive review was conducted, consisting of an in-depth analysis of approximately 80 relevant scholarly articles. Methodologies for identifying the ECVs are discussed; these involve a multi-disciplinary approach that combines empirical analysis, climate model simulations, and an ensemble approach. It can be inferred that ECVs that have a significant impact on hydrological regimes, such as groundwater, soil moisture, and atmospheric humidity, are often excluded from analysis due to a lack of available spatial and temporal data.
In conclusion, improving the availability and accessibility of data for important ECVs is imperative, and can be achieved by investing in new monitoring methods and technologies, enhancing data sharing and collaboration among institutions and researchers, and prioritizing funding for data collection and analysis. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Item
PhishDump: A multi-model ensemble based technique for the detection of phishing sites in mobile devices (Elsevier B.V., 2019)
Rao, R.S.; Vaishnavi, T.; Pais, A.R.
Phishing is a technique in which attackers trick online users into revealing sensitive information by creating phishing sites that look similar to legitimate sites. Many techniques exist to detect phishing sites on desktop computers. In recent years, the number of mobile users accessing the web has increased, which has led to a rise in the number of attacks on mobile devices. Existing techniques designed for desktop computers may not be suitable for mobile devices due to hardware limitations such as RAM, screen size, and low computational power. In this paper, we propose a mobile application named PhishDump to classify legitimate and phishing websites on mobile devices. PhishDump is based on a multi-model ensemble of Long Short Term Memory (LSTM) and Support Vector Machine (SVM) classifiers. As PhishDump focuses on the extraction of features from the URL, it has several advantages over existing works, such as fast computation, language independence, and robustness to accidental download of malware. From the experimental analysis, we observed that our proposed multi-model ensemble outperformed traditional LSTM character-level and word-level models. PhishDump performed better than the existing baseline models, with an accuracy of 97.30% on our dataset and 98.50% on the benchmark dataset.
© 2019 Elsevier B.V.

Item
Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach (Springer, 2020)
Rao, R.S.; Pais, A.R.
Visual similarity-based techniques detect phishing sites based on the similarity between the suspicious site and an existing database of resources such as screenshots, styles, logos, and favicons. These techniques fail to detect phishing sites that target a non-whitelisted legitimate domain, or when a phishing site with manipulated whitelisted legitimate content is encountered. Also, these techniques do not adapt well to the client side due to their computation and space complexity. Thus there is a need for a lightweight visual similarity-based technique for detecting phishing sites that target non-whitelisted legitimate resources. Unlike traditional visual similarity-based techniques using whitelists, in this paper we employ a lightweight visual similarity-based blacklist approach as a first level filter for the detection of near-duplicate phishing sites. For the non-blacklisted phishing sites, we have incorporated a heuristic mechanism as a second level filter. We used two fuzzy similarity measures, Simhash and Perceptual hash, for calculating the similarity score between the suspicious site and existing blacklisted phishing sites. Each similarity measure generates a unique fingerprint for a given website, and that fingerprint differs in only a small number of bits from the fingerprint of a similar website. All three fingerprints together represent a website, which undergoes blacklist filtering for the identification of the target website. The phishing sites that bypass the first level filter undergo second level heuristic filtering. We used comprehensive heuristic features, including URL and source code based features, for the detection of non-blacklisted phishing sites.
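The fingerprint comparison described in the abstract above can be illustrated with a minimal Simhash sketch. This is not the authors' implementation: the whitespace tokenization, the 64-bit width, the use of SHA-256 as the per-token hash, the 3-bit threshold, and the function names `simhash`, `hamming_distance`, and `is_near_duplicate` are all assumptions made for illustration.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-wide Simhash fingerprint for an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        # Hash each token and keep the first `bits` bits (assumed hash: SHA-256).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(fp, blacklist, max_bits=3):
    """Hypothetical blacklist check: match if any known fingerprint is within `max_bits`."""
    return any(hamming_distance(fp, known) <= max_bits for known in blacklist)


page = "login to your account verify password".split()
print(hamming_distance(simhash(page), simhash(page)))  # identical input, distance 0
```

Because every token contributes one vote per bit, pages that share most of their tokens tend to produce fingerprints that differ in few bits, which is the property a blacklist lookup of this kind relies on.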
The experimental results demonstrate that the blacklist filter alone is able to detect 55.58% of phishing sites which are either replicas or near duplicates of existing phishing sites. We also proposed an ensemble model with Random Forest (RF), Extra-Tree, and XGBoost to evaluate the contribution of the blacklist and heuristic filters together as a single entity, and the model achieved a significant accuracy of 98.72% and a Matthews Correlation Coefficient (MCC) of 97.39%. The proposed model is deployed as a Chrome extension named BlackPhish to provide real-time protection against phishing sites at the client side. We also compared BlackPhish with existing anti-phishing techniques, where it outperformed existing works with a significant difference in accuracy and MCC. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.

Item
Enhanced streamflow simulations using nudging based optimization coupled with data-driven and hydrological models (Elsevier B.V., 2022)
Sharannya, S.; Venkatesh, V.; Mahesha, M.; Acharya, T.D.
Study region: Varahi River, originating from the Western Ghats of India. Study focus: We developed a hybrid model that integrates a process-based hydrological model (PHM) and data-driven (DD) techniques to generate precise streamflow simulations. The hybrid modeling framework is practical, as it respects hydrological processes through the PHM while taking advantage of the DD model's ability to simulate the complex relationship between residuals and input variables. Further, we have proposed an optimization-based nudging scheme for post-processing the hybrid model simulated streamflow to overcome the limitations of the PHM and DD models. New hydrological insights for the region: We formulated two approaches for simulating streamflow ensembles using DD and PHM models. In approach-1, DD models are first used to ensemble meteorological variables, and the ensembles are then used in a PHM to simulate streamflows.
In approach-2, the PHM is forced with different sets of meteorological variables to simulate multiple streamflow sets, and DD models are then used to ensemble the PHM-derived streamflows. Random forest exhibited better performance for ensembling precipitation, temperature, and streamflow datasets than the other five DD algorithms in the study. Streamflows generated using approach-2 showed reliable estimates when compared against observed streamflow values. However, post-processing the hybrid streamflows using the optimization-based nudging scheme outperformed the streamflows generated in approach-1 and approach-2, with better model fit statistics (R² and NSE of 0.69 and 0.66). The output from the nudging scheme was further utilized for streamflow predictions under the combined impact of land use/cover (LULC) change and climate change (CC) under the Representative Concentration Pathway 4.5 scenario. It depicted a decrease in monthly and seasonal streamflows of −22.65%, −31.77%, and −11.81% for the winter, summer, and monsoon seasons, respectively. These results suggest that water availability will decline and water scarcity will increase in the study region. These variations in streamflow might negatively impact agriculture and natural ecosystems and even lead to water restrictions in the region. © 2022 The Authors

Item
A Boosting-Based Hybrid Feature Selection and Multi-Layer Stacked Ensemble Learning Model to Detect Phishing Websites (Institute of Electrical and Electronics Engineers Inc., 2023)
Lakshmana Rao, L.R.; Rao, R.S.; Pais, A.R.; Gabralla, L.A.
Phishing is a type of online scam in which the attacker tries to trick you into giving away your personal information, such as passwords or credit card details, by posing as a trustworthy entity like a bank, email provider, or social media site. These attacks have been around for a long time and, unfortunately, they continue to be a common threat.
In this paper, we propose a boosting-based multi-layer stacked ensemble learning model that uses a hybrid feature selection technique to select the relevant features for classification. The dataset with the selected features is sent to various classifiers at different layers, where the predictions of the lower layers are fed as input to the upper layers for phishing detection. From the experimental analysis, it is observed that the proposed model achieved an accuracy ranging from 96.16 to 98.95% without feature selection across different datasets, and an accuracy ranging from 96.18 to 98.80% with feature selection. The proposed model is compared with baseline models, and it has outperformed the existing models with a significant difference. © 2013 IEEE.

Item
Hindi fake news detection using transformer ensembles (Elsevier Ltd, 2023)
Praseed, A.; Rodrigues, J.; Santhi Thilagam, P.S.
In the past few decades, due to the growth of social networking sites such as WhatsApp and Facebook, information distribution has reached a level never seen before. Knowing the integrity of information has been a long-standing problem, even more so for regional languages. Regional languages, such as Hindi, raise challenging problems for fake news detection as they tend to be resource constrained. This limits the amount of data available to efficiently train models for these languages. Most of the existing techniques to detect fake news are targeted towards the English language, or involve manually translating the text into English and then proceeding with Deep Learning methods. Pre-trained transformer based models such as BERT are fine-tuned for the task of fake news detection and are commonly employed for detecting fake news. Other pre-trained transformer models, such as ELECTRA and RoBERTa, have also been shown to be able to detect fake news in multiple languages after suitable fine-tuning.
In this work, we propose a method for detecting fake news in resource constrained languages such as Hindi more efficiently by using an ensemble of pre-trained transformer models, all of which are individually fine-tuned for the task of fake news detection. We demonstrate that such a transformer ensemble, consisting of XLM-RoBERTa, mBERT, and ELECTRA, is able to improve the efficiency of fake news detection in Hindi by overcoming the drawbacks of individual transformer models. © 2022 Elsevier Ltd

Item
Enhanced Malicious Traffic Detection in Encrypted Communication Using TLS Features and a Multi-class Classifier Ensemble (Springer, 2024)
Kondaiah, C.; Pais, A.R.; Rao, R.S.
The use of encryption for network communication poses a significant challenge in identifying malicious traffic. Existing malicious traffic detection techniques fail to identify malicious traffic in encrypted traffic without decryption. The current research focuses on feature extraction and malicious traffic classification from encrypted network traffic without decryption. In this paper, we propose an ensemble model using Deep Learning (DL), Machine Learning (ML), and self-attention-based methods. We also propose novel TLS features extracted from the network and perform experiments on the ensemble model. The experimental results demonstrated that the ML-based (RF, LGBM, XGB) ensemble model achieved a significant accuracy of 94.85%, whereas the other ensemble model, using RF, LSTM, and Bi-LSTM with a self-attention technique, achieved an accuracy of 96.71%. To evaluate the efficacy of our proposed models, we curated datasets encompassing phishing, legitimate, and malware websites, leveraging features extracted from TLS 1.2 and 1.3 traffic without decryption. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
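
Several of the abstracts listed here combine the outputs of multiple classifiers into one ensemble prediction. The listing does not say how any of these works merges its member models, so the following is only a generic sketch of one common option, weighted soft voting, in which per-class probabilities are averaged and the highest-scoring class is chosen. The function name `soft_vote`, the class labels, and the probability values are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Average class-probability vectors from several models.

    Returns (index_of_winning_class, averaged_probabilities).
    """
    if not prob_lists:
        raise ValueError("at least one probability vector is required")
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weight per model
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Hypothetical outputs of three member models for one sample.
labels = ["legitimate", "phishing", "malware"]
member_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
winner, averaged = soft_vote(member_probs)
print(labels[winner], averaged)
```

Soft voting lets a confident minority model outweigh an uncertain majority, whereas hard (majority) voting counts only each model's top label; which of the two, if either, a given paper uses has to be checked in the paper itself.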
