Faculty Publications

Permanent URI for this communityhttps://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Browse

Search Results

Now showing 1 - 4 of 4
  • Item
    Application of word embedding and machine learning in detecting phishing websites
    (Springer, 2022) Rao, R.S.; Umarekar, A.; Pais, A.R.
    Phishing is an attack whose aim is to gain personal information such as passwords, credit card details etc. from online users by deceiving them through fake websites, emails or any legitimate internet service. There exists many techniques to detect phishing sites such as third-party based techniques, source code based methods and URL based methods but still users are getting trapped into revealing their sensitive information. In this paper, we propose a new technique which detects phishing sites with word embeddings using plain text and domain specific text extracted from the source code. We applied various word embedding for the evaluation of our model using ensemble and multimodal approaches. From the experimental evaluation, we observed that multimodal with domain specific text achieved a significant accuracy of 99.34% with TPR of 99.59%, FPR of 0.93%, and MCC of 98.68% © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    Classification of Phishing Email Using Word Embedding and Machine Learning Techniques
    (River Publishers, 2022) Somesha, M.; Pais, A.R.
    Email phishing is a cyber-attack, bringing substantial financial damage to corporate and commercial organizations. A phishing email is a special type of spamming, used to trick the user to disclose personal information to access his digital assets. Phishing attack is generally triggered by emailing links to spoofed websites that collect sensitive information. The APWG survey suggests that the existing countermeasures remain ineffective and insufficient for detecting phishing attacks. Hence there is a need for an efficient mechanism to detect phishing emails to provide better security against such attacks to the common user. The existing open-source data sets are limited in diversity, hence they do not capture the real picture of the attack. Hence there is a need for real-time input data set to design accurate email anti-phishing solutions. In the current work, it has been created a real-time in-house corpus of phishing and legitimate emails and proposed efficient techniques to detect phishing emails using a word embedding and machine learning algorithms. The proposed system uses only four email header-based heuristics for the classification of emails. The proposed word embedding cum machine learning framework comprises six word embedding techniques with five machine learning classifiers to evaluate the best performing combination. Among all six combinations, Random Forest consistently performed the best with FastText (CBOW) by achieving an accuracy of 99.50% with a false positive rate of 0.053%, TF-IDF achieved an accuracy of 99.39% with a false positive rate of 0.4% and Count Vectorizer achieved an accuracy of 99.18% with a false positive rate of 0.98% respectively for three datasets used. © 2022 River Publishers.
  • Item
    DeepEPhishNet: a deep learning framework for email phishing detection using word embedding algorithms
    (Springer, 2024) Somesha, M.; Pais, A.R.
    Email phishing is a social engineering scheme that uses spoofed emails intended to trick the user into disclosing legitimate business and personal credentials. Many phishing email detection techniques exist based on machine learning, deep learning, and word embedding. In this paper, we propose a new technique for the detection of phishing emails using word embedding (Word2Vec, FastText, and TF-IDF) and deep learning techniques (DNN and BiLSTM network). Our proposed technique makes use of only four header based (From, Returnpath, Subject, Message-ID) features of the emails for the email classification. We applied several word embeddings for the evaluation of our models. From the experimental evaluation, we observed that the DNN model with FastText-SkipGram achieved an accuracy of 99.52% and BiLSTM model with FastText-SkipGram achieved an accuracy of 99.42%. Among these two techniques, DNN outperformed BiLSTM using the same word embedding (FastText-SkipGram) techniques with an accuracy of 99.52%. © Indian Academy of Sciences 2024.
  • Item
    TrackPhish: A Multi-Embedding Attention-Enhanced 1D CNN Model for Phishing URL Detection
    (Institute of Electrical and Electronics Engineers Inc., 2025) Kondaiah, C.; Pais, A.R.; Rao, R.S.
    Phishing attacks are a growing threat to online security, with increasingly sophisticated and frequent tactics. This rise in cyber threats underscores the need for advanced detection methods. While the Internet is crucial for modern communication and commerce, it also exposes users to risks such as phishing, spamming, malware, and performance degradation attacks. Among these, malicious URLs, commonly embedded in static links within emails and websites, are a significant challenge in identifying and mitigating these attacks. This study proposes TrackPhish, a novel lightweight application that predicts URL legitimacy without visiting the associated website. The proposed model combines traditional word embeddings (Word2Vec, FastText, GloVe) with transformer models (BERT, RoBERTa, GPT-2) to create a comprehensive feature set fed into a Deep Learning (DL) model for detecting phishing URLs. The integration of these embeddings captures semantic relationships and contextual understanding of the text, generating a robust feature set enhanced by an attention mechanism to choose relevant features. The refined features are then used to train a One-Dimensional Convolutional Neural Network (1D CNN) model for phishing URL detection. The proposed model offers key advantages over existing methods, including independence from third-party features, adaptability for client-side deployment, and target-independent detection. Experimental results demonstrate the model’s effectiveness, achieving 95.41% accuracy with a low false positive rate of 1.44% on our dataset and an impressive 98.55% accuracy on benchmark datasets, outperforming existing baseline models. The proposed model represents a significant advancement over traditional methods, enhancing online security against phishing URLs. © 2005-2012 IEEE.