Faculty Publications

Permanent URI for this communityhttps://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Browse

Search Results

Now showing 1 - 9 of 9
  • Item
    Jail-Phish: An improved search engine based phishing detection system
    (Elsevier Ltd, 2019) Rao, R.S.; Pais, A.R.
    Stealing of sensitive information (username, password, credit card information and social security number, etc.) using a fake webpage that imitates trusted website is termed as phishing. Recent techniques use search engine based approach to counter the phishing attacks as it achieves promising detection accuracy. But, the limitation of this approach is that it fails when phishing page is hosted on compromised server. Moreover, it also results in low true negative rate when newly registered or non-popular domains are encountered. Hence, in this paper, we propose an application named as Jail-Phish, which improves the accuracy of the search engine based techniques with an ability to detect the Phishing Sites Hosted on Compromised Servers (PSHCS) and also detection of newly registered legitimate sites. Jail-Phish compares the suspicious site and matched domain in the search results for calculating the similarity score between them. There exists some degree of similarity such as logos, favicons, images, scripts, styles, and anchorlinks within the pages of the same website whereas on the other side, the dissimilarity within the pages is very high in PSHCS. Hence, we use the similarity score between the suspicious site and matched domain as a parameter to detect the PSHCS. From the experimental results, it is observed that Jail-Phish achieved an accuracy of 98.61%, true positive rate of 97.77% and false positive rate less than 0.64%. © 2019 Elsevier Ltd
  • Item
    Detection of phishing websites using an efficient feature-based machine learning framework
    (Springer London, 2019) Rao, R.S.; Pais, A.R.
    Phishing is a cyber-attack which targets naive online users tricking into revealing sensitive information such as username, password, social security number or credit card number etc. Attackers fool the Internet users by masking webpage as a trustworthy or legitimate page to retrieve personal information. There are many anti-phishing solutions such as blacklist or whitelist, heuristic and visual similarity-based methods proposed to date, but online users are still getting trapped into revealing sensitive information in phishing websites. In this paper, we propose a novel classification model, based on heuristic features that are extracted from URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model has been evaluated using eight different machine learning algorithms and out of which, the Random Forest (RF) algorithm performed the best with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for the phishing website detection. Principal component analysis Random Forest (PCA-RF) performed the best out of all oblique Random Forests (oRFs) with an accuracy of 99.55%. We have also tested our model with the third-party-based features and without third-party-based features to determine the effectiveness of third-party services in the classification of suspicious websites. We also compared our results with the baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks. © 2018, The Natural Computing Applications Forum.
  • Item
    PhishDump: A multi-model ensemble based technique for the detection of phishing sites in mobile devices
    (Elsevier B.V., 2019) Rao, R.S.; Vaishnavi, T.; Pais, A.R.
    Phishing is a technique in which the attackers trick the online users to reveal the sensitive information by creating the phishing sites which look similar to that of legitimate sites. There exist many techniques to detect phishing sites in desktop computers. In recent years, the number of mobile users accessing the web has increased which lead to a rise in the number of attacks in mobile devices. Existing techniques designed for desktop computers may not be suitable for mobile devices due to their hardware limitations such as RAM, Screen size, low computational power etc. In this paper, we propose a mobile application named PhishDump to classify the legitimate and phishing websites in mobile devices. PhishDump is based on the multi-model ensemble of Long Short Term Memory (LSTM) and Support Vector Machine (SVM) classifier. As PhishDump focuses on the extraction of features from URL, it has several advantages over existing works such as fast computation, language independence and robust to accidental download of malwares. From the experimental analysis, we observed that our proposed multi-model ensemble outperformed traditional LSTM character and word-level models. PhishDump performed better than the existing baseline models with an accuracy of 97.30% on our dataset and 98.50% on the benchmark dataset. © 2019 Elsevier B.V.
  • Item
    CatchPhish: detection of phishing websites by inspecting URLs
    (Springer, 2020) Rao, R.S.; Vaishnavi, T.; Pais, A.R.
    There exists many anti-phishing techniques which use source code-based features and third party services to detect the phishing sites. These techniques have some limitations and one of them is that they fail to handle drive-by-downloads. They also use third-party services for the detection of phishing URLs which delay the classification process. Hence, in this paper, we propose a light-weight application, CatchPhish which predicts the URL legitimacy without visiting the website. The proposed technique uses hostname, full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words from the suspicious URL for the classification using the Random forest classifier. The proposed model with only TF-IDF features on our dataset achieved an accuracy of 93.25%. Experiment with TF-IDF and hand-crafted features achieved a significant accuracy of 94.26% on our dataset and an accuracy of 98.25%, 97.49% on benchmark datasets which is much better than the existing baseline models. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
  • Item
    Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach
    (Springer, 2020) Rao, R.S.; Pais, A.R.
    The visual similarity-based techniques detect the phishing sites based on the similarity between the suspicious site and the existing database of resources such as screenshots, styles, logos, favicons etc. These techniques fail to detect phishing sites which target non-whitelisted legitimate domain or when phishing site with manipulated whitelisted legitimate content is encountered. Also, these techniques are not well adaptable at the client-side due to their computation and space complexity. Thus there is a need for light weight visual similarity-based technique detecting phishing sites targeting non-whitelisted legitimate resources. Unlike traditional visual similarity-based techniques using whitelists, in this paper, we employed a light-weight visual similarity based blacklist approach as a first level filter for the detection of near duplicate phishing sites. For the non-blacklisted phishing sites, we have incorporated a heuristic mechanism as a second level filter. We used two fuzzy similarity measures, Simhash and Perceptual hash for calculating the similarity score between the suspicious site and existing blacklisted phishing sites. Each similarity measure generates a unique fingerprint for a given website and also differs with less number of bits with a similar website. All three fingerprints together represent a website which undergoes blacklist filtering for the identification of the target website. The phishing sites which bypassed from the first level filter undergo second level heuristic filtering. We used comprehensive heuristic features including URL and source code based features for the detection of non-blacklisted phishing sites. The experimental results demonstrate that the blacklist filter alone is able to detect 55.58% of phishing sites which are either replicas or near duplicates of existing phishing sites. We also proposed an ensemble model with Random Forest (RF), Extra-Tree and XGBoost to evaluate the contribution of both blacklist and heuristic filters together as an entity and the model achieved a significant accuracy of 98.72% and Matthews Correlation Coefficient (MCC) of 97.39%. The proposed model is deployed as a chrome extension named as BlackPhish to provide real time protection against phishing sites at the client side. We also compared BlackPhish with the existing anti-phishing techniques where it outperformed existing works with a significant difference in accuracy and MCC. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
  • Item
    Efficient deep learning techniques for the detection of phishing websites
    (Springer, 2020) Somesha, M.; Pais, A.R.; Rao, R.S.; Rathour, V.S.
    Phishing is a fraudulent practice and a form of cyber-attack designed and executed with the sole purpose of gathering sensitive information by masquerading the genuine websites. Phishers fool users by replicating the original and genuine contents to reveal personal information such as security number, credit card number, password, etc. There are many anti-phishing techniques such as blacklist- or whitelist-, heuristic-feature- and visual-similarity-based methods proposed as of today. Modern browsers adapt to reduce the chances of users getting trapped into a vicious agenda, but still users fall as prey to phishers and end up revealing their secret information. In a previous work, the authors proposed a machine learning approach based on heuristic features for phishing website detection and achieved an accuracy of 99.5% using 18 features. In this paper, we have proposed novel phishing URL detection models using (a) Deep Neural Network (DNN), (b) Long Short-Term Memory (LSTM) and (c) Convolution Neural Network (CNN) using only 10 features of our earlier work. The proposed technique achieves an accuracy of 99.52% for DNN, 99.57% for LSTM and 99.43% for CNN. The proposed techniques utilize only one third-party service feature, thus making it more robust to failure and increases the speed of phishing detection. © 2020, Indian Academy of Sciences.
  • Item
    A heuristic technique to detect phishing websites using TWSVM classifier
    (Springer Science and Business Media Deutschland GmbH, 2021) Rao, R.S.; Pais, A.R.; Anand, P.
    Phishing websites are on the rise and are hosted on compromised domains such that legitimate behavior is embedded into the designed phishing site to overcome the detection. The traditional heuristic techniques using HTTPS, search engine, Page Ranking and WHOIS information may fail in detecting phishing sites hosted on the compromised domain. Moreover, list-based techniques fail to detect phishing sites when the target website is not in the whitelisted data. In this paper, we propose a novel heuristic technique using TWSVM to detect malicious registered phishing sites and also sites which are hosted on compromised servers, to overcome the aforementioned limitations. Our technique detects the phishing websites hosted on compromised domains by comparing the log-in page and home page of the visiting website. The hyperlink and URL-based features are used to detect phishing sites which are maliciously registered. We have used different versions of support vector machines (SVMs) for the classification of phishing websites. We found that twin support vector machine classifier (TWSVM) outperformed the other versions with a significant accuracy of 98.05% and recall of 98.33%. © 2020, Springer-Verlag London Ltd., part of Springer Nature.
  • Item
    Application of word embedding and machine learning in detecting phishing websites
    (Springer, 2022) Rao, R.S.; Umarekar, A.; Pais, A.R.
    Phishing is an attack whose aim is to gain personal information such as passwords, credit card details etc. from online users by deceiving them through fake websites, emails or any legitimate internet service. There exists many techniques to detect phishing sites such as third-party based techniques, source code based methods and URL based methods but still users are getting trapped into revealing their sensitive information. In this paper, we propose a new technique which detects phishing sites with word embeddings using plain text and domain specific text extracted from the source code. We applied various word embedding for the evaluation of our model using ensemble and multimodal approaches. From the experimental evaluation, we observed that multimodal with domain specific text achieved a significant accuracy of 99.34% with TPR of 99.59%, FPR of 0.93%, and MCC of 98.68% © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
  • Item
    TrackPhish: A Multi-Embedding Attention-Enhanced 1D CNN Model for Phishing URL Detection
    (Institute of Electrical and Electronics Engineers Inc., 2025) Kondaiah, C.; Pais, A.R.; Rao, R.S.
    Phishing attacks are a growing threat to online security, with increasingly sophisticated and frequent tactics. This rise in cyber threats underscores the need for advanced detection methods. While the Internet is crucial for modern communication and commerce, it also exposes users to risks such as phishing, spamming, malware, and performance degradation attacks. Among these, malicious URLs, commonly embedded in static links within emails and websites, are a significant challenge in identifying and mitigating these attacks. This study proposes TrackPhish, a novel lightweight application that predicts URL legitimacy without visiting the associated website. The proposed model combines traditional word embeddings (Word2Vec, FastText, GloVe) with transformer models (BERT, RoBERTa, GPT-2) to create a comprehensive feature set fed into a Deep Learning (DL) model for detecting phishing URLs. The integration of these embeddings captures semantic relationships and contextual understanding of the text, generating a robust feature set enhanced by an attention mechanism to choose relevant features. The refined features are then used to train a One-Dimensional Convolutional Neural Network (1D CNN) model for phishing URL detection. The proposed model offers key advantages over existing methods, including independence from third-party features, adaptability for client-side deployment, and target-independent detection. Experimental results demonstrate the model’s effectiveness, achieving 95.41% accuracy with a low false positive rate of 1.44% on our dataset and an impressive 98.55% accuracy on benchmark datasets, outperforming existing baseline models. The proposed model represents a significant advancement over traditional methods, enhancing online security against phishing URLs. © 2005-2012 IEEE.