Faculty Publications

Now showing 1 - 4 of 4

Detection of phishing websites using an efficient feature-based machine learning framework
(Springer London, 2019) Rao, R.S.; Pais, A.R.
Phishing is a cyber-attack which targets naive online users tricking into revealing sensitive information such as username, password, social security number or credit card number etc. Attackers fool the Internet users by masking webpage as a trustworthy or legitimate page to retrieve personal information. There are many anti-phishing solutions such as blacklist or whitelist, heuristic and visual similarity-based methods proposed to date, but online users are still getting trapped into revealing sensitive information in phishing websites. In this paper, we propose a novel classification model, based on heuristic features that are extracted from URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model has been evaluated using eight different machine learning algorithms and out of which, the Random Forest (RF) algorithm performed the best with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for the phishing website detection. Principal component analysis Random Forest (PCA-RF) performed the best out of all oblique Random Forests (oRFs) with an accuracy of 99.55%. We have also tested our model with the third-party-based features and without third-party-based features to determine the effectiveness of third-party services in the classification of suspicious websites. We also compared our results with the baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks. © 2018, The Natural Computing Applications Forum.
CatchPhish: detection of phishing websites by inspecting URLs
(Springer, 2020) Rao, R.S.; Vaishnavi, T.; Pais, A.R.
There exists many anti-phishing techniques which use source code-based features and third party services to detect the phishing sites. These techniques have some limitations and one of them is that they fail to handle drive-by-downloads. They also use third-party services for the detection of phishing URLs which delay the classification process. Hence, in this paper, we propose a light-weight application, CatchPhish which predicts the URL legitimacy without visiting the website. The proposed technique uses hostname, full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words from the suspicious URL for the classification using the Random forest classifier. The proposed model with only TF-IDF features on our dataset achieved an accuracy of 93.25%. Experiment with TF-IDF and hand-crafted features achieved a significant accuracy of 94.26% on our dataset and an accuracy of 98.25%, 97.49% on benchmark datasets which is much better than the existing baseline models. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
Application of word embedding and machine learning in detecting phishing websites
(Springer, 2022) Rao, R.S.; Umarekar, A.; Pais, A.R.
Phishing is an attack whose aim is to gain personal information such as passwords, credit card details etc. from online users by deceiving them through fake websites, emails or any legitimate internet service. There exists many techniques to detect phishing sites such as third-party based techniques, source code based methods and URL based methods but still users are getting trapped into revealing their sensitive information. In this paper, we propose a new technique which detects phishing sites with word embeddings using plain text and domain specific text extracted from the source code. We applied various word embedding for the evaluation of our model using ensemble and multimodal approaches. From the experimental evaluation, we observed that multimodal with domain specific text achieved a significant accuracy of 99.34% with TPR of 99.59%, FPR of 0.93%, and MCC of 98.68% © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
An ensemble learning approach for detecting phishing URLs in encrypted TLS traffic
(Springer, 2024) Kondaiah, C.; Pais, A.R.; Rao, R.S.
Phishing is a fraudulent method used by hackers to acquire confidential data from victims, including security passwords, bank account details, debit card data, and other sensitive data. Owing to the increase in internet users, the corresponding network attacks have also grown over the last decade. Existing phishing detection methods are implemented for the application layer and are not effectively adapted to the transport layer. In this paper, we propose a novel phishing detection method that extends beyond traditional approaches by utilizing a multi-model ensemble of deep neural networks, long short term memory, and Random Forest classifiers. Our approach is distinguished by its unique feature extraction from transport layer security (TLS) 1.2 and 1.3 network traffic and the application of advanced deep learning algorithms to enhance phishing detection capabilities. To assess the effectiveness of our model, we curated datasets that include both phishing and legitimate websites, using features derived from TLS 1.2 and 1.3 traffic. The experimental results show that our proposed model achieved a classification accuracy of 99.61%, a precision of 99.80%, and a Matthews Correlation Coefficient of 99.22% on an in-house dataset. Our model excels at detecting phishing Uniform Resource Locator at the transport layer without data decryption. It is designed to block phishing attacks at the network gateway or firewall level. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Faculty Publications

Browse

Filters

Settings

Sort By

Results per page

Search Results