CatchPhish: detection of phishing websites by inspecting URLs
Date
2020
Authors
Rao, R.S.
Vaishnavi, T.
Pais, A.R.
Abstract
Many existing anti-phishing techniques rely on source code-based features and third-party services to detect phishing sites. These techniques have limitations: because they must visit the site, they fail to handle drive-by downloads, and their reliance on third-party services delays classification. Hence, in this paper, we propose a lightweight application, CatchPhish, which predicts the legitimacy of a URL without visiting the website. The proposed technique extracts features from the hostname and the full URL of the suspicious URL, together with Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words, and classifies them with a Random Forest classifier. With only TF-IDF features, the proposed model achieved an accuracy of 93.25% on our dataset. Combining TF-IDF with hand-crafted features achieved an accuracy of 94.26% on our dataset and accuracies of 98.25% and 97.49% on benchmark datasets, outperforming the existing baseline models. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
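As a rough illustration of what inspecting a URL without visiting the website can look like, the sketch below derives a few hand-crafted features from the hostname and the full URL. It is not the authors' feature set, which the abstract does not enumerate: the function name `url_features`, the individual features, and the `PHISH_HINTED_WORDS` list are all assumptions made for this example.

```python
import re
from urllib.parse import urlparse

# Hypothetical word list: the abstract does not say which phish-hinted
# words CatchPhish actually uses.
PHISH_HINTED_WORDS = ("login", "signin", "verify", "secure", "account", "update", "bank")


def url_features(url: str) -> dict:
    """Compute illustrative features from the URL string alone (no network access)."""
    # urlparse only recognises the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    # Split the lower-cased URL into alphanumeric tokens.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "host_has_hyphen": "-" in host,
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        "phish_word_count": sum(tokens.count(w) for w in PHISH_HINTED_WORDS),
    }


if __name__ == "__main__":
    print(url_features("http://secure-login.example.com/verify/account"))
```

A feature dictionary of this kind could be combined with TF-IDF features of the URL tokens and passed to a Random Forest classifier, as the abstract describes; that training step is not shown here.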
Citation
Journal of Ambient Intelligence and Humanized Computing, 2020, Vol. 11, No. 2, pp. 813-825