CatchPhish: detection of phishing websites by inspecting URLs

dc.contributor.author: Rao, R.S.
dc.contributor.author: Vaishnavi, T.
dc.contributor.author: Pais, A.R.
dc.date.accessioned: 2026-02-05T09:29:04Z
dc.date.issued: 2020
dc.description.abstract: Many existing anti-phishing techniques use source code-based features and third-party services to detect phishing sites. These techniques have limitations, one of which is that they fail to handle drive-by downloads. Their reliance on third-party services for detecting phishing URLs also delays the classification process. Hence, in this paper, we propose a lightweight application, CatchPhish, which predicts the legitimacy of a URL without visiting the website. The proposed technique uses the hostname, the full URL, Term Frequency-Inverse Document Frequency (TF-IDF) features and phish-hinted words from the suspicious URL for classification with a random forest classifier. The proposed model with only TF-IDF features achieved an accuracy of 93.25% on our dataset. An experiment combining TF-IDF and hand-crafted features achieved an accuracy of 94.26% on our dataset and accuracies of 98.25% and 97.49% on benchmark datasets, which is much better than the existing baseline models. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
dc.identifier.citation: Journal of Ambient Intelligence and Humanized Computing, 2020, 11, 2, pp. 813-825
dc.identifier.issn: 18685137
dc.identifier.uri: https://doi.org/10.1007/s12652-019-01311-4
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/24087
dc.publisher: Springer
dc.subject: Decision trees
dc.subject: Information retrieval systems
dc.subject: Text processing
dc.subject: Websites
dc.subject: Anti-phishing
dc.subject: Hostname
dc.subject: Phishing
dc.subject: Random forests
dc.subject: TF-IDF
dc.subject: Computer crime
dc.title: CatchPhish: detection of phishing websites by inspecting URLs
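
The abstract describes classifying a URL from hostname features, full-URL features, phish-hinted words and TF-IDF weights, without visiting the site. The following is a minimal sketch of that kind of feature extraction using only the Python standard library. It is not the authors' implementation: the word list `PHISH_HINTED_WORDS`, the feature names, the tokenizer and the unsmoothed TF-IDF formula are all assumptions made for illustration, since this record does not specify them. The random forest step is not shown; the resulting features would be passed to a classifier such as scikit-learn's `RandomForestClassifier`.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical phish-hinted words; the paper's actual list is not given in this record.
PHISH_HINTED_WORDS = ("login", "signin", "verify", "secure", "account", "update", "banking")


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def handcrafted_features(url):
    """Illustrative hostname and full-URL features (assumed, not the paper's exact set)."""
    host = urlparse(url).hostname or ""
    tokens = tokenize(url)
    return {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "host_has_digit": int(any(ch.isdigit() for ch in host)),
        "has_at_symbol": int("@" in url),
        "phish_hinted_count": sum(tokens.count(word) for word in PHISH_HINTED_WORDS),
    }


def tfidf_vectors(urls):
    """Compute unsmoothed TF-IDF weights per URL: (count / length) * log(N / df)."""
    docs = [tokenize(url) for url in urls]
    n_docs = len(docs)
    # Document frequency: number of URLs in which each token appears at least once.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {tok: (cnt / len(doc)) * math.log(n_docs / doc_freq[tok]) for tok, cnt in counts.items()}
        )
    return vectors


urls = [
    "http://paypal.com.secure-login.example.net/verify/account?id=1",
    "https://www.example.org/docs/index.html",
]
features = [handcrafted_features(url) for url in urls]
weights = tfidf_vectors(urls)
```

In this sketch a token that occurs in every URL, such as `example` above, receives a weight of zero, while tokens specific to one URL receive positive weights. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their values would differ from these.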