Application of word embedding and machine learning in detecting phishing websites

Rao, R.S.; Umarekar, A.; Pais, A.R.

Date

2022

Authors

Rao, R.S.

Umarekar, A.

Pais, A.R.

Publisher

Springer

Abstract

Phishing is an attack that aims to obtain personal information, such as passwords and credit card details, from online users by deceiving them through fake websites, emails, or any legitimate internet service. Many techniques exist to detect phishing sites, including third-party-based techniques, source-code-based methods, and URL-based methods, yet users are still tricked into revealing their sensitive information. In this paper, we propose a new technique that detects phishing sites with word embeddings, using plain text and domain-specific text extracted from the source code. We applied various word embeddings to evaluate our model using ensemble and multimodal approaches. From the experimental evaluation, we observed that the multimodal approach with domain-specific text achieved an accuracy of 99.34%, with a TPR of 99.59%, an FPR of 0.93%, and an MCC of 98.68%. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
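
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how accuracy, TPR, FPR, and MCC are conventionally computed from confusion-matrix counts. It uses the standard textbook definitions and is not taken from the paper; the function name `classification_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: phishing sites correctly flagged, fn: phishing sites missed,
    fp: legitimate sites wrongly flagged, tn: legitimate sites correctly passed.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "mcc": mcc}


# Hypothetical counts for illustration only (not the paper's data).
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1; the abstract reports it on a percentage scale, so a value from this sketch would be multiplied by 100 for comparison.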

Keywords

Codes (symbols), Computer crime, Embeddings, Fake detection, Machine learning, Websites, Anti-phishing, Domain specific, Hostname, Phishing, Phishing websites, Random forests, Source codes, TF-IDF, URL, Decision trees

Citation

Telecommunication Systems, 2022, 79, 1, pp. 33-45

URI

https://doi.org/10.1007/s11235-021-00850-6
https://idr.nitk.ac.in/handle/123456789/22857

Collections

Journal Articles
