Phishing Email and URL Detection using Machine learning and Deep learning
Date
2023
Authors
M, Somesha
Publisher
National Institute Of Technology Karnataka Surathkal
Abstract
This research thesis addresses the issue of email phishing, which poses a serious risk
to businesses and corporations. Through the use of social engineering strategies, email
phishing attacks persuade users to divulge personal data that can be exploited to access
their digital assets. Despite the presence of several defenses, the Anti-Phishing Working
Group survey reveals that the present approaches to phishing attack detection are still
insufficient and ineffective. This underlines the need for a more effective system that
identifies phishing emails and offers the end user greater protection against such attacks.
Many machine learning based techniques exist to detect phishing emails, but they rely
on a large number of heuristics to classify an email. To overcome this disadvantage of
existing schemes, we have presented an efficient word embedding cum machine learning
framework to classify emails. The presented technique uses only four email header based
heuristics (i.e., From, Return-path, Subject, and Message-ID). The model achieved an
accuracy of 99.50% using the FastText-CBOW algorithm in combination with the Random
Forest classifier.
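As an illustration of the header-only input described above, the following minimal sketch pulls the four header fields out of a raw email with Python's standard `email` module. The function names, the sample message, and the step of joining the fields into one text string are assumptions made for this example; the abstract does not say how the fields are preprocessed before the FastText embedding and Random Forest stages, and those stages are not shown here.

```python
from email import message_from_string

# The four header fields named in the abstract (header lookup is case-insensitive).
HEADER_FIELDS = ("From", "Return-Path", "Subject", "Message-ID")


def extract_header_features(raw_email: str) -> dict:
    """Return the four header values, using an empty string for a missing field."""
    msg = message_from_string(raw_email)
    return {field: (msg.get(field) or "").strip() for field in HEADER_FIELDS}


def headers_to_text(features: dict) -> str:
    """Join the non-empty header values into one string (an assumed preprocessing step)."""
    return " ".join(features[field] for field in HEADER_FIELDS if features[field])


# Hypothetical sample message, used for illustration only.
SAMPLE_EMAIL = (
    "From: Support <support@example.com>\n"
    "Return-Path: <bounce@mailer.example.net>\n"
    "Subject: Verify your account\n"
    "Message-ID: <1234@mailer.example.net>\n"
    "\n"
    "Please confirm your details.\n"
)

if __name__ == "__main__":
    features = extract_header_features(SAMPLE_EMAIL)
    print(features)
    print(headers_to_text(features))
```

In a full pipeline, a string such as the one returned by `headers_to_text` would be passed to a word embedding model, and the resulting vector would be fed to the classifier.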
Although machine learning based techniques achieved significant accuracy, it is advisable
to use deep learning models whenever sufficient data is available. We have presented an
efficient deep learning model called "DeepEPhishNet" for the classification of emails.
The presented model, based on FastText-SkipGram with a Deep Neural Network (DNN),
achieved an accuracy of 99.52%, TPR of 99.38%, TNR of 99.92%, F-Score of 99.68%,
Precision of 99.97%, and MCC of 98.71%.
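For readers unfamiliar with the reported metrics, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical, and treating phishing as the positive class is an assumption; the abstract does not state the class convention. MCC lies in the range -1 to 1, so the percentage reported above presumably corresponds to the value multiplied by 100.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes the positive class is "phishing" (an assumption for this sketch).
    Any ratio with a zero denominator is returned as 0.0.
    """

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    tpr = ratio(tp, tp + fn)  # true positive rate (recall, sensitivity)
    tnr = ratio(tn, tn + fp)  # true negative rate (specificity)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_score": ratio(2 * precision * tpr, precision + tpr),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis.
    for name, value in classification_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.4f}")
```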
The above methods make use of only four email header based heuristics for the
classification. To study the contribution of the email body in the detection of phishing
emails, we have presented an efficient model using transformers. The presented model
achieved an accuracy of 99.51% using open source datasets.
The body of the email might contain phishing URLs, which may lead to a phishing
attack. To address this, we have presented an efficient deep learning based model for
phishing URL detection. The accuracies achieved by the DNN, LSTM, and CNN models are
99.52%, 99.57%, and 99.43%, respectively.
Overall, this research thesis presents efficient techniques for detecting phishing
emails and URLs using word embedding, deep learning, and machine learning classifiers.