Study on Website Phishing and their Countermeasures
Date
2020
Authors
Rao, Routhu Srinivasa
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Phishing is a manipulation technique that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions, such as blacklist- or whitelist-based, heuristic, and visual similarity-based methods, have been proposed to date to prevent phishing attacks, yet online users are still trapped into revealing sensitive information on phishing websites. In this research work, we focus on designing new heuristic techniques with a comprehensive feature set and different machine learning algorithms for the classification of phishing sites.
Many machine learning (ML) based techniques exist for detecting phishing sites, but their detection accuracy remains limited. To overcome the disadvantages of existing schemes, we present an efficient feature-based machine learning framework for the detection of phishing sites. The feature set is collected from different sources, such as the URL, the source code, and third-party services, and is fed to the machine learning classifier. The model achieved an accuracy of 99.55% using an orthogonal Random Forest classifier, with a True Positive Rate (TPR) of 99.45% and a True Negative Rate (TNR) of 99.42%.
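The three metrics reported throughout this abstract can be computed from a confusion matrix. The following is a minimal sketch, assuming that phishing is treated as the positive class (the abstract does not state this explicitly); the names `ConfusionCounts` and `detection_rates` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary phishing detector.

    Assumption: "positive" means phishing, "negative" means legitimate.
    """
    tp: int  # phishing sites correctly flagged as phishing
    tn: int  # legitimate sites correctly passed as legitimate
    fp: int  # legitimate sites wrongly flagged as phishing
    fn: int  # phishing sites wrongly passed as legitimate


def detection_rates(c: ConfusionCounts) -> dict:
    """Return accuracy, TPR and TNR as fractions in [0, 1]."""
    total = c.tp + c.tn + c.fp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "tpr": c.tp / (c.tp + c.fn),  # share of phishing sites detected
        "tnr": c.tn / (c.tn + c.fp),  # share of legitimate sites passed
    }


if __name__ == "__main__":
    # Made-up counts, not taken from the thesis.
    print(detection_rates(ConfusionCounts(tp=90, tn=95, fp=5, fn=10)))
```

Reporting TPR and TNR alongside accuracy shows separately how many phishing sites are missed and how many legitimate sites are wrongly blocked.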
Although the ML-based technique achieved high accuracy, its use of third-party services such as search engines or page-ranking services means that it might fail when phishing sites hosted on compromised servers (PSHCS) are encountered. To counter PSHCS, we present two techniques, with and without third-party services. First, we present a novel heuristic technique using a twin support vector machine (TWSVM) to detect maliciously registered phishing sites as well as sites hosted on compromised servers. This technique achieved an accuracy of 98.4% in detecting phishing sites, with a TPR of 98.72% and a TNR of 98.08%. It relies on the home page of the suspicious site to calculate a similarity score between the home page and the suspicious site. This mechanism might fail when the correct home page of the suspicious site is not retrieved. Hence, we present an improved search engine based technique that uses a dynamic search query to identify the page matching the suspicious site and then calculates the similarity score. This technique not only detects PSHCS but also correctly identifies newly registered legitimate sites. It achieved an accuracy of 98.61% with a TPR of 97.77% and a TNR of 99.36%.
The techniques presented above rely on the source code of the website and on third-party services, which requires loading the page to determine the status of the website. As a result, the detection process might be delayed at the client side. Moreover, because the webpage must be visited, there is a greater chance of accidentally downloading malware from it (drive-by downloads). Hence, we propose two lightweight techniques based on the inspection of URLs. These techniques are designed to be used as first-level filtering of phishing websites without even visiting the suspicious site. The first technique is deployed as a web application that uses hand-crafted and Term Frequency-Inverse Document Frequency (TF-IDF) features for detection. It achieved an accuracy of 94.26% with a TPR of 93.31% and a TNR of 96.65%. The second technique is designed for mobile devices and uses a multi-model ensemble of Long Short-Term Memory (LSTM) and Support Vector Machine (SVM) models for phishing detection. It achieved an accuracy of 97.30% with a TPR of 97.31% and a TNR of 97.28%.
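To illustrate what URL-only inspection can look like, the sketch below derives a few hand-crafted features and a simple TF-IDF weighting from URL strings without fetching any page. The specific features, the tokenizer, and the function names (`handcrafted_features`, `tokenize`, `tfidf_vectors`) are assumptions made for illustration; the abstract does not list the features actually used in the thesis.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def handcrafted_features(url: str) -> dict:
    """A few illustrative URL features (assumed, not the thesis's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": int(parsed.scheme == "https"),
    }


def tokenize(url: str) -> list:
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tfidf_vectors(urls: list) -> list:
    """Return one sparse {token: weight} dict per URL.

    Uses the plain textbook weighting tf * log(N / df), with no smoothing.
    """
    docs = [Counter(tokenize(u)) for u in urls]
    n_docs = len(docs)
    # Document frequency: number of URLs that contain each token.
    df = Counter(tok for d in docs for tok in d)
    vectors = []
    for d in docs:
        total = sum(d.values())
        vectors.append(
            {tok: (cnt / total) * math.log(n_docs / df[tok]) for tok, cnt in d.items()}
        )
    return vectors


if __name__ == "__main__":
    sample = [
        "http://example.com/login",
        "http://example.org/login",
        "http://secure-update.example.net/paypal",
    ]
    print(handcrafted_features(sample[2]))
    print(tfidf_vectors(sample)[2])
```

In a full pipeline, both feature groups would be concatenated and passed to a classifier. Tokens that occur in every URL, such as `http` in this sample, receive a weight of zero under this weighting.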
The techniques presented earlier use either content or URLs for phishing detection, but they do not provide information about the target website that a phishing site imitates. To provide this, we present a lightweight visual similarity-based approach that maintains fingerprints of blacklisted phishing sites along with their target legitimate domains. The technique also includes heuristic features for detecting phishing sites that target non-whitelisted legitimate sites. It achieves an accuracy of 98.72% with a TPR of 98.51% and a TNR of 98.87%.
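The idea of storing fingerprints of blacklisted phishing sites together with their target domains can be sketched as a simple lookup table. The abstract does not describe how fingerprints are computed, so the example below substitutes an exact SHA-256 hash of normalized page text as a stand-in; this is an assumption, not the visual similarity measure used in the thesis, and the names `fingerprint` and `FingerprintBlacklist` are hypothetical.

```python
import hashlib
import re
from typing import Optional


def fingerprint(page_text: str) -> str:
    """Stand-in fingerprint: SHA-256 of whitespace-normalized, lowercased text.

    Assumption: an exact hash replaces the (unspecified) visual fingerprint.
    """
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintBlacklist:
    """Maps fingerprints of known phishing pages to the domain they imitate."""

    def __init__(self) -> None:
        self._targets = {}

    def add(self, page_text: str, target_domain: str) -> None:
        """Record a known phishing page and its target legitimate domain."""
        self._targets[fingerprint(page_text)] = target_domain

    def lookup(self, page_text: str) -> Optional[str]:
        """Return the imitated domain if the page matches a known fingerprint."""
        return self._targets.get(fingerprint(page_text))


if __name__ == "__main__":
    blacklist = FingerprintBlacklist()
    blacklist.add("Sign in to  Your Account", "bank.example")
    print(blacklist.lookup("sign in to your account\n"))
    print(blacklist.lookup("An unrelated page"))
```

A match reports both that the page is a known phishing page and which legitimate domain it targets. Pages with no match would fall through to the heuristic features mentioned above.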
Keywords
Department of Computer Science & Engineering