Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach

Rao, R.S.; Pais, A.R.

Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/13655

Title:	Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach
Authors:	Rao, R.S.; Pais, A.R.
Issue Date:	2019
Citation:	Journal of Ambient Intelligence and Humanized Computing, 2019
Abstract:	Visual similarity-based techniques detect phishing sites by comparing a suspicious site against an existing database of resources such as screenshots, styles, logos, and favicons. These techniques fail when a phishing site targets a non-whitelisted legitimate domain or presents manipulated content from a whitelisted legitimate site. They are also poorly suited to client-side use because of their computation and space complexity. Thus there is a need for a lightweight visual similarity-based technique that detects phishing sites targeting non-whitelisted legitimate resources. Unlike traditional visual similarity-based techniques that use whitelists, in this paper we employ a lightweight visual similarity-based blacklist approach as a first-level filter for detecting near-duplicate phishing sites. For non-blacklisted phishing sites, we incorporate a heuristic mechanism as a second-level filter. We use two fuzzy similarity measures, Simhash and Perceptual hash, to calculate the similarity score between the suspicious site and existing blacklisted phishing sites. Each similarity measure generates a unique fingerprint for a given website, and the fingerprints of similar websites differ in only a small number of bits. All three fingerprints together represent a website, which undergoes blacklist filtering to identify the target website. Phishing sites that bypass the first-level filter undergo second-level heuristic filtering. We use comprehensive heuristic features, including URL and source code based features, to detect non-blacklisted phishing sites. The experimental results demonstrate that the blacklist filter alone detects 55.58% of phishing sites that are either replicas or near duplicates of existing phishing sites. We also propose an ensemble model with Random Forest (RF), Extra-Tree and XGBoost to evaluate the combined contribution of the blacklist and heuristic filters; the model achieved an accuracy of 98.72% and a Matthews Correlation Coefficient (MCC) of 97.39%. The proposed model is deployed as a Chrome extension named BlackPhish to provide real-time protection against phishing sites at the client side. We also compared BlackPhish with existing anti-phishing techniques, and it outperformed them by a significant margin in accuracy and MCC. © 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
URI:	10.1007/s12652-019-01637-z; http://idr.nitk.ac.in/jspui/handle/123456789/13655
Appears in Collections:	1. Journal Articles
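
The first-level blacklist filter described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 64-bit fingerprint width, the MD5 token hashing, the distance threshold of 3, and the names `simhash`, `hamming_distance`, and `match_blacklist` are all assumptions made for illustration. It covers only the Simhash part; a perceptual hash of a page screenshot would be compared by Hamming distance in the same way.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Compute a Simhash fingerprint over whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.lower().split():
        # Hash each token to a stable integer (MD5 is an assumed choice).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def match_blacklist(fingerprint: int, blacklist: dict, max_distance: int = 3):
    """Return the label of the closest blacklisted fingerprint, or None.

    `blacklist` maps a label (hypothetical, e.g. a target brand) to a
    fingerprint. None means no entry lies within `max_distance` bits.
    """
    best_label = None
    best_distance = max_distance + 1
    for label, known in blacklist.items():
        d = hamming_distance(fingerprint, known)
        if d < best_distance:
            best_label, best_distance = label, d
    return best_label
```

In this sketch, a `None` result from `match_blacklist` corresponds to a site that passes the first-level filter and would be handed to the second-level heuristic classifier.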
