Applicability of machine learning in spam and phishing email filtering: review and approaches

Gangavarapu, T.; Jaidhar, C.D.; Chanduka, B.

dc.contributor.author	Gangavarapu, T.
dc.contributor.author	Jaidhar, C.D.
dc.contributor.author	Chanduka, B.
dc.date.accessioned	2026-02-05T09:28:11Z
dc.date.issued	2020
dc.description.abstract	With the influx of technological advancements and the increased simplicity in communication, especially through emails, the upsurge in the volume of unsolicited bulk emails (UBEs) has become a severe threat to global security and economy. Spam emails not only waste users’ time, but also consume a lot of network bandwidth, and may also include malware as executable files. Alternatively, phishing emails falsely claim users’ personal information to facilitate identity theft and are comparatively more dangerous. Thus, there is an intrinsic need for the development of more robust and dependable UBE filters that facilitate automatic detection of such emails. There are several countermeasures to spam and phishing, including blacklisting and content-based filtering. However, in addition to content-based features, behavior-based features are well-suited in the detection of UBEs. Machine learning models are being extensively used by leading internet service providers like Yahoo, Gmail, and Outlook, to filter and classify UBEs successfully. There are far too many options to consider, owing to the need to facilitate UBE detection and the recent advances in this domain. In this paper, we aim at elucidating on the way of extracting email content and behavior-based features, what features are appropriate in the detection of UBEs, and the selection of the most discriminating feature set. Furthermore, to accurately handle the menace of UBEs, we facilitate an exhaustive comparative study using several state-of-the-art machine learning algorithms. Our proposed models resulted in an overall accuracy of 99% in the classification of UBEs. The text is accompanied by snippets of Python code, to enable the reader to implement the approaches elucidated in this paper. © 2020, Springer Nature B.V.
dc.identifier.citation	Artificial Intelligence Review, 2020, 53, 7, pp. 5019-5081
dc.identifier.issn	0269-2821
dc.identifier.uri	https://doi.org/10.1007/s10462-020-09814-9
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/23712
dc.publisher	Springer Science+Business Media B.V. editorial@springerplus.com
dc.subject	Electronic mail
dc.subject	Feature extraction
dc.subject	High level languages
dc.subject	Information dissemination
dc.subject	Learning algorithms
dc.subject	Learning systems
dc.subject	Malware
dc.subject	Content based filtering
dc.subject	Content-based features
dc.subject	Feature engineering
dc.subject	Machine learning models
dc.subject	Phishing
dc.subject	Python
dc.subject	Spam
dc.subject	Technological advancement
dc.subject	Machine learning
dc.title	Applicability of machine learning in spam and phishing email filtering: review and approaches

Collections

Journal Articles
