TrackPhish: A Multi-Embedding Attention-Enhanced 1D CNN Model for Phishing URL Detection
Date
2025
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Phishing attacks are a growing threat to online security, and their tactics are becoming more sophisticated and more frequent, which underscores the need for advanced detection methods. While the Internet is crucial for modern communication and commerce, it also exposes users to risks such as phishing, spamming, malware, and performance degradation attacks. Among these, malicious URLs, commonly embedded as static links in emails and websites, pose a significant challenge for identifying and mitigating such attacks. This study proposes TrackPhish, a novel lightweight application that predicts URL legitimacy without visiting the associated website. The proposed model combines traditional word embeddings (Word2Vec, FastText, GloVe) with transformer models (BERT, RoBERTa, GPT-2) to create a comprehensive feature set for a Deep Learning (DL) model that detects phishing URLs. Integrating these embeddings captures both the semantic relationships and the contextual meaning of the URL text, and an attention mechanism then selects the most relevant features from the combined set. The refined features are used to train a One-Dimensional Convolutional Neural Network (1D CNN) model for phishing URL detection. The proposed model offers key advantages over existing methods, including independence from third-party features, adaptability for client-side deployment, and target-independent detection. Experimental results show that the model achieves 95.41% accuracy with a false positive rate of 1.44% on our dataset and 98.55% accuracy on benchmark datasets, outperforming existing baseline models. The proposed model represents a significant advancement over traditional methods, enhancing online security against phishing URLs. © 2005-2012 IEEE.
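The pipeline described in the abstract (several embedding families concatenated into one feature vector, attention-based reweighting, then a 1D CNN classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the hash-based `toy_embedding` merely stands in for real Word2Vec, FastText, GloVe, BERT, RoBERTa, or GPT-2 vectors, the tokenizer, filter count, kernel size, and pooling choice are hypothetical, and the weights are random rather than trained.

```python
import hashlib
import math
import random


def toy_embedding(token, dim, salt):
    """Deterministic pseudo-embedding; a stand-in for a real embedding model."""
    digest = hashlib.sha256((salt + ":" + token).encode("utf-8")).digest()
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]  # values in [-1, 1]


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenizer)."""
    tokens, current = [], []
    for ch in url.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def embed_url(url, salts=("static_a", "static_b", "contextual_a"), dim=8):
    """Mean-pool each embedding family over the tokens, then concatenate."""
    tokens = tokenize(url) or ["<empty>"]
    features = []
    for salt in salts:  # one salt per hypothetical embedding family
        vectors = [toy_embedding(t, dim, salt) for t in tokens]
        features.extend(sum(col) / len(vectors) for col in zip(*vectors))
    return features


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reweight(features, score_weights):
    """Feature-wise soft attention: scale each feature by its softmax weight."""
    weights = softmax([w * f for w, f in zip(score_weights, features)])
    n = len(features)
    # Multiply by n so that uniform attention leaves the features unchanged.
    return [n * a * f for a, f in zip(weights, features)]


def conv1d_relu(signal, kernel, bias):
    """Valid 1D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, bias + sum(kernel[j] * signal[i + j] for j in range(k)))
            for i in range(len(signal) - k + 1)]


def init_params(n_features=24, n_filters=4, kernel_size=3, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)
    return {
        "attn": [rng.uniform(-1, 1) for _ in range(n_features)],
        "filters": [([rng.uniform(-1, 1) for _ in range(kernel_size)], 0.0)
                    for _ in range(n_filters)],
        "out_w": [rng.uniform(-1, 1) for _ in range(n_filters)],
        "out_bias": 0.0,
    }


def predict_phishing_probability(url, params):
    """Embeddings -> attention -> conv filters -> global max pool -> sigmoid."""
    attended = attention_reweight(embed_url(url), params["attn"])
    pooled = [max(conv1d_relu(attended, kernel, bias))
              for kernel, bias in params["filters"]]
    logit = params["out_bias"] + sum(w * p for w, p in zip(params["out_w"], pooled))
    return 1.0 / (1.0 + math.exp(-logit))


params = init_params()
score = predict_phishing_probability("http://example.com/login", params)
```

Because the weights are untrained, `score` carries no meaning here; the sketch only shows how the stages connect and that the prediction uses the URL string alone, without fetching the page.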
Keywords
Deep learning, Economic and social effects, Embeddings, Feature extraction, Malware, Network security, Neural networks, Phishing, Websites, Attention mechanism, CNN models, Convolutional neural network, Neural network model, Phishing URL, TrackPhish, Transformer modeling, Word embedding, Semantics
Citation
IEEE Transactions on Information Forensics and Security, 2025, 20, pp. 12188-12198
