A Feature-Level Hybrid Model Approach for Automated Phishing Email Detection
Keywords:
Cybersecurity, Deep Learning, Phishing Detection, Spam Emails, Cyber Attacks, Hybrid Model
Abstract
Phishing emails remain a serious cybersecurity risk: they trick recipients into disclosing personal and financial information, including login credentials. Timely detection of such attacks reduces their impact and helps keep users safe. Prior work has applied rule-based, classical machine learning, and deep learning methods to phishing detection, but most approaches rely solely on the textual content of the email and ignore other features, such as URLs, that can provide important context. To fill this gap, we present a framework for phishing email detection that jointly uses textual and structural information. Using a dataset of natural-language email bodies annotated with the presence or absence of URLs, we first trained classical machine learning models (Logistic Regression, Support Vector Machine, and Random Forest) on TF-IDF features to establish strong baselines. We then introduce Hybrid DistilBERT, a feature-level hybrid deep learning model built on DistilBERT that takes semantic representations of the email text together with URL presence as inputs. The hybrid model combines the contextual representations of a transformer with meta-level data and substantially improves predictive performance. In our experiments, the proposed model achieved the highest classification accuracy, 99.1%, outperforming the traditional models, which supports the effectiveness of the hybrid design. These findings indicate that combining content and structural cues makes phishing detection more robust and accurate.
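As a rough illustration of what feature-level fusion means here, the sketch below concatenates a text feature vector with a binary URL-presence flag, so that a downstream classifier sees both in one input. It is not the authors' implementation: a small pure-Python TF-IDF vector stands in for the DistilBERT representation, and the names `URL_PATTERN`, `fit_idf`, `text_features`, `has_url`, and `fuse`, the URL regex, and the two sample emails are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Assumed URL heuristic for illustration; the abstract does not say how URLs are detected.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def tokenize(text):
    """Lowercase word tokens, with URLs removed so they are carried only by the flag."""
    return re.findall(r"[a-z0-9']+", URL_PATTERN.sub(" ", text.lower()))


def fit_idf(corpus):
    """Build a sorted vocabulary and smoothed IDF weights from a list of emails."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def text_features(text, vocab, idf):
    """L2-normalised TF-IDF vector (a stand-in for a transformer text embedding)."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def has_url(text):
    """Structural meta-feature: 1.0 if the email contains a URL, else 0.0."""
    return 1.0 if URL_PATTERN.search(text) else 0.0


def fuse(text, vocab, idf):
    """Feature-level fusion: append the URL flag to the text feature vector."""
    return text_features(text, vocab, idf) + [has_url(text)]


# Hypothetical sample emails, not taken from the paper's dataset.
emails = [
    "Verify your account now at http://example.test/login",
    "Lunch at noon tomorrow?",
]
vocab, idf = fit_idf(emails)
fused = [fuse(e, vocab, idf) for e in emails]
```

Each fused vector has one more dimension than the text vector alone. In a DistilBERT-based variant, the same concatenation would presumably be applied to the pooled transformer output before the classification layer, though the abstract does not specify the exact architecture.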
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.