Email Spam Detection Using Machine Learning with Optimized Feature Engineering and Classification Techniques

Muhammad Akeel; Khushbu Khalid Butt; Khadija Javed; Maria Tariq; Muhammad Yousaf

Authors

Muhammad Akeel Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.
Khushbu Khalid Butt Department of Information Technology, Lahore Garrison University, Lahore, Pakistan.
Khadija Javed Lahore Business School, University of Lahore, Lahore, Pakistan.
Maria Tariq Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.
Muhammad Yousaf Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.

Keywords:

Spam Detection, Machine Learning, TF-IDF, Support Vector Machine, Email Classification, Ensemble Learning

Abstract

Spam email remains a major challenge for digital communication: it reduces productivity, consumes storage, and poses serious cybersecurity threats such as phishing, malware, and identity theft. Traditional mechanisms based on rules and keyword matching fail to cope with the many evasion techniques that spammers use, including content obfuscation, dynamic generation, and URL cloaking. This study presents a machine-learning approach to spam detection that uses NLP for preprocessing and TF-IDF for feature extraction. Several supervised classifiers were built and evaluated on the publicly available mail_data.csv dataset: Logistic Regression, Naïve Bayes, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and Ensemble Learning. The data were split 80:20 into training and test sets, and the models were evaluated on accuracy, precision, recall, and F1 score. SVM attained the highest accuracy (98.9%), indicating its effectiveness at separating spam from legitimate email.
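
As a rough illustration of two ingredients named in the abstract, TF-IDF weighting and the four evaluation metrics, the following Python sketch uses only the standard library. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed idf formula ln(N/df), the function names tokenize, tfidf, and binary_metrics, and the toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. A real pipeline would
    # also strip punctuation and stop words during NLP preprocessing.
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Libraries often use smoothed or normalized variants instead.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def binary_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In this sketch a term that appears in every message receives weight 0, which is why TF-IDF tends to emphasize words that distinguish spam from legitimate mail. The weight dictionaries would then be passed to a classifier such as an SVM, which is outside the scope of this example.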

Published

2025-12-01

How to Cite

Muhammad Akeel, Khushbu Khalid Butt, Khadija Javed, Maria Tariq, & Muhammad Yousaf. (2025). Email Spam Detection Using Machine Learning with Optimized Feature Engineering and Classification Techniques. Journal of Computing & Biomedical Informatics, 10(01). Retrieved from https://jcbi.org/index.php/Main/article/view/1130

Issue

Vol. 10 No. 01 (2025): Journal of Computing & Biomedical Informatics

Section

Articles

License

This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.
