Email Spam Detection Using Machine Learning with Optimized Feature Engineering and Classification Techniques

Authors

  • Muhammad Akeel, Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.
  • Khushbu Khalid Butt, Department of Information Technology, Lahore Garrison University, Lahore, Pakistan.
  • Khadija Javed, Lahore Business School, University of Lahore, Lahore, Pakistan.
  • Maria Tariq, Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.
  • Muhammad Yousaf, Department of Computer Science, Lahore Garrison University, Lahore, Pakistan.

Keywords:

Spam Detection, Machine Learning, TF-IDF, Support Vector Machine, Email Classification, Ensemble Learning

Abstract

Spam email remains a major challenge for digital communication: it causes productivity losses, consumes storage, and exposes users to serious cybersecurity threats such as phishing, malware, and identity theft. Traditional mechanisms based on rules and keyword matching often fail against concealed forms of spam, including content obfuscation, dynamic generation, and URL cloaking. This study presents a machine-learning approach to spam detection that uses NLP for preprocessing and TF-IDF for feature extraction. Several supervised classifiers were built and evaluated, namely Logistic Regression, Naïve Bayes, Random Forest, Gradient Boosting, Support Vector Machine (SVM), and an ensemble model, using the publicly available mail_data.csv dataset. The data were split 80:20 into training and test sets, and the models were compared on accuracy, precision, recall, and F1 score. SVM achieved the highest accuracy (98.9%), indicating its effectiveness at separating spam from legitimate email.
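
The two building blocks named in the abstract, TF-IDF weighting and the four evaluation metrics, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the tokenizer, the unsmoothed weighting tf × log(N/df), the function names, and the "spam"/"ham" label strings are all assumptions made for illustration. In practice a library such as scikit-learn (for example its TfidfVectorizer) would typically be used instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simple stand-in for NLP preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: plain, unsmoothed TF-IDF; the abstract does not state the variant used.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1, treating `positive` as the spam class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy messages, not taken from mail_data.csv.
    docs = ["Win a free prize now", "Meeting at noon tomorrow", "Free prize claim now"]
    print(tfidf(docs)[0])
    print(evaluate(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"]))
```

Terms that appear in fewer messages receive larger weights, which is what lets a classifier such as SVM focus on distinctive vocabulary. Precision and recall are reported alongside accuracy because spam data sets are often imbalanced, so accuracy alone can look high even when many spam messages are missed.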

Published

2025-12-01

How to Cite

Muhammad Akeel, Khushbu Khalid Butt, Khadija Javed, Maria Tariq, & Muhammad Yousaf. (2025). Email Spam Detection Using Machine Learning with Optimized Feature Engineering and Classification Techniques. Journal of Computing & Biomedical Informatics, 10(01). Retrieved from https://jcbi.org/index.php/Main/article/view/1130