Webpage Classification for Search Engine Optimization using Machine Learning

Authors

  • Khurram Zeshan Haider, Department of Software Engineering, Government College University, Faisalabad, Pakistan.
  • Rimsha Zafar, Department of Software Engineering, Government College University, Faisalabad, Pakistan.
  • Qamas Gul Khan Safi, Department of Computer Science, University of Engineering and Technology Taxila, 47050, Taxila, Pakistan.
  • Muhammad Awais, Department of Software Engineering, Government College University, Faisalabad, Pakistan.
  • Muhammad Munwar Iqbal, Department of Computer Science, University of Engineering and Technology Taxila, 47050, Taxila, Pakistan.

Keywords:

Malicious & Benign Websites, Machine Learning, Deep Neural Network, URL, SEO (Search Engine Optimization)

Abstract

Webpage classification is an essential area of study in SEO, and machine learning, especially Deep Neural Networks (DNNs), plays a crucial role in it. This paper develops an accurate DNN-based classifier that separates malicious from benign webpages for SEO. The research approach covers data collection, feature selection, model construction, training and evaluation, handling of imbalanced data, and practical implementation considerations. The dataset contains features such as raw webpage content, geographical location, JavaScript length, and the webpage's obfuscated JavaScript code. It has about 1.5 million webpages, of which 1.2 million are used for training and 300,000 for testing. The dataset is highly skewed: 98.35% of the webpages are benign and 2.27% are malicious. The training dataset totals 401,806 instances, consisting of 25,770 good webpages (6.41%) and 9,472 harmful webpages (2.35%). Our model is trained rigorously to identify patterns indicative of malicious intent, and it classifies robustly on a test dataset of 398,125 instances, including 23,298 good webpages (5.8%) and 9,344 harmful webpages (2.34%). Because the data are imbalanced, accuracy alone does not give a correct evaluation, so the evaluation metrics must be chosen carefully; we therefore report an F1-score of 97.73%, a recall of 95.2%, a precision of 96%, and a confusion matrix. This paper thus addresses the challenge of accurately differentiating between malicious and benign websites, and its outcomes contribute to webpage classification in SEO by leveraging DNNs to classify malicious and benign webpages accurately.
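Since the abstract stresses that accuracy alone misleads on skewed data, a minimal sketch of the reported metrics may help. The code below computes accuracy, precision, recall, and F1 from a binary confusion matrix, assuming "malicious" is the positive class (the abstract does not say which class is positive). It also derives inverse-frequency class weights, one common way to handle imbalance that mirrors scikit-learn's "balanced" heuristic; it is not necessarily the authors' method. The names `ConfusionMatrix`, `classification_metrics`, and `inverse_frequency_weights`, and all counts used, are hypothetical illustrations, not the paper's results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix. Assumption: 'positive' means malicious."""
    tp: int  # malicious pages predicted malicious
    fp: int  # benign pages predicted malicious
    fn: int  # malicious pages predicted benign
    tn: int  # benign pages predicted benign


def classification_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (n_classes * class_count), so the rare
    class contributes as much to a weighted loss as the common one."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * c) for label, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts: a degenerate model that labels every page benign.
    always_benign = ConfusionMatrix(tp=0, fp=0, fn=20, tn=980)
    # Accuracy is 0.98, yet precision, recall and F1 are all 0.0.
    print(classification_metrics(always_benign))
    # The rare (malicious) class receives the much larger weight.
    print(inverse_frequency_weights({"benign": 980, "malicious": 20}))
```

The degenerate "always benign" model shows why a high accuracy can hide a classifier that never detects a malicious page, which is the motivation for reporting F1, precision, recall, and a confusion matrix instead.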

Published

2025-05-30

How to Cite

Khurram Zeshan Haider, Rimsha Zafar, Qamas Gul Khan Safi, Muhammad Awais, & Muhammad Munwar Iqbal. (2025). Webpage Classification for Search Engine Optimization using Machine Learning. Journal of Computing & Biomedical Informatics. Retrieved from https://jcbi.org/index.php/Main/article/view/972

Section

Articles