Spelling Variation of Roman Urdu Using Machine Learning

Authors

  • Mudasar Ahmed Soomro, Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
  • Rafia Naz Memon, Department of Software Engineering, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
  • Asghar Ali Chandio, Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
  • Mehwish Leghari, Department of Data Science, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
  • Muhammad Khalid, Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.

Keywords

Roman Urdu, Sentiment Analysis of Roman Urdu, Roman Urdu Spelling Variations, Machine Learning

Abstract

Spelling variations are common in languages without a standardized orthography, such as Roman Urdu (RU), for which no established spelling conventions exist. For example, "2mro" is a nonstandard spelling of "tomorrow." Roman Urdu is widely used in South Asia, especially on social media and in online product reviews, which has led to a proliferation of user-generated spellings. This research compiles a dataset of 5,244 distinct Roman Urdu words (RUWs), each paired with one to five spelling variations. To validate the dataset, we apply six machine learning (ML) classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Random Forest (RF). Among these, the SVM classifier performs best, achieving an accuracy of 99.96%.
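To make the task concrete, the following is a minimal sketch of how a spelling variant might be mapped back to a canonical word. It is not the authors' pipeline: it uses a 1-nearest-neighbor rule (the simplest form of KNN, one of the six classifier families named above) over character-bigram counts with cosine similarity, written with the Python standard library only. The word list, the canonical labels, and the function names `char_ngrams`, `cosine`, `predict`, and `accuracy` are illustrative assumptions and are not taken from the paper's dataset.

```python
import math
from collections import Counter

# Hypothetical (variant, canonical word) pairs for illustration only;
# they are NOT drawn from the dataset described in the abstract.
TRAIN = [
    ("nahi", "nahi"), ("nahin", "nahi"), ("nhi", "nahi"),
    ("acha", "acha"), ("achha", "acha"), ("accha", "acha"),
    ("bohat", "bohat"), ("bahut", "bohat"), ("boht", "bohat"),
    ("zindagi", "zindagi"), ("zindgi", "zindagi"),
]


def char_ngrams(word, n=2):
    """Count character n-grams, with ^ and $ marking the word boundaries."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(word, train=TRAIN):
    """Return the canonical label of the most similar training variant (1-NN)."""
    query = char_ngrams(word)
    _, label = max(train, key=lambda pair: cosine(query, char_ngrams(pair[0])))
    return label


def accuracy(samples, train=TRAIN):
    """Fraction of (variant, canonical) samples whose prediction is correct."""
    hits = sum(predict(variant, train) == label for variant, label in samples)
    return hits / len(samples)


if __name__ == "__main__":
    # Unseen spellings, also invented for this sketch.
    for variant in ["nahee", "bhot", "achaa", "zindagee"]:
        print(variant, "->", predict(variant))
```

Character n-grams are a common choice for this kind of problem because variants of the same word tend to share most of their letter sequences even when vowels are doubled or dropped. A classifier such as SVM could be trained on the same features in place of the nearest-neighbor rule.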

Published

2024-09-01

How to Cite

Mudasar Ahmed Soomro, Rafia Naz Memon, Asghar Ali Chandio, Mehwish Leghari, & Muhammad Khalid. (2024). Spelling Variation of Roman Urdu Using Machine Learning. Journal of Computing & Biomedical Informatics, 7(02). Retrieved from https://jcbi.org/index.php/Main/article/view/529