Spelling Variation of Roman Urdu Using Machine Learning

Mudasar Ahmed Soomro; Rafia Naz Memon; Asghar Ali Chandio; Mehwish Leghari; Muhammad Khalid

Authors

Mudasar Ahmed Soomro Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
Rafia Naz Memon Department of Software Engineering, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
Asghar Ali Chandio Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
Mehwish Leghari Department of Data Science, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.
Muhammad Khalid Department of Information Technology, Quaid-e-Awam University of Engineering, Science & Technology, Nawabshah, 67480, Pakistan.

Keywords:

Roman Urdu, Sentiment Analysis of Roman Urdu, Roman Urdu Spelling Variations, Machine Learning

Abstract

Spelling variations are common in languages without a standardized orthography, such as Roman Urdu (RU), for which no established spelling conventions exist. For example, "2mro" is a nonstandard spelling of "tomorrow." Roman Urdu is widely used in South Asia, especially on social media and in online product reviews, which has led to a proliferation of user-generated spellings. This research compiles a dataset of Roman Urdu words (RUWs) and their spelling variations, comprising 5,244 distinct RUWs, each with one to five spelling variations. To validate this dataset, we apply six machine learning (ML) classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Random Forest (RF). Among these, SVM performs best, achieving an accuracy of 99.96% and outperforming the other five classifiers.
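The classification task summarized above, mapping a user-generated spelling to its standard RUW, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each spelling variant is labeled with its standard word, that words are represented by character bigram counts, and that a k-nearest-neighbors vote with cosine similarity is used. KNN is one of the six classifiers listed in the abstract, but the reported best result came from SVM. The toy entries and the function names (char_ngrams, cosine, build_training_set, knn_predict) are hypothetical and are not drawn from the 5,244-word dataset.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy entries (NOT from the paper's dataset): each standard
# Roman Urdu word maps to a few user-generated spelling variants.
TOY_DATA = {
    "kal": ["kal", "kl", "kall"],
    "mein": ["mein", "main", "mai", "me"],
    "hai": ["hai", "hy", "he", "hae"],
}


def char_ngrams(word, n=2):
    """Count character n-grams of a word padded with ^ and $ boundary markers."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_training_set(data):
    """Flatten {standard_word: [variants]} into (features, label) pairs."""
    return [(char_ngrams(v), std) for std, variants in data.items() for v in variants]


def knn_predict(word, training, k=3):
    """Label an unseen spelling by majority vote of its k most similar variants."""
    query = char_ngrams(word)
    ranked = sorted(training, key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = build_training_set(TOY_DATA)
    for spelling in ["hye", "kaal", "mien"]:
        print(spelling, "->", knn_predict(spelling, training))
```

On these toy entries, the unseen spellings "hye", "kaal", and "mien" are mapped to "hai", "kal", and "mein" respectively. To approximate the SVM setup instead, scikit-learn's TfidfVectorizer with analyzer="char_wb" can produce comparable character n-gram features for LinearSVC. The abstract does not state which feature representation the authors actually used.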

