Spelling Variation of Roman Urdu Using Machine Learning
Keywords:
Roman Urdu, Sentiment Analysis of Roman Urdu, Roman Urdu Spelling Variations, Machine Learning
Abstract
Spelling variations are common in languages without standardized orthography, such as Roman Urdu (RU), for which no established spelling conventions exist. For example, "2mro" is a nonstandard spelling for "tomorrow." Roman Urdu is widely used in South Asia, especially on social media and in online product reviews, which leads to a proliferation of user-generated spellings. This research compiles a dataset of Roman Urdu words (RUWs) together with their spelling variations, collecting 5,244 distinct RUWs, each with one to five spelling variations. To validate this dataset, we apply six machine learning (ML) classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Random Forest (RF). Among these, the SVM classifier performs best, achieving an accuracy of 99.96%.
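As an illustration of how a spelling variant might be mapped back to its base word, the following is a minimal sketch of a 1-nearest-neighbor classifier, the simplest case of the KNN family named above. It is not the paper's implementation: the toy lexicon, the character-bigram features, the Dice similarity, and the names `LEXICON`, `bigrams`, `dice`, and `predict` are all assumptions chosen for illustration, and the example words are not drawn from the paper's dataset.

```python
from collections import Counter

# Toy lexicon (illustrative only, NOT the paper's dataset):
# base Roman Urdu word -> a few user-generated spellings of it.
LEXICON = {
    "nahi": ["nahi", "nahin", "nhi"],
    "acha": ["acha", "achha", "accha"],
    "bohat": ["bohat", "bohot", "bahut"],
}


def bigrams(word):
    """Count character bigrams of a word padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def dice(a, b):
    """Dice similarity between two bigram multisets (0.0 to 1.0)."""
    overlap = sum((a & b).values())
    return 2 * overlap / (sum(a.values()) + sum(b.values()))


# Training set: (bigram features, base-word label) for every known spelling.
TRAIN = [
    (bigrams(variant), base)
    for base, variants in LEXICON.items()
    for variant in variants
]


def predict(word):
    """1-nearest-neighbor: label of the most similar known spelling."""
    feats = bigrams(word)
    return max(TRAIN, key=lambda item: dice(feats, item[0]))[1]


if __name__ == "__main__":
    for unseen in ["nahee", "achaa", "bohut"]:
        print(unseen, "->", predict(unseen))
```

With this toy lexicon, unseen spellings such as "nahee" resolve to the base word whose known variants share the most character bigrams with them. A real evaluation would need a held-out test split and the full dataset; this sketch only shows the shape of the task.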
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.