Transformer-Based Semantic Similarity Framework for Extrinsic Plagiarism Detection in Low-Resource Gujarati Language

Authors

  • Bhumi Shah, Computer Science and Engineering Department, Parul University, Vadodara, Gujarat, India.
  • Gaurav Kumar Ameta, Computer Science and Engineering Department, Parul University, Vadodara, Gujarat, India.

Keywords:

Plagiarism Detection, Transformer Models, Gujarati Language, Semantic Similarity, Natural Language Processing

Abstract

This research proposes a transformer-based NLP model for plagiarism detection in Gujarati, a low-resource and morphologically rich language. Gujarati poses several difficulties: a scarcity of annotated corpora, complex morphology, and a variety of syntactic structures. To address these problems, a hybrid solution is developed that combines statistical and contextual embeddings. The baseline approach of TF-IDF with cosine similarity yielded a similarity score of 0.4214, demonstrating its limited ability to capture semantic relations. A fine-tuned BERT model scored much higher at 0.9935, reflecting stronger contextual comprehension and paraphrase recognition. The transformer's self-attention mechanism is well suited to modeling long-range dependencies, which enables the identification of paraphrased and obfuscated text. The results highlight the usefulness of transformer-based representations in low-resource language settings and provide a practical approach to enhancing plagiarism detection.
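The TF-IDF-plus-cosine baseline described in the abstract can be sketched in plain Python. This is a minimal illustration with toy English sentences, not the authors' Gujarati pipeline; the whitespace tokenizer, the IDF smoothing, and the example texts are all assumptions for demonstration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term frequency x inverse document frequency).

    Assumptions: whitespace tokenization and idf = log(n/df) + 1 smoothing,
    which keeps terms that occur in every document at a non-zero weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] / len(toks) * idf[t] for t in vocab])
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

suspicious = "the student copied the original passage"
source     = "the original passage was copied by the student"
paraphrase = "the learner reproduced the source text"

v1, v2, v3 = tfidf_vectors([suspicious, source, paraphrase])
sim_copy = cosine(v1, v2)  # high: heavy lexical overlap with the source
sim_para = cosine(v1, v3)  # low: the paraphrase shares almost no surface tokens
print(f"verbatim overlap: {sim_copy:.4f}")
print(f"paraphrase:       {sim_para:.4f}")
```

The low paraphrase score illustrates the baseline's weakness reported in the abstract: surface-level term matching misses semantic equivalence, which is what motivates replacing these sparse vectors with contextual embeddings from a fine-tuned BERT model.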

Published

2025-12-01

How to Cite

Bhumi Shah, & Gaurav Kumar Ameta. (2025). Transformer-Based Semantic Similarity Framework for Extrinsic Plagiarism Detection in Low-Resource Gujarati Language. Journal of Computing & Biomedical Informatics. Retrieved from https://jcbi.org/index.php/Main/article/view/1176

Section

Articles