Transformer-Based Semantic Similarity Framework for Extrinsic Plagiarism Detection in Low-Resource Gujarati Language

Authors

  • Bhumi Shah, Computer Science and Engineering Department, Parul University, Vadodara, Gujarat, India.
  • Gaurav Kumar Ameta, Computer Science and Engineering Department, Parul University, Vadodara, Gujarat, India.

Keywords:

Plagiarism Detection, Transformer Models, Gujarati Language, Semantic Similarity, Natural Language Processing

Abstract

This research proposes a transformer-based NLP model for plagiarism detection in Gujarati, a low-resource and morphologically rich language. Gujarati poses several difficulties: a scarcity of annotated corpora, complex morphology, and a variety of syntactic structures. To address these problems, a hybrid solution is developed that combines statistical and contextual embeddings. The baseline approach of TF-IDF with cosine similarity yielded a similarity score of 0.4214, demonstrating its limited ability to capture semantic relations. A fine-tuned BERT model scored much higher at 0.9935, reflecting stronger contextual comprehension and paraphrase recognition. The transformer's self-attention mechanism is well suited to modeling long-range dependencies, which enables the identification of paraphrased and obfuscated text. The results highlight the usefulness of transformer-based representations in low-resource language settings and provide a practical approach to enhancing plagiarism detection.
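The TF-IDF-plus-cosine baseline described in the abstract can be sketched in plain Python. This is a minimal illustration with toy English sentences, not the authors' Gujarati pipeline; the whitespace tokenizer, the IDF smoothing, and the example texts are all assumptions for demonstration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term frequency x inverse document frequency).

    Assumptions: whitespace tokenization and idf = log(n/df) + 1 smoothing,
    which keeps terms that occur in every document at a non-zero weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] / len(toks) * idf[t] for t in vocab])
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

suspicious = "the student copied the original passage"
source     = "the original passage was copied by the student"
paraphrase = "the learner reproduced the source text"

v1, v2, v3 = tfidf_vectors([suspicious, source, paraphrase])
sim_copy = cosine(v1, v2)  # high: heavy lexical overlap with the source
sim_para = cosine(v1, v3)  # low: the paraphrase shares almost no surface tokens
print(f"verbatim overlap: {sim_copy:.4f}")
print(f"paraphrase:       {sim_para:.4f}")
```

The low paraphrase score illustrates the baseline's weakness reported in the abstract: surface-level term matching misses semantic equivalence, which is what motivates replacing these sparse vectors with contextual embeddings from a fine-tuned BERT model.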

Published

2025-12-01

How to Cite

Bhumi Shah, & Gaurav Kumar Ameta. (2025). Transformer-Based Semantic Similarity Framework for Extrinsic Plagiarism Detection in Low-Resource Gujarati Language. Journal of Computing & Biomedical Informatics. Retrieved from https://jcbi.org/index.php/Main/article/view/1176

Section

Articles