Cross-Lingual Information Retrieval in a Hybrid Query Model for Optimality

Authors

  • Abdul Basit Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan.
  • Israr Hanif Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan.
  • Muhammad Sajid Maqbool Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan.
  • Wahid Qayyum Department of Software Engineering, Faculty of Computer Science, Lahore Garrison University, Lahore, 54000, Pakistan.
  • Muhammad Adnan Hasnain Department of Computer Science, National College of Business Administration & Economics, Lahore, Pakistan.
  • Rubaina Nazeer Department of Information Sciences, University of Education, Lahore, Pakistan.

Keywords:

Information Retrieval System, Urdu information retrieval, Cross-Lingual Information Retrieval, Roman-Urdu Information Retrieval

Abstract

Cross-Lingual Information Retrieval (CLIR) allows users to retrieve documents written in a language other than the query language. It can be accomplished in two ways: in the first, the query is translated into the target language; in the second, the documents are translated into the query's language. Query translation is usually preferred because translating the documents is more complex. In the query translation method, a query in language A is translated and compared against a document index in language B. The Text REtrieval Conference (TREC) is a forum for evaluating the performance of information retrieval systems. Different tracks are designed to address different domains. Each track normally provides a corpus containing a collection of documents, a few query topics, and a set of relevant documents for each topic to support the evaluation task. Monolingual Urdu-Urdu information retrieval has been addressed by researchers to some extent, but cross-lingual Urdu-English retrieval has not yet received attention. Our research addresses this gap using the UIR-21 corpus, a collection of Urdu news documents designed for the Urdu information retrieval task. We used this corpus to model the impact of hybrid queries on retrieval. The proposed CLIR model supports queries in three languages (Urdu, English, and Roman-Urdu) and retrieves documents in both monolingual and cross-lingual (Urdu to English and vice versa) contexts. For evaluation, we computed the precision, recall, and F1 score of each model. The highest precision was achieved by the Roman Urdu Retrieval Model (RURM) and the lowest by the Urdu Retrieval Model (URM).
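As a rough illustration of the query-translation approach and the set-based metrics described above, the following Python sketch translates a Roman-Urdu query into English with a tiny bilingual dictionary, retrieves matching English documents from an inverted index, and scores the result with precision, recall, and F1. The dictionary, documents, relevance judgments, and boolean OR matching are hypothetical assumptions chosen for illustration. They are not the UIR-21 corpus or the authors' actual retrieval model.

```python
"""Toy sketch of query-translation CLIR plus set-based evaluation.

Assumptions (hypothetical, not from the paper): a tiny Roman-Urdu -> English
bilingual dictionary, an in-memory English document collection, and
boolean OR matching over an inverted index.
"""
from collections import defaultdict

# Hypothetical bilingual dictionary: query-language term -> target-language terms.
BILINGUAL_DICT = {
    "barish": ["rain"],
    "sailab": ["flood"],
    "lahore": ["lahore"],
}

# Hypothetical target-language (English) documents.
DOCUMENTS = {
    "d1": "heavy rain caused a flood in lahore",
    "d2": "cricket match in multan",
    "d3": "rain expected in lahore tomorrow",
    "d4": "flood relief efforts continue",
}


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def translate_query(query, dictionary):
    """Replace each query term with its dictionary translations; unknown terms are dropped."""
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, []))
    return translated


def retrieve(terms, index):
    """Boolean OR retrieval: return every document matching at least one term."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


def evaluate(retrieved, relevant):
    """Return set-based (precision, recall, F1) for one query topic."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    index = build_inverted_index(DOCUMENTS)
    query_terms = translate_query("barish sailab", BILINGUAL_DICT)  # -> ["rain", "flood"]
    retrieved = retrieve(query_terms, index)
    relevant = {"d1", "d4"}  # hypothetical relevance judgments for this topic
    p, r, f = evaluate(retrieved, relevant)
    print(sorted(retrieved), round(p, 2), round(r, 2), round(f, 2))
    # prints: ['d1', 'd3', 'd4'] 0.67 1.0 0.8
```

Here the translated query pulls in d3, which mentions rain but is not judged relevant to the topic. That one false positive lowers precision to 2/3 while recall stays at 1.0. A real system would typically replace boolean matching with ranked retrieval (for example TF-IDF or BM25) and handle untranslatable terms rather than dropping them.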

 



Published

2023-06-05

How to Cite

Abdul Basit, Israr Hanif, Muhammad Sajid Maqbool, Wahid Qayyum, Muhammad Adnan Hasnain, & Rubaina Nazeer. (2023). Cross-Lingual Information Retrieval in a Hybrid Query Model for Optimality. Journal of Computing & Biomedical Informatics, 5(01), 130–141. Retrieved from https://jcbi.org/index.php/Main/article/view/140