Cross-Lingual Information Retrieval in a Hybrid Query Model for Optimality
Keywords:
Information Retrieval System, Urdu Information Retrieval, Cross-Lingual Information Retrieval, Roman-Urdu Information Retrieval
Abstract
Cross-Lingual Information Retrieval (CLIR) allows users to retrieve documents in a language other than the query language. It can be accomplished in two ways: in the first method the query is translated into the target language, while in the second the documents are translated into the query language. Query translation is usually preferred because translating documents is more complex. In the query translation method, a query in language A is translated and compared against the document index in language B. The Text REtrieval Conference (TREC) is a forum for evaluating the performance of information retrieval systems. Its tracks are designed to address different domains. Each track normally provides a corpus containing a collection of documents, a few query topics, and a set of relevant documents for each topic to support the evaluation task. Monolingual information retrieval in the Urdu-Urdu domain has been addressed by researchers to some extent, but cross-lingual Urdu-English retrieval has not yet received attention. Our research addresses this area using the UIR-21 corpus, which is composed of Urdu news documents and was designed for the Urdu information retrieval task. We used this corpus to model the impact of hybrid queries on retrieval. The proposed CLIR model supports queries in three languages (Urdu, English, and Roman-Urdu) and returns documents in monolingual as well as cross-lingual (Urdu to English and vice versa) contexts. For evaluation, we computed the Precision, Recall, and F-1 Score of each mode. The highest precision is achieved by the Roman Urdu Retrieval Model (RURM) and the lowest by the Urdu Retrieval Model (URM).
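The evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the standard set-based definitions of Precision, Recall, and F-1 for a single query topic. The function name `precision_recall_f1` and the document identifiers are hypothetical and chosen only for illustration. They are not taken from the paper or from the UIR-21 corpus.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based Precision, Recall, and F-1 for one query topic.

    retrieved: document ids returned by the retrieval system (assumed input)
    relevant:  document ids judged relevant for the topic (assumed input)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative, made-up ids: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Per-topic scores of this kind would typically be averaged over all query topics to compare retrieval modes. Whether the paper averages in this way is not stated in the abstract.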
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.