Abstractive Text Summarization for Urdu Language

Authors

  • Asif Raza, Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan.
  • Muhammad Hanif Soomro, Department of Information Technology, University of Mirpurkhas, Sindh, Pakistan.
  • Salahuddin, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.
  • Inzamam Shahzad, School of Computer Science and School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan, China.
  • Saima Batool, Institute of Computing, Muhammad Nawaz Sharif University of Agriculture, Multan, Pakistan.

Keywords:

Natural Language Processing, Term Frequency, Inverse Document Frequency, Bidirectional Encoder Representations from Transformers

Abstract

The volume of textual data online is growing rapidly, and extracting useful information from it has become very difficult. Automated text summarization, an area of natural language processing, produces gists, abstracts, and summaries of written text in a variety of human languages. These summaries should be of high quality and contain the relevant information. Summaries can be produced by two methods: extractive and abstractive summarization. Much of the existing research concerns extractive summarization. For the Urdu language, which is mainly spoken in South Asia, there has been no research on abstractive summarization to date, so work in this domain is much needed. In the proposed work, we use a combination of extractive and abstractive algorithms to generate summaries. Sentence weight, TF-IDF, and word frequency algorithms are used to produce extractive summaries, and a hybrid technique is used to improve their results. The summaries produced by the hybrid approach are then processed with the BERT model to generate the abstractive summaries. To evaluate the automatically generated summaries, we enlisted the assistance of Urdu language experts.
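
As a rough illustration of the extractive stage mentioned in the abstract, the sketch below scores sentences by TF-IDF and keeps the highest-scoring ones. It is not the authors' implementation: the function names (split_sentences, tfidf_sentence_scores, extract_summary), the whitespace tokenizer, the treatment of each sentence as a "document" for the IDF term, and the set of sentence delimiters (including the Urdu full stop "۔") are all assumptions made for this example. The sentence-weight and word-frequency scorers, the hybrid combination, and the BERT-based abstractive step are not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text on the Urdu full stop and common sentence-ending marks (assumed delimiters)."""
    parts = re.split(r"[۔.?!؟]+", text)
    return [p.strip() for p in parts if p.strip()]


def tfidf_sentence_scores(sentences):
    """Score each sentence as the sum of TF-IDF weights of its distinct words.

    Assumption: each sentence is treated as one document when computing IDF,
    and words are obtained by naive whitespace splitting.
    """
    tokenized = [s.split() for s in sentences]
    n = len(tokenized)

    # Document frequency: number of sentences containing each word.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scores.append(score)
    return scores


def extract_summary(text, k):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    # Rank by descending score; ties are broken by earlier position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Calling extract_summary(text, k) returns a list of k sentences. A real Urdu pipeline would likely need a proper tokenizer and stop-word handling, since whitespace splitting is only a crude approximation for Urdu text.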

Published

2024-09-01

How to Cite

Asif Raza, Muhammad Hanif Soomro, Salahuddin, Inzamam Shahzad, & Saima Batool. (2024). Abstractive Text Summarization for Urdu Language. Journal of Computing & Biomedical Informatics, 7(02). Retrieved from https://jcbi.org/index.php/Main/article/view/596