Abstractive Text Summarization for Urdu Language
Keywords:
Natural Language Processing, Term Frequency, Inverse Document Frequency, Bidirectional Encoder Representations from Transformers
Abstract
The quantity of textual data online is growing at a remarkable pace, and it has become very difficult to extract useful information from such an enormous volume of text. Automated text summarization is an area of natural language processing. It produces gists, abstracts, and summaries of written text in a variety of human languages, and these summaries should be of high quality and contain the relevant information. There are two methods of summarization: extractive and abstractive. Much of the existing research has focused on extractive summarization. Urdu is mainly spoken in South Asia, and to date there has been no research on abstractive summarization for Urdu, so work in this domain is much needed. In the proposed research, we use a combination of extractive and abstractive algorithms to generate summaries. Sentence weight, TF-IDF, and word frequency algorithms are used to produce extractive summaries, and a hybrid technique is used to improve on their results. The summaries produced by the hybrid approach are then processed with the BERT model to generate abstractive summaries. To evaluate the summaries automatically generated by the system, we enlisted the assistance of Urdu language experts.
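The extractive stage described above can be illustrated with a minimal Python sketch that scores sentences by word frequency and by TF-IDF and keeps the top-ranked ones. It makes several assumptions. Tokens are split on whitespace. Sentences are split on the Urdu full stop (۔) and question mark (؟). Each sentence is treated as a "document" when computing IDF. The abstract does not specify the sentence-weight formula or how the hybrid technique combines scores. The equal-weight average in `extractive_summary` is therefore a hypothetical stand-in, and all function names are illustrative. The BERT-based abstractive step is not sketched here.

```python
import math
import re
from collections import Counter

# Assumption: sentences end with the Urdu full stop (U+06D4), Arabic question mark (U+061F),
# or Latin punctuation for mixed text.
SENTENCE_END = re.compile(r"[۔؟!?.]+")

def split_sentences(text):
    """Split text into non-empty sentences on sentence-final punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]

def tokenize(sentence):
    """Whitespace tokenization; a real system would also remove Urdu stop words."""
    return sentence.split()

def word_frequency_scores(sentences):
    """Score each sentence by the mean normalized corpus frequency of its words."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    max_count = max(counts.values())
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(counts[w] / max_count for w in words) / len(words))
    return scores

def tfidf_scores(sentences):
    """Treat each sentence as a document and score it by the mean TF-IDF of its distinct words."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per word
    scores = []
    for d in docs:
        tf = Counter(d)
        weights = [(tf[w] / len(d)) * math.log(n / df[w]) for w in tf]
        scores.append(sum(weights) / len(weights))
    return scores

def normalize(values):
    """Scale scores into [0, 1]; all-zero input stays all zero."""
    hi = max(values)
    return [v / hi if hi > 0 else 0.0 for v in values]

def extractive_summary(text, k=2):
    """Hypothetical hybrid: average normalized word-frequency and TF-IDF scores, keep top k."""
    sentences = split_sentences(text)
    wf = normalize(word_frequency_scores(sentences))
    ti = normalize(tfidf_scores(sentences))
    combined = [(a + b) / 2 for a, b in zip(wf, ti)]
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original sentence order
```

For example, `extractive_summary("اردو ایک زبان ہے۔ اردو پاکستان کی قومی زبان ہے۔ موسم اچھا ہے؟", k=2)` returns two of the three input sentences in their original order. In the pipeline the abstract describes, that extractive output would then be passed to the BERT model.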
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.