Comparative Evaluation of Text Similarity Matrices for Enhanced Abstractive Summarization on CNN/Dailymail Corpus

Authors

  • Muhammad Hammad Asif, Department of Creative Technologies, Air University, Islamabad, Pakistan.
  • Aman Ullah Yaseen, Department of Creative Technologies, Air University, Islamabad, Pakistan.

Keywords:

Abstractive Summary, Similarity Measure, Universal Sentence Encoder, Text summarization

Abstract

Extracting essential information from large volumes of text is increasingly important in today's data-rich environment. Using the CNN/Dailymail dataset, this study investigates the effect of text similarity matrices, including Cosine Similarity, the Jaccard Index, and the Universal Sentence Encoder (USE), on the quality of abstractive text summaries. The performance of each matrix is assessed with evaluation techniques such as BLEU scores, and quantitative analysis and graphical representations are used to delineate how effective, and how limited, each is at summarizing extensive textual content succinctly and coherently. Cosine Similarity proves notably effective at ensuring both coherence and informativeness, although it marginally trails the Jaccard Index in these respects. The Universal Sentence Encoder remains competitive, but displays intricate attributes that significantly affect the quality of the generated summaries. The study also discusses potential implications for natural language processing (NLP) and offers practical recommendations for practitioners, technology developers, and academics. Its empirical findings offer insight into the role that text similarity matrices play in abstractive summarization.
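
For readers unfamiliar with two of the measures named in the abstract, the following is a minimal illustrative sketch of the Jaccard Index and Cosine Similarity computed over simple word tokens. It is not taken from the paper: the whitespace tokenizer, the bag-of-words vectors, the function names, and the example sentences are all assumptions made here for illustration, and the authors' actual preprocessing and representation may differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def jaccard_index(a, b):
    """Jaccard index: size of the token-set intersection over size of the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between bag-of-words count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical example sentences, not drawn from the CNN/Dailymail corpus.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

print(jaccard_index(reference, candidate))      # 4 shared of 6 distinct tokens, about 0.667
print(cosine_similarity(reference, candidate))  # 7 / 8, approximately 0.875
```

The example shows why the two measures can rank the same pair differently: the Jaccard Index ignores how often a word occurs, whereas the count-based cosine rewards the repeated word "the".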

Published

2023-12-05

How to Cite

Muhammad Hammad Asif, & Aman Ullah Yaseen. (2023). Comparative Evaluation of Text Similarity Matrices for Enhanced Abstractive Summarization on CNN/Dailymail Corpus. Journal of Computing & Biomedical Informatics, 6(01), 208–215. Retrieved from https://jcbi.org/index.php/Main/article/view/270