Comparative Evaluation of Text Similarity Matrices for Enhanced Abstractive Summarization on CNN/Dailymail Corpus
Keywords:
Abstractive Summary, Similarity Measure, Universal Sentence Encoder, Text Summarization
Abstract
Extracting essential information from large volumes of text is increasingly important in today's data-abundant environment. Using the CNN/Dailymail dataset, this study investigates how text similarity measures, namely Cosine Similarity, the Jaccard Index, and the Universal Sentence Encoder (USE), affect the quality of abstractive text summaries. The performance of each measure is assessed with evaluation techniques such as BLEU scores, and quantitative analysis and graphical representations are used to delineate the effectiveness and limitations of each measure in producing succinct, coherent summaries of long texts. Cosine Similarity proves notably effective at preserving both coherence and informativeness, although it marginally trails the Jaccard Index in these respects. The Universal Sentence Encoder remains competitive, but it displays more intricate behavior that significantly affects the quality of the generated summaries. The study also discusses the implications for natural language processing (NLP) and offers practical recommendations for practitioners, technology developers, and academics. Its empirical findings offer insight into the role that text similarity measures play in abstractive summarization.
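For readers unfamiliar with two of the measures named in the abstract, the following is a minimal sketch of Cosine Similarity and the Jaccard Index applied to a reference text and a candidate summary. It is an illustration under stated assumptions, not the study's implementation: the tokenizer, the bag-of-words term counts, and the function names `tokenize`, `cosine_similarity`, and `jaccard_index` are all hypothetical choices made here. The Universal Sentence Encoder and BLEU scoring are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of the two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for empty input.
    return dot / (norm_a * norm_b)


def jaccard_index(text_a, text_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    vocab_a = set(tokenize(text_a))
    vocab_b = set(tokenize(text_b))
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # Convention chosen here for empty input.
    return len(vocab_a & vocab_b) / len(union)


if __name__ == "__main__":
    reference = "The cat sat on the mat"
    candidate = "The cat lay on the mat"
    print(f"cosine:  {cosine_similarity(reference, candidate):.3f}")
    print(f"jaccard: {jaccard_index(reference, candidate):.3f}")
```

Cosine Similarity here weights repeated terms by their counts, whereas the Jaccard Index looks only at which distinct terms are shared, so the two can rank the same pair of texts differently.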
License
This is an open access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.