Benchmarking semantic, centroid, and graph-based approaches for multi-document summarization
Date
2021
Authors
Agrawal A.
George R.A.
Ravi S.S.
Kamath S.S.
Abstract
Multi-document summarization (MDS) is the automated extraction of information from multiple documents on the same topic. We employ three techniques to generate summaries of document collections that each cover a single topic. The first approach computes an importance score for each sentence from features including a TF-IDF matrix and semantic and syntactic similarity; our algorithm sorts the sentences by this score and adds them to the summary in order of decreasing importance. The second approach uses the k-means clustering algorithm to generate the summary. The third approach applies the PageRank algorithm to a sentence graph whose edges connect sentences that are syntactically similar but not semantically similar. We use all three techniques to generate 100–200-word summaries for the DUC 2004 dataset, and we evaluate the system-generated summaries against the manually written reference summaries using ROUGE scores.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2021.
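A minimal sketch of the first approach, assuming plain Python with no external libraries. It covers only the TF-IDF feature: the semantic and syntactic similarity features, and how the authors weight them, are not specified in the abstract and are omitted here. The helper names (`tokenize`, `tfidf_scores`, `summarize`), the length-normalized score, treating each sentence as a "document" for IDF, and the greedy word-budget cut-off are all illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(sentences):
    """Length-normalized TF-IDF score for each sentence.

    Assumption: each sentence is treated as one 'document' when computing IDF,
    so words that occur in many sentences contribute little to the score.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per sentence
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # Sum of (term frequency / sentence length) * IDF over the sentence's terms.
        scores.append(sum((c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()))
    return scores


def summarize(sentences, max_words=100):
    """Greedily add sentences in descending score order while they fit the word budget."""
    scores = tfidf_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Emit the selected sentences in their original order for readability.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, max_words=100)` returns sentences totalling at most 100 words; sentences that would overflow the budget are skipped rather than truncated, which is one of several reasonable design choices.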
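The second approach, k-means clustering, can be sketched as follows under stated assumptions: sentences are represented as L2-normalized term-frequency vectors, centroids are seeded deterministically with the first k sentences, and the sentence nearest each cluster centroid is taken as that cluster's representative. None of these choices comes from the paper; the abstract does not say how sentences are vectorized, how k is chosen, or how sentences are selected from clusters. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(sentences):
    """Dense, L2-normalized term-frequency vectors over a shared vocabulary."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        v = [float(tf[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm; assumption: centroids are seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_summary(sentences, k):
    """Pick the sentence closest to each cluster centroid, in original order."""
    vectors = vectorize(sentences)
    labels, centroids = kmeans(vectors, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(picks)]
```

Seeding with the first k sentences keeps the sketch deterministic but is sensitive to input order; a randomized or k-means++ initialization is the more common choice in practice.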
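For the third approach, the abstract states only that edges join sentences that are syntactically similar but not semantically similar; it defines neither similarity measure nor any threshold. The sketch below therefore substitutes a simpler, plainly different edge rule (word-overlap Jaccard similarity at or above an assumed threshold) and does not model the semantic-similarity condition. The ranking step is standard PageRank computed by power iteration with the conventional damping factor of 0.85; the threshold, iteration count, and function names are hypothetical.

```python
import re


def tokenize(text):
    """Set of lower-cased alphanumeric tokens (hypothetical helper)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the paper's unspecified measures."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(sentences, threshold=0.15):
    """Undirected adjacency sets: an edge wherever Jaccard similarity >= threshold."""
    toks = [tokenize(s) for s in sentences]
    n = len(sentences)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(toks[i], toks[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank; returns one score per node, summing to 1."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n  # teleportation share
        for i in range(n):
            if adj[i]:
                share = damping * rank[i] / len(adj[i])
                for j in adj[i]:
                    new[j] += share
            else:  # dangling node: spread its rank uniformly
                for j in range(n):
                    new[j] += damping * rank[i] / n
        rank = new
    return rank


def graph_summary(sentences, k=1, threshold=0.15):
    """Return the k highest-ranked sentences in their original order."""
    ranks = pagerank(build_graph(sentences, threshold))
    top = sorted(range(len(sentences)), key=lambda i: ranks[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences with no edges still receive the teleportation share of rank, so they can appear in the summary only if few better-connected sentences exist.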
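The abstract evaluates summaries with ROUGE. As a rough illustration of what a ROUGE-N score measures, the sketch below computes n-gram recall, precision, and F1 of one candidate summary against one reference using clipped counts. It is a simplification, not the official ROUGE toolkit: it omits stemming, stop-word removal, multiple-reference handling, and ROUGE-L, and the tokenizer and function names are assumptions. It does not reproduce any of the paper's scores.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Multiset of word n-grams from lower-cased alphanumeric tokens."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified single-reference ROUGE-N recall, precision and F1."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum (clipped) counts
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

For instance, `rouge_n("the cat sat", "the cat sat on the mat")` gives a unigram recall of 0.5 (3 of the 6 reference tokens are matched) and a precision of 1.0.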
Citation
Advances in Intelligent Systems and Computing, Vol. 1177, pp. 255–263