Multi Document Text Summarization Using Map Reduce
Summarizing Large Text Collection of News Articles Using K-Means Clustering and Topic Modelling based on the MapReduce framework.
The proposed
technique combines semantic-similarity-based clustering with topic
modeling using Latent Dirichlet Allocation (LDA) to summarize a large text
collection over the MapReduce framework. The technique is evaluated in terms of scalability and
of standard text summarization parameters, namely compression ratio and
ROUGE score, which measure the performance of the summaries.
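The two evaluation measures named above can be illustrated with a minimal sketch. It assumes whitespace tokenization, defines compression ratio as summary length divided by source length (one common convention; the exact definition used in the evaluation is not stated here), and computes only the recall form of ROUGE-N. The function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real evaluation
    # would typically also apply stemming and stop-word handling.
    return text.lower().split()


def compression_ratio(summary, source):
    """Summary length divided by source length, both in tokens."""
    source_len = len(tokenize(source))
    if source_len == 0:
        raise ValueError("source text is empty")
    return len(tokenize(summary)) / source_len


def ngrams(tokens, n):
    # Multiset of n-grams, so repeated n-grams are counted separately.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    ref = ngrams(tokenize(reference), n)
    cand = ngrams(tokenize(candidate), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    source = "the cat sat on the mat and the dog slept"
    print(compression_ratio("the cat sat", source))
    print(rouge_n_recall("the cat lay on the mat", "the cat sat on the mat", n=1))
    print(rouge_n_recall("the cat lay on the mat", "the cat sat on the mat", n=2))
```

Under this convention a lower compression ratio means a shorter summary, while a higher ROUGE recall means the summary covers more of the reference summary's n-grams; the two are usually reported together because they pull in opposite directions.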
ABSTRACT
Text summarization is one of the
important and challenging problems in text mining.
It offers users a number of benefits, and many useful real-life applications
can be built on it.
Document summarization provides an instrument for understanding a collection
of text documents more quickly. Semantic similarity and
clustering can be used to generate an effective summary of a large text
collection. Summarizing a large volume of text is a challenging and time-consuming
problem, particularly when semantic similarity computation is part of the
summarization process, because it involves intensive text
processing and computation. A summarized document helps the reader
grasp the gist of a large text collection quickly and saves considerable time
by removing the need to read each individual document in the collection.