Multi Document Text Summarization Using Map Reduce

The proposed technique combines semantic-similarity-based clustering with topic modeling using Latent Dirichlet Allocation (LDA) to summarize large text collections on the MapReduce framework. The technique is evaluated for scalability, and the quality of the generated summaries is measured using standard text summarization metrics, namely compression ratio and ROUGE score.

ABSTRACT

Text summarization is an important and challenging problem in text mining. It benefits users directly and supports a number of useful real-life applications. Document summarization offers a way to understand a collection of text documents more quickly. Semantic similarity and clustering can be used to generate effective summaries of large text collections. Summarizing a large volume of text is challenging and time-consuming, particularly when semantic similarity is computed as part of the summarization process, because it involves intensive text processing and computation. A summary conveys the gist of a large text collection quickly and saves considerable time by removing the need to read each individual document.
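
The two evaluation measures named above, compression ratio and ROUGE score, can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since this section does not state the exact formulas: compression ratio is taken as summary length divided by source length in whitespace-separated tokens, and ROUGE is shown only in its simplest form, unigram recall (ROUGE-1), against a single reference summary. The function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def compression_ratio(source_text: str, summary_text: str) -> float:
    """Summary length divided by source length, in whitespace tokens.

    Assumed definition: a smaller value means a more compressed summary.
    """
    source_len = len(source_text.split())
    if source_len == 0:
        raise ValueError("source text is empty")
    return len(summary_text.split()) / source_len


def rouge_1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate.

    Counts are clipped, so a repeated reference word is matched only as
    many times as it occurs in the candidate summary.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        raise ValueError("reference summary is empty")
    overlap = sum(min(count, cand_counts[token])
                  for token, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    source = "the cat sat on the mat while the dog slept near the door"
    reference = "the cat sat on the mat"
    candidate = "the cat lay on a mat"
    print(compression_ratio(source, candidate))
    print(rouge_1_recall(reference, candidate))
```

A full evaluation would normally rely on an established ROUGE implementation with stemming and multiple references; the sketch only shows what the two numbers measure.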