Semantic Term Identification
In the third stage, semantically similar terms are computed for each topic term generated in the previous stage. The WordNet Java API is used to generate the list of semantically similar terms. WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
The Semantic Term Mapper computes the semantically similar terms for each topic term generated by the document cluster. The Semantic Term Reducer then aggregates these terms and counts the frequency of each one, with topic terms and their semantically similar terms counted together.
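The mapper and reducer roles described above can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the class name SemanticTermAggregator, its methods, and the in-memory synonym table are hypothetical. The table stands in for a WordNet lookup so that the example runs without any external library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Semantic Term Mapper and Reducer.
class SemanticTermAggregator {

    // Map step: emit the topic term itself plus each semantically similar term.
    // `synonyms` is an assumed in-memory table standing in for a WordNet query.
    static List<String> map(String topicTerm, Map<String, List<String>> synonyms) {
        List<String> emitted = new ArrayList<>();
        emitted.add(topicTerm);
        emitted.addAll(synonyms.getOrDefault(topicTerm, List.of()));
        return emitted;
    }

    // Reduce step: count how often each emitted term occurs.
    static Map<String, Integer> reduce(List<String> emittedTerms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : emittedTerms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // Runs the map step over every topic term, then reduces all emitted terms together.
    static Map<String, Integer> aggregate(List<String> topicTerms,
                                          Map<String, List<String>> synonyms) {
        List<String> emitted = new ArrayList<>();
        for (String topicTerm : topicTerms) {
            emitted.addAll(map(topicTerm, synonyms));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Toy synonym table; a real run would query WordNet instead.
        Map<String, List<String>> synonyms = Map.of(
                "car", List.of("automobile", "auto"),
                "automobile", List.of("car", "auto"));
        List<String> topicTerms = List.of("car", "automobile", "engine");
        System.out.println(aggregate(topicTerms, synonyms));
    }
}
```

In this sketch a term that appears both as a topic term and as a similar term of another topic term accumulates a count from each occurrence, which is one reading of counting the two kinds of terms together.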
The terms are then arranged in descending order of frequency, and the top N terms (topic terms together with their semantically similar terms) are selected. These filtered terms are called the semantically similar frequent terms of the document collection.
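The selection step can be sketched as follows, assuming the aggregated frequencies are available as a map from term to count. The class name TopTermSelector and the alphabetical tie-breaking rule are assumptions made for illustration. The text does not say how ties in frequency are resolved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper that selects the N most frequent terms.
class TopTermSelector {

    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending frequency; break ties alphabetically (an assumption)
        // so that the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        int limit = Math.min(n, entries.size());
        for (Map.Entry<String, Integer> entry : entries.subList(0, limit)) {
            top.add(entry.getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Toy frequencies; in the pipeline these would come from the reducer.
        Map<String, Integer> counts = Map.of("car", 3, "auto", 3, "wheel", 2, "engine", 1);
        System.out.println(topN(counts, 3));
    }
}
```

If N exceeds the number of distinct terms, the sketch simply returns all of them in sorted order.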