Abstract:
Text summarization is an important problem in natural language understanding and information retrieval. Automatic text summarization is receiving growing attention because it saves time efficiently and effectively in decision making, even in day-to-day life. Deep learning models currently receive more attention than traditional approaches. The primary objective of this research is to propose a methodology for summarizing Tamil sports news that automatically creates an extractive summary of the news data using Natural Language Processing (NLP) and a generic stochastic artificial neural network. Features such as sentence position, sentence position relative to the paragraph, number of named entities, term frequency-inverse document frequency (TF-IDF), and number of numerals are used to construct the feature matrix for each sentence, and a Restricted Boltzmann Machine is used to enhance those features, improving accuracy without losing the main idea of the text. Experiments are carried out on online Tamil sports news, and the ROUGE toolkit is used to evaluate recall, precision, and F-measure by comparing the system-generated summaries with summaries produced by human experts.
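As an illustration of the per-sentence feature matrix described in the abstract, the following is a minimal sketch and not the authors' implementation. The function names (tokenize, feature_matrix), the scoring conventions (earlier sentences score higher, sentences are treated as the documents for IDF), and the whitespace tokenizer are all assumptions made for this example. The named-entity count and the Restricted Boltzmann Machine stage are omitted because they depend on tools the abstract does not specify.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(sentence):
    # Assumption: naive whitespace tokenizer with punctuation stripped.
    # A real Tamil pipeline would likely need a language-aware tokenizer.
    tokens = (tok.strip(PUNCT).lower() for tok in sentence.split())
    return [tok for tok in tokens if tok]

def feature_matrix(paragraphs):
    """Build one feature row per sentence.

    paragraphs: list of paragraphs, each a list of sentence strings.
    Row layout (hypothetical): [document position, paragraph position,
    numeral count, TF-IDF sum].
    """
    tokenized = [tokenize(s) for p in paragraphs for s in p]
    n = len(tokenized)

    # Document frequency, treating each sentence as one "document".
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    rows = []
    idx = 0
    for para in paragraphs:
        for j in range(len(para)):
            toks = tokenized[idx]
            tf = Counter(toks)
            tfidf = sum(
                (count / len(toks)) * math.log(n / df[term])
                for term, count in tf.items()
            ) if toks else 0.0
            numerals = sum(any(ch.isdigit() for ch in t) for t in toks)
            rows.append([
                1.0 - idx / n,        # earlier in the document scores higher
                1.0 - j / len(para),  # earlier in the paragraph scores higher
                float(numerals),      # tokens containing at least one digit
                tfidf,
            ])
            idx += 1
    return rows
```

For example, calling feature_matrix([["India won by 5 wickets.", "Kohli scored 82 runs in 53 balls."], ["The final is on Sunday."]]) returns three rows of four values each. In the pipeline the abstract describes, rows like these would then be passed to the Restricted Boltzmann Machine before sentences are selected for the summary.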
Citation:
T. Priyadharshan and S. Sumathipala, "Text Summarization for Tamil Online Sports News Using NLP," 2018 3rd International Conference on Information Technology Research (ICITR), 2018, pp. 1-5, doi: 10.1109/ICITR.2018.8736154.