Show simple item record

dc.contributor.author Tennakoon, S
dc.contributor.editor Thayasivam, U
dc.contributor.editor Rathnayaka, C
dc.date.accessioned 2025-01-24T08:25:31Z
dc.date.available 2025-01-24T08:25:31Z
dc.date.issued 2020
dc.identifier.citation Tennakoon, S., (2020). Automatic Sinhala news text summarizer. In U. Thayasivam., & C. Rathnayaka, (Ed.), Symposium on Natural Language Processing 2020: Proceedings of Symposium on Natural Language Processing 2020 (p. 11). National Language Processing Centre University of Moratuwa. http://dl.lib.uom.lk/handle/123/23264
dc.identifier.uri http://dl.lib.uom.lk/handle/123/23264
dc.description.abstract With the present explosion of news circulating in the digital space, most of it unstructured textual data, readers need a way to absorb news content easily and effectively. Although there are many Sinhala news sites, none offers recommendations, despite the popularity of recommender systems today. News would therefore be more useful if it were presented in a summarized form that also matches user preferences. Our research aims to fill this gap by providing a centralized news platform that recommends news to its users clearly and concisely. News articles are collected by web scraping and, after categorization, presented in summarized form. We also expect to detect grey sheep users and provide separate recommendations to them in order to minimize recommendation errors. Here, grey sheep users are users with special tastes who neither consistently agree nor consistently disagree with the majority of users. By implementing the proposed system, we hope to meet these requirements and build a user-friendly Sinhala news platform. In such an application, creating summaries manually is time-consuming and tedious. The main idea behind automatic text summarization is to identify the most significant information in the given content, reduce the text to fewer sentences without losing the fundamental ideas of the original, and present it to end readers. A summarizer specific to the Sinhala language is a major requirement for such an application, because no available Sinhala news platform presents summarized and categorized news text to its users. 
The objective is to produce a brief and accurate summary of voluminous news text, focusing on the key ideas that convey useful information without losing the overall meaning. The research aims to build the summarizer using the PyTeaser algorithm. Although PyTeaser cannot be used directly for the Sinhala language, language-specific modifications make it usable for Sinhala. PyTeaser assigns a total score to each sentence based on four features: Title Score, Keyword Frequency, Sentence Length and Sentence Position. The total score is computed by weighting these features, and the weights are constants. The sentences with the highest scores are then selected to produce the summary. The quality of the summary is evaluated using the F-measure against human-generated summaries produced by Sinhala experts. The research computes the F-measure for all possible weight combinations formed from the original PyTeaser weights and chooses the optimized weight vector that yields the best-quality news summary. The summaries for the proposed application are generated using the derived weight set, and the summarized news is then expected to be recommended to end users via the recommender system, according to user preferences. en_US
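The scoring and weight-selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the four feature scores per sentence are already computed, that "weight combinations" means permutations of a base weight tuple, and that the F-measure is taken over selected sentence indices against an expert reference. All function names and the example weight values are hypothetical.

```python
from itertools import permutations

# Feature order used for every weight tuple (names are illustrative).
FEATURES = ("title", "keyword", "length", "position")


def total_score(features, weights):
    """Weighted sum of the four per-sentence feature scores."""
    return sum(w * features[name] for name, w in zip(FEATURES, weights))


def select_summary(sentence_features, weights, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(
        range(len(sentence_features)),
        key=lambda i: total_score(sentence_features[i], weights),
        reverse=True,
    )
    return sorted(ranked[:k])


def f_measure(selected, reference):
    """F-measure (harmonic mean of precision and recall) over sentence indices."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def best_weights(sentence_features, reference, base_weights, k):
    """Try every permutation of base_weights (an assumed reading of
    'all possible weight combinations') and keep the best-scoring one."""
    candidates = sorted(set(permutations(base_weights)))
    return max(
        candidates,
        key=lambda w: f_measure(select_summary(sentence_features, w, k), reference),
    )


# Toy data: three sentences, each strong in a different feature.
toy = [
    {"title": 0.9, "keyword": 0.1, "length": 0.1, "position": 0.1},
    {"title": 0.1, "keyword": 0.9, "length": 0.1, "position": 0.1},
    {"title": 0.1, "keyword": 0.1, "length": 0.9, "position": 0.1},
]
base = (1.5, 2.0, 0.5, 1.0)  # placeholder weights, not taken from the abstract
chosen = best_weights(toy, reference=[2], base_weights=base, k=1)
```

In practice the search would average the F-measure over many articles and their expert summaries rather than a single toy example.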
dc.language.iso en en_US
dc.publisher National Language Processing Centre University of Moratuwa Sri Lanka en_US
dc.subject automatic text summarization en_US
dc.subject PyTeaser algorithm en_US
dc.title Automatic Sinhala news text summarizer en_US
dc.type Conference-Abstract en_US
dc.identifier.year 2020 en_US
dc.identifier.conference Symposium on Natural Language Processing 2020 en_US
dc.identifier.place University of Moratuwa en_US
dc.identifier.pgnos p. 11 en_US
dc.identifier.proceeding Proceedings of Symposium on Natural Language Processing 2020 en_US

