Automatic Sinhala news text summarizer

Tennakoon, S

dc.contributor.author	Tennakoon, S
dc.contributor.editor	Thayasivam, U
dc.contributor.editor	Rathnayaka, C
dc.date.accessioned	2025-01-24T08:25:31Z
dc.date.available	2025-01-24T08:25:31Z
dc.date.issued	2020
dc.identifier.citation	Tennakoon, S. (2020). Automatic Sinhala news text summarizer. In U. Thayasivam & C. Rathnayaka (Eds.), Symposium on Natural Language Processing 2020: Proceedings of Symposium on Natural Language Processing 2020 (p. 11). National Language Processing Centre, University of Moratuwa. http://dl.lib.uom.lk/handle/123/23264
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/23264
dc.description.abstract	With the present explosion of news circulating in the digital space, most of it unstructured text, there is a need to absorb news content easily and effectively. Although there are many Sinhala news sites, none of them offers recommendations, despite the popularity of recommender systems today. It would therefore be useful to present news in a summarized form that also matches user preferences. Our research aims to fill this gap by providing a centralized news platform that recommends news to its users clearly and concisely. News articles are collected by web scraping, categorized, and then presented in summarized form. We also expect to detect grey sheep users and to provide them with separate recommendations in order to minimize recommendation errors. Here, grey sheep users are users with unusual tastes who neither consistently agree nor consistently disagree with the majority of users. By implementing the proposed system, we hope to meet these requirements and build a user-friendly Sinhala news platform. For such an application, creating summaries manually is time-consuming and tedious. The main idea behind automatic text summarization is to identify the most significant information in the given content, reduce the text to fewer sentences without losing the fundamental ideas of the original, and present the result to end readers. A dedicated summarizer for the Sinhala language is a key requirement for such an application, because no existing Sinhala news platform presents summarized and categorized news to its users.
The objective is to produce a brief and accurate outline of lengthy news articles that focuses on the key ideas conveying useful information without losing the overall meaning. The research builds the summarizer using the PyTeaser algorithm. Although PyTeaser cannot be used directly for Sinhala, language-specific modifications make it usable for Sinhala. PyTeaser assigns each sentence a total score based on four features: title score, keyword frequency, sentence length, and sentence position. The total score is a weighted combination of these features with constant weights, and the highest-scoring sentences are selected to form the summary. Summary quality is evaluated using the F-measure against human-generated summaries produced by Sinhala experts. The research computes the F-measure for every possible weight combination formed from PyTeaser's original weights and chooses the weight vector that yields the best-quality news summaries. The summaries for the proposed application are generated using this derived weight set, and the news articles are then to be recommended to end users through the recommender system according to their preferences.	en_US
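The scoring and weight-search procedure described in the abstract can be sketched in Python as follows. This is not the author's implementation: the feature formulas are simplified stand-ins for PyTeaser's own, the default weights (1.5, 2.0, 1.0, 1.0) are assumed from the open-source PyTeaser code, "all possible weight combinations" is interpreted as all distinct permutations of those four weights, and the F-measure is computed over selected sentence indices. Sinhala tokenization and the `stopwords` set are assumed to be supplied by the caller; all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Assumed PyTeaser default weights, in the order
# (title score, keyword frequency, sentence length, sentence position).
PYTEASER_WEIGHTS = (1.5, 2.0, 1.0, 1.0)
IDEAL_LENGTH = 20  # assumed ideal sentence length in tokens


def title_score(title_tokens, sentence):
    """Fraction of distinct title words that also occur in the sentence."""
    title = set(title_tokens)
    return len(title & set(sentence)) / len(title) if title else 0.0


def keyword_score(sentence, keyword_weights):
    """Mean keyword weight of the sentence's tokens (0 for non-keywords)."""
    if not sentence:
        return 0.0
    return sum(keyword_weights.get(t, 0.0) for t in sentence) / len(sentence)


def length_score(n_tokens):
    """1.0 at the ideal length, falling linearly to a floor of 0."""
    return max(0.0, 1.0 - abs(IDEAL_LENGTH - n_tokens) / IDEAL_LENGTH)


def position_score(index, n_sentences):
    """Simplified linear decay favouring early sentences."""
    return 1.0 - index / n_sentences


def summarize(title_tokens, sentences, weights=PYTEASER_WEIGHTS, k=3,
              stopwords=frozenset()):
    """Return the indices, in document order, of the k top-scoring sentences.

    `sentences` is a list of pre-tokenized sentences (lists of words).
    """
    # Keyword weight = term frequency normalized by the most frequent term.
    counts = Counter(t for s in sentences for t in s if t not in stopwords)
    top = max(counts.values()) if counts else 1
    keyword_weights = {t: c / top for t, c in counts.items()}
    w_title, w_kw, w_len, w_pos = weights
    scored = []
    for i, s in enumerate(sentences):
        total = (w_title * title_score(title_tokens, s)
                 + w_kw * keyword_score(s, keyword_weights)
                 + w_len * length_score(len(s))
                 + w_pos * position_score(i, len(sentences)))
        scored.append((total, -i, i))  # ties go to the earlier sentence
    best = sorted(scored, reverse=True)[:k]
    return sorted(i for _, _, i in best)


def f_measure(system, reference):
    """F1 over selected sentence indices versus a human reference summary."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def best_weights(articles, base=PYTEASER_WEIGHTS, k=3):
    """Try every distinct permutation of `base`; return (weights, mean F1).

    `articles` is a list of (title_tokens, sentences, reference_indices).
    """
    best, best_f = None, -1.0
    for w in sorted(set(permutations(base))):
        mean_f = sum(f_measure(summarize(t, s, w, k), ref)
                     for t, s, ref in articles) / len(articles)
        if mean_f > best_f:
            best, best_f = w, mean_f
    return best, best_f
```

With four weights of which two are equal, the search covers 12 distinct assignments. In practice the token lists would come from a Sinhala tokenizer and the reference indices from the expert summaries; both are inputs here rather than something the sketch computes.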
dc.language.iso	en	en_US
dc.publisher	National Language Processing Centre University of Moratuwa Sri Lanka	en_US
dc.subject	automatic text summarization	en_US
dc.subject	PyTeaser algorithm	en_US
dc.title	Automatic Sinhala news text summarizer	en_US
dc.type	Conference-Abstract	en_US
dc.identifier.year	2020	en_US
dc.identifier.conference	Symposium on Natural Language Processing 2020	en_US
dc.identifier.place	University of Moratuwa	en_US
dc.identifier.pgnos	p. 11	en_US
dc.identifier.proceeding	Proceedings of Symposium on Natural Language Processing 2020	en_US