dc.contributor.author |
Tennakoon, S |
|
dc.contributor.editor |
Thayasivam, U |
|
dc.contributor.editor |
Rathnayaka, C |
|
dc.date.accessioned |
2025-01-24T08:25:31Z |
|
dc.date.available |
2025-01-24T08:25:31Z |
|
dc.date.issued |
2020 |
|
dc.identifier.citation |
Tennakoon, S. (2020). Automatic Sinhala news text summarizer. In U. Thayasivam & C. Rathnayaka (Eds.), Symposium on Natural Language Processing 2020: Proceedings of Symposium on Natural Language Processing 2020 (p. 11). National Language Processing Centre University of Moratuwa. http://dl.lib.uom.lk/handle/123/23264 |
|
dc.identifier.uri |
http://dl.lib.uom.lk/handle/123/23264 |
|
dc.description.abstract |
With the present explosion of news circulating in the digital space, most of it unstructured textual
data, readers need a way to absorb news content easily and effectively. Although there are many
Sinhala news sites, none of them offers news recommendation, despite the popularity of recommender
systems today. It would therefore be useful to present news in a summarized form that also matches
each user's preferences. Our research aims to fill this gap by providing a centralized news platform
that recommends news to its users clearly and concisely. News articles are collected by web
scraping, categorized, and then presented in summarized form. We also expect to detect grey sheep
users and to provide them with separate recommendations in order to minimize recommendation errors.
Grey sheep users are users with unusual tastes who neither consistently agree nor disagree with the
majority of users. By implementing the proposed system, we hope to meet these requirements and build
a user-friendly Sinhala news platform. For such an application, creating summaries manually is
time-consuming and tedious. The main idea behind automatic text summarization is to identify the
most significant information in the given content, reduce the text to fewer sentences without
losing the essential ideas of the original, and present the result to end readers. A summarizer
built specifically for the Sinhala language is a key requirement for such an application, because
no existing Sinhala news platform presents summarized and categorized news to its users. The
objective is to produce brief and accurate summaries of lengthy news articles that focus on the key
ideas carrying useful information without losing the overall meaning. The research aims to build
the summarizer using the PyTeaser algorithm. Although PyTeaser cannot be used directly for the
Sinhala language, language-specific modifications make it usable for Sinhala. PyTeaser assigns each
sentence a total score based on four features: Title Score, Keyword Frequency, Sentence Length, and
Sentence Position. The total score is a weighted combination of these features with constant
weights, and the sentences with the highest scores are selected to produce the summary. Summary
quality is evaluated with the F-measure against human-generated summaries written by Sinhala
experts. The research computes the F-measure for all possible weight combinations formed from the
original PyTeaser weights and chooses the weight vector that yields the best-quality news summary.
The summaries for the proposed application are generated using the derived weight set, and the
summarized news is then expected to be recommended to end users through the recommender system
according to their preferences. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
National Language Processing Centre University of Moratuwa Sri Lanka |
en_US |
dc.subject |
automatic text summarization |
en_US |
dc.subject |
PyTeaser algorithm |
en_US |
dc.title |
Automatic Sinhala news text summarizer |
en_US |
dc.type |
Conference-Abstract |
en_US |
dc.identifier.year |
2020 |
en_US |
dc.identifier.conference |
Symposium on Natural Language Processing 2020 |
en_US |
dc.identifier.place |
University of Moratuwa |
en_US |
dc.identifier.pgnos |
p. 11 |
en_US |
dc.identifier.proceeding |
Proceedings of Symposium on Natural Language Processing 2020 |
en_US |
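The four-feature sentence scoring described in the abstract can be illustrated with a minimal Python sketch of PyTeaser-style extractive summarization. It is not the authors' Sinhala implementation: the whitespace tokenizer, the simplified feature formulas, `IDEAL_LENGTH`, and every function name are hypothetical. The default weights are assumed to mirror the open-source PyTeaser code (title 1.5, keyword frequency 2.0, length 1.0, position 1.0) and should be checked against its source.

```python
from collections import Counter

# Assumed to mirror the open-source PyTeaser defaults; verify against its source.
WEIGHTS = {"title": 1.5, "keyword": 2.0, "length": 1.0, "position": 1.0}
IDEAL_LENGTH = 20  # hypothetical ideal sentence length, in words


def tokenize(text, stopwords):
    """Whitespace split; a hypothetical stand-in for a real Sinhala tokenizer."""
    return [w for w in text.split() if w not in stopwords]


def title_score(title_words, words):
    """Fraction of title words that also appear in the sentence."""
    if not title_words:
        return 0.0
    return len(title_words & set(words)) / len(title_words)


def keyword_score(words, keywords):
    """Mean article-level keyword weight of the sentence's words."""
    if not words:
        return 0.0
    return sum(keywords.get(w, 0.0) for w in words) / len(words)


def length_score(words):
    """1.0 at the ideal length, falling linearly to 0.0 as length drifts away."""
    return max(0.0, 1 - abs(IDEAL_LENGTH - len(words)) / IDEAL_LENGTH)


def position_score(index, count):
    """Earlier sentences score higher (simplified positional prior)."""
    return 1.0 - index / count


def score_sentences(title, sentences, stopwords=frozenset(), top_keywords=10,
                    weights=WEIGHTS):
    """Return (total_score, sentence_index) pairs using constant feature weights."""
    title_words = set(tokenize(title, stopwords))
    counts = Counter(w for s in sentences for w in tokenize(s, stopwords))
    top = counts.most_common(top_keywords)
    norm = sum(c for _, c in top) or 1
    keywords = {w: c / norm for w, c in top}  # normalized keyword frequencies
    scored = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence, stopwords)
        features = {
            "title": title_score(title_words, words),
            "keyword": keyword_score(words, keywords),
            "length": length_score(words),
            "position": position_score(i, len(sentences)),
        }
        # Weighted average of the four feature scores.
        total = sum(weights[name] * value for name, value in features.items())
        scored.append((total / len(weights), i))
    return scored


def summarize(title, sentences, n=2, **kwargs):
    """Pick the n highest-scoring sentences and return them in article order."""
    best = sorted(score_sentences(title, sentences, **kwargs), reverse=True)[:n]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]
```

For Sinhala input, `sentences` would hold already-split Sinhala sentences and `stopwords` a Sinhala stopword list; both are left to the caller, since the abstract does not describe the specific language modifications that were made.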
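The F-measure evaluation and the search over weight combinations mentioned in the abstract can be sketched in Python as well, under several assumptions the abstract does not state. The sketch reads "all possible weight combinations made from the original PyTeaser weights" as every distinct assignment of the four original weights to the four features, assumes those weights are (1.5, 2.0, 1.0, 1.0), and computes the F-measure at the sentence level, which presumes the expert summaries are extractive (lists of sentence indices). All names are illustrative.

```python
from itertools import permutations

FEATURES = ("title", "keyword", "length", "position")
# Assumed to mirror the open-source PyTeaser defaults, in FEATURES order.
ORIGINAL_WEIGHTS = (1.5, 2.0, 1.0, 1.0)


def f_measure(selected, reference):
    """Sentence-level F1 between selected and reference sentence indices."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def select(rows, weights, n):
    """Indices of the n sentences with the highest weighted feature score."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:n]


def best_weights(articles, weights=ORIGINAL_WEIGHTS):
    """Try every distinct assignment of the original weights to the features.

    articles: non-empty list of (rows, reference) pairs; each row holds one
    sentence's feature values in FEATURES order, and reference holds the
    indices of the sentences in the expert summary (assumed extractive).
    Returns the best weights as a feature->weight dict and their mean F-measure.
    """
    if not articles:
        raise ValueError("need at least one annotated article")
    best, best_score = None, -1.0
    for candidate in sorted(set(permutations(weights))):
        mean = sum(
            f_measure(select(rows, candidate, len(set(reference))), reference)
            for rows, reference in articles
        ) / len(articles)
        if mean > best_score:
            best, best_score = candidate, mean
    return dict(zip(FEATURES, best)), best_score
```

Because dividing by the number of features does not change the ranking, the sketch compares raw weighted sums. With two equal weights among the four, there are 12 distinct assignments to evaluate, so an exhaustive search is cheap.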