dc.contributor.author Fernando, S
dc.contributor.author Ranathunga, S
dc.contributor.editor Chathuranga, D
dc.date.accessioned 2022-09-01T09:37:38Z
dc.date.available 2022-09-01T09:37:38Z
dc.date.issued 2018-05
dc.identifier.citation S. Fernando and S. Ranathunga, "Evaluation of Different Classifiers for Sinhala POS Tagging," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 96-101, doi: 10.1109/MERCon.2018.8421997. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/18833
dc.description.abstract This paper presents a comparative evaluation of three state-of-the-art classifiers for Sinhala Parts-of-Speech (POS) tagging. POS tagger models based on Support Vector Machines (SVM), Hidden Markov Models (HMM) and Conditional Random Fields (CRF) are generated and tested using different combinations of a corpus of news articles and a corpus of official government documents. Because CRF is used here for the first time in Sinhala POS tagging, its best feature set is derived experimentally. To further improve tagging accuracy, a majority-voting ensemble tagger is built from the three individual taggers. This ensemble tagger achieved higher accuracy than any individual tagger. The two domains (news and official government documents) used in this study differ noticeably in writing style and vocabulary. Generating domain-specific POS taggers is time-consuming and costly because of the overhead of creating and manually tagging domain-specific corpora, particularly for low-resource languages. Therefore, this study also evaluates the feasibility and effectiveness of using corpora from different domains in the training and testing phases of these machine learning techniques. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/8421997 en_US
dc.subject Sinhala en_US
dc.subject Parts-of-Speech (POS) en_US
dc.subject HMM en_US
dc.subject CRF en_US
dc.subject Ensemble en_US
dc.title Evaluation of different classifiers for Sinhala POS tagging en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2018 en_US
dc.identifier.conference 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.pgnos pp. 96-101 en_US
dc.identifier.proceeding Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) en_US
dc.identifier.doi 10.1109/MERCon.2018.8421997 en_US
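
The majority-voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name majority_vote, the sample tags, and the tie-breaking rule (fall back to one designated tagger when no tag wins a strict majority) are assumptions made for illustration only, since the record does not describe how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions, fallback=0):
    """Combine per-token tags from several taggers by majority vote.

    predictions: list of tag sequences, one per tagger, all the same length.
    fallback: index of the tagger whose tag is used when no tag has a
              strict majority (an assumed rule, not taken from the paper).
    """
    lengths = {len(seq) for seq in predictions}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")

    voted = []
    for tags in zip(*predictions):
        # most_common(1) returns the tag with the highest vote count
        tag, count = Counter(tags).most_common(1)[0]
        if count > len(tags) / 2:
            voted.append(tag)
        else:
            voted.append(tags[fallback])
    return voted


# Hypothetical outputs of three taggers for the same three tokens.
svm_tags = ["NN", "VB", "JJ"]
hmm_tags = ["NN", "NN", "VB"]
crf_tags = ["NN", "VB", "NN"]

ensemble_tags = majority_vote([svm_tags, hmm_tags, crf_tags])
print(ensemble_tags)
```

With three taggers, a strict majority means at least two of them agree on a tag; the third token above has three different votes, so the sketch keeps the tag of the fallback tagger.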

