Abstract:
For English, Named Entity Recognition
(NER) is largely considered a solved problem. However, for
low-resource, morphologically rich languages such
as Sinhala, little research has been done. In this
paper, we present a novel fine-grained Named Entity
(NE) tag set and an NE annotated Sinhala corpus of
70k word tokens. We trained a custom NER model for
Sinhala based on Conditional Random Fields (CRF).
Despite the low-resource setting, this NER model
achieved a micro-averaged F1 score of 84.8.
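
For readers unfamiliar with the metric, micro-averaging pools true positives, false positives, and false negatives across all NE tags before computing precision, recall, and F1. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes token-level scoring with an "O" tag for non-entities, which the abstract does not specify; the tag names PERSON, LOCATION, and ORG are hypothetical and are not taken from the paper's tag set.

```python
from collections import Counter


def micro_f1(gold, pred, outside="O"):
    """Token-level micro-averaged F1 over all tags except `outside`.

    Assumption: one tag per token, with `outside` marking non-entity tokens.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != outside:
                tp[g] += 1  # correctly predicted entity tag
        else:
            if p != outside:
                fp[p] += 1  # predicted an entity tag that is wrong
            if g != outside:
                fn[g] += 1  # missed the gold entity tag

    # Micro-averaging: sum the counts over all tags, then compute the scores.
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags, for illustration only.
gold = ["PERSON", "O", "LOCATION", "O", "ORG"]
pred = ["PERSON", "O", "ORG", "O", "ORG"]
score = micro_f1(gold, pred)
```

Because the counts are pooled before the scores are computed, frequent tags influence a micro-averaged score more than rare ones.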
Citation:
R. Azeez and S. Ranathunga, "Fine-Grained Named Entity Recognition for Sinhala," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 295-300, doi: 10.1109/MERCon50084.2020.9185296.