Abstract:
Named-Entity Recognition (NER) is one of the
major tasks in Natural Language Processing and is widely
used in Computer Science and Computational
Linguistics. However, little prior research has been done on NER
for Sinhala. In this paper, we present data-driven
techniques to detect Named Entities in Sinhala text using
Conditional Random Fields (CRF) and Maximum Entropy
(ME) statistical modeling methods. Results obtained from
experiments indicate that CRF, which provided the highest
accuracy for the same task in other languages, outperforms ME
in Sinhala NER as well. Furthermore, we identify different
linguistic features, such as orthographic word-level and contextual
information, that are effective with both the CRF and ME
algorithms.
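
To illustrate what word-level and contextual features might look like in practice, the following is a minimal sketch in plain Python. The feature names, the prefix/suffix length of 2, and the `<BOS>`/`<EOS>` boundary markers are assumptions made for this example; they are not the feature set reported in the paper. Feature dictionaries of this kind are a common input format for both CRF and ME sequence taggers.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    Hypothetical feature set for illustration only; not the
    features used in the paper.
    """
    word = tokens[i]
    return {
        # Word-level (orthographic) features
        "word": word,
        "length": len(word),
        "has_digit": any(ch.isdigit() for ch in word),
        "prefix2": word[:2],   # first two code points
        "suffix2": word[-2:],  # last two code points
        # Contextual features: neighbouring tokens, with boundary markers
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sentence_features` on a tokenized sentence yields one dictionary per token, which could then be passed to a sequence-labeling toolkit of the reader's choice. Note that slicing operates on Unicode code points, so for Sinhala script a prefix or suffix may split a base letter from its vowel sign.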
Citation:
S. A. P. M. Manamini et al., "Ananya - a Named-Entity-Recognition (NER) system for Sinhala language," 2016 Moratuwa Engineering Research Conference (MERCon), 2016, pp. 30-35, doi: 10.1109/MERCon.2016.7480111.