Abstract:
Text emotion analysis aims to detect and
classify the emotions expressed in text. In recent
years, there has been an increase in text emotion analysis studies
for the English language, owing to the abundance of data. With the growth
of social media, large amounts of data are now available for regional
languages such as Tamil and Sinhala as well. However, these
languages lack the annotated corpora needed for many NLP tasks,
including emotion analysis. In this paper, we present our scalable
semi-automatic approach to creating an annotated corpus, named
ACTSEA, for Tamil and Sinhala to support emotion analysis.
We also present our analysis of a sample of the produced data and
the resulting findings, so that the low-resourced NLP
community can benefit. For ACTSEA, data were gathered from
the Twitter platform and, after cleaning, annotated manually. We
collected 600,280 Tamil and 318,308 Sinhala tweets in total,
which makes our corpus the largest data collection currently
available for these languages.