Sinhala code-mixed text translation using neural machine translation

Archchana, K

dc.contributor.advisor	Sumathipala S
dc.contributor.advisor	Silva T
dc.contributor.author	Archchana, K
dc.date.accessioned	2024-10-10T07:59:10Z
dc.date.available	2024-10-10T07:59:10Z
dc.date.issued	2024
dc.identifier.citation	Archchana, K. (2024). Sinhala code-mixed text translation using neural machine translation [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22898
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/22898
dc.description.abstract	Mixing two or more languages in communication is called code-mixing. It has become common in South Asian communities because of bilingualism and multilingualism. Sinhala-English code-mixed (SECM) text is the most popular form of language used in Sri Lanka for casual communication such as social media comments, posts, and chats. On social media platforms, content such as posts and comments is used for personalized advertisement, post, and content recommendations that serve users according to their interests. Because of its code-mixed nature, most Sri Lankan social media content goes unused for recommendation purposes. Our research therefore focuses on translating SECM text into Sinhala. Once the content is converted to a standard language, it can be processed easily and used for these purposes. In this research, we first conduct an in-depth analysis of Sinhala-English code-mixed text and identify the issues that act as barriers to translating SECM into Sinhala. We also conduct a thorough literature study of code-mixed text analysis. An SECM-Sinhala parallel corpus of 5000 parallel sentences is used for this study. The approach proposed for SECM-to-Sinhala translation consists of a normalization layer, an Encoder-Decoder (Seq2Seq) framework, LSTM, and a Teacher Forcing mechanism. We evaluated our approach against other approaches proposed for code-mixed text translation, and it achieved a significantly higher BLEU score. Keywords: Code-mixing, Bilingualism, Multilingualism, LSTM, Teacher Forcing	en_US
dc.language.iso	en	en_US
dc.subject	CODE-MIXING
dc.subject	MULTILINGUALISM
dc.subject	LSTM | TEACHER FORCING
dc.subject	COMPUTATIONAL MATHEMATICS – Dissertation
dc.subject	Master of Philosophy (MPhil)
dc.title	Sinhala code-mixed text translation using neural machine translation	en_US
dc.type	Thesis-Full-text	en_US
dc.identifier.faculty	IT	en_US
dc.identifier.degree	Master of Philosophy (MPhil)	en_US
dc.identifier.department	Department of Computational Mathematics	en_US
dc.date.accept	2024
dc.identifier.accno	TH5542	en_US
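
The abstract names teacher forcing as one component of the proposed Seq2Seq approach but gives no implementation details. As a minimal sketch of the general technique only (not the thesis code), the Python below shows how decoder inputs are built from the reference sentence, and how a teacher-forced pass differs from a free-running one. The names `teacher_forcing_pairs`, `decode`, `toy_step`, the `<sos>`/`<eos>` markers, and the placeholder tokens are all hypothetical, and a lookup table stands in for a trained LSTM decoder.

```python
# Illustrative sketch of teacher forcing; all names and tokens are
# hypothetical and are not taken from the thesis.
SOS, EOS = "<sos>", "<eos>"


def teacher_forcing_pairs(target_tokens):
    """Build (decoder_inputs, decoder_targets) for one reference sentence.

    The decoder input at step t is the reference token from step t-1,
    so the inputs are the targets shifted right by one position.
    """
    decoder_inputs = [SOS] + list(target_tokens)
    decoder_targets = list(target_tokens) + [EOS]
    return decoder_inputs, decoder_targets


def decode(step_fn, reference, teacher_forcing=True):
    """Run a toy decoder where step_fn maps the previous token to a prediction."""
    inputs, _ = teacher_forcing_pairs(reference)
    predictions = []
    for t, reference_input in enumerate(inputs):
        if teacher_forcing or t == 0:
            # Teacher forcing: feed the reference token, even if the
            # previous prediction was wrong.
            prev = reference_input
        else:
            # Free running: feed the model's own previous prediction.
            prev = predictions[-1]
        predictions.append(step_fn(prev))
    return predictions


# A deliberately imperfect stand-in for a trained decoder step:
# it mispredicts the token that follows "a".
TOY_TABLE = {SOS: "a", "a": "x", "b": "c", "c": EOS, "x": "x"}


def toy_step(prev_token):
    return TOY_TABLE.get(prev_token, EOS)


forced = decode(toy_step, ["a", "b", "c"], teacher_forcing=True)
free = decode(toy_step, ["a", "b", "c"], teacher_forcing=False)
```

In this toy run, the teacher-forced pass makes one mistake and then recovers because the next input comes from the reference, while the free-running pass keeps feeding its own wrong prediction back in. That is the usual motivation for applying teacher forcing during training.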