Abstract:
This dissertation presents research aimed at proposing a language model to
translate text written in Singlish into English. Singlish is an alternative writing system
for the Sinhala language that uses the Latin script (the English alphabet) instead of the
native Sinhala alphabet. Such a translator has long been needed, since many Sri
Lankans use this writing method for product reviews, social media posts,
comments, and so on. Many research students have attempted this task over the past few years,
but the main challenge has been finding a suitable data set to evaluate deep learning models
for this Natural Language Processing (NLP) task. Hence, traditional statistical and rule-based
models have been proposed, using less data. This research addresses the challenge of
preparing a data set to evaluate a deep learning approach for this machine
translation task, and also evaluates a seq2seq Neural Machine Translation
(NMT) model. The proposed seq2seq model is based purely on the attention
mechanism, which has been used to improve NMT by selectively focusing on parts of
the source sentence during translation. The proposed approach achieves a BLEU score of 24.13
on Singlish-English translation using ~0.15 M parallel sentence pairs with a ~50 K
word vocabulary.
Citation:
Sandaruwan, H.G.D. (2021). Neural machine translation approach for Singlish to English translation [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21470