Abstract:
Machine translation is one of the least successful areas of natural language
processing. This is because natural languages are complex: a word can have several
meanings, a sentence can have several translations, and the translation of a sentence
may depend on the context. In this report we describe an approach to machine
translation between Sinhala and English.
We postulate that humans are able to translate natural languages through simple rules
and accumulated experience, without being knowledgeable about sophisticated language
constructs such as morphology, syntax, semantics and pragmatics. This
hypothesis has been inspired by the fact that humans construct word forms, phrases
and sentences with new words they learn by using simple rules without even being
fully conscious of the rules. We do not ignore the fact that not all words in a
vocabulary follow the same rules for forming words. Humans use specific
knowledge about certain words when they construct sentences. Also, the word
selection in a translated sentence varies depending on the context or the semantics of
the sentence. Because of this complexity, we focus on a hybrid approach which uses both
rules and statistics.
The system described in this thesis focuses on modeling the steps taken by a human to
translate a sentence from one language to the other. A bilingual dictionary is used to
model the knowledge of words and synonyms in both languages. Exceptional word
dictionaries stand in for the knowledge of special words which do
not follow the common rules of morphology. The language parsers handle the syntax
of sentences in either language. Morphology analyzers are used to handle the rules
used in constructing word forms while statistical analyzers are used to handle the
proper word usage depending on the syntax.
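
The interplay between exceptional word dictionaries, rule-based morphology and
statistical word selection can be illustrated with a minimal sketch. It is not the
implementation described in this thesis: the function names, the English
pluralization rules, the exception entries and the usage counts are all hypothetical
and were chosen only to keep the example small.

```python
# Hypothetical exception dictionary: words that do not follow the common rules.
PLURAL_EXCEPTIONS = {"child": "children", "mouse": "mice", "sheep": "sheep"}


def pluralize(noun: str) -> str:
    """Look the word up in the exception dictionary, then fall back to rules."""
    if noun in PLURAL_EXCEPTIONS:
        return PLURAL_EXCEPTIONS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


# Hypothetical counts standing in for a statistical analyzer: how often each
# candidate word was observed next to a given context word.
USAGE_COUNTS = {
    ("strong", "tea"): 12,
    ("powerful", "tea"): 1,
    ("strong", "engine"): 3,
    ("powerful", "engine"): 9,
}


def choose_word(candidates: list[str], context: str) -> str:
    """Pick the candidate seen most often with the context word.

    Ties (including unseen pairs) resolve to the earliest candidate.
    """
    return max(candidates, key=lambda word: USAGE_COUNTS.get((word, context), 0))


if __name__ == "__main__":
    print(pluralize("child"), pluralize("box"), pluralize("city"))
    print(choose_word(["strong", "powerful"], "tea"))
```

Checking the exception dictionary before applying any rule mirrors the idea that
specific knowledge about a word overrides the general pattern, while the counts only
decide between candidates that the rules consider equally valid.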
The system was evaluated by comparing human translation with the machine
translation output. The two dominant factors considered were how understandable
the translated sentence is and how much information the translated sentence retains
compared to the original. The results met the expected quality, and further work
is required to improve the semantics of translation.