Abstract:
Machine translation is a cost-effective, quick, and widely accepted automated language translation method that has become essential in the modern and ever more globalized world. Machine translation can follow one or more approaches, including dictionary-based, rule-based, example-based, phrase-based, statistical, or neural-linguistic approaches. Nevertheless, most existing machine translation systems show a quality gap when compared with human translation. Thus, human translation has been considered the best language translation method so far. Human language translation is a complex and opportunistic process that depends on human memory. This process has been described through a few theories. Among them, the garden path model and the constraint satisfaction model are two fundamental approaches to human language translation, especially concerning sentence parsing with meaning. These two theoretical models demonstrate how suitable words are selected in the phrases of a sentence to generate accepted meanings. Based on these two theories, a hybrid approach to machine translation has been proposed. The proposed approach is inspired by how people parse and translate a sentence by putting available phrases together with an accepted meaning. According to the approach, translation is done in three stages. In the first stage, the system analyses the given sentence by considering the morphology, syntax, and semantics of the source language. Then, the system uses phrase-based translation and translates each phrase into the target language, producing multiple candidate solutions. The phrase translation considers four factors from psycholinguistic parsing techniques: phrase structure, semantic features, thematic roles, and probability.
Finally, considering all the translated phrases, the system identifies the target-language phrases that yield an accepted meaning, based on subject-verb and object-verb agreement. Once subject-verb-object agreement is reached, the other phrases in the sentence are rearranged according to the accepted subject, object, and verb phrases.
This approach has been simulated with a multi-agent system named EnSiMaS, which translates English text into Sinhala. EnSiMaS was implemented on the MaSMT framework, which was specially developed for agent-based machine translation. EnSiMaS comprises 26 language-processing agents covering both the source and target languages. These agents were clustered into six agent swarms according to the morphological, syntactic, and semantic concerns of the source and target languages. In addition to these language-processing agents, the system creates an agent dynamically for each source-language phrase. These dynamically created phrase agents communicate with other relevant phrase agents and select the accepted solutions.
EnSiMaS was tested with 85 sample English sentences. For each English sentence, three different translations were taken. According to the evaluation result, the system shows an 8.77% word error rate, a 6.72% inflexion error rate, and a 5.37% sentence error rate for the first, second, and third translations. In addition, the calculated BLEU scores are 0.89160756, 0.52009204, and 0.43581893 for the first, second, and third translations, respectively. Then, 25 randomly selected sample sentences were used to assess the adequacy and fluency of EnSiMaS. Adequacy and fluency ratings were collected from 55 human evaluators, who judged the translations against human-translated reference sentences. Kendall's Tau correlation coefficient shows a weak positive association between the adequacy levels of the human translations and the EnSiMaS translations, and a moderate positive association between their fluency levels. Further, according to the Fleiss' kappa coefficient, there is significant fair agreement among raters for the adequacy and fluency ratings.
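The abstract does not state how the word error rate was computed. As an illustration only, the sketch below uses a common definition: the word-level edit distance (substitutions, insertions, and deletions) between a system translation and a reference, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for this sketch and are not taken from EnSiMaS or its evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: tokens are obtained by whitespace splitting; the
    evaluation described above may have tokenized differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted word in a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a score of 0 means the translation matches the reference word for word, and the score can exceed 1 when the system output is much longer than the reference.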