Institutional-Repository, University of Moratuwa.  

Bilingual Lexical Induction for Sinhala-English using Cross Lingual Embedding Spaces

dc.contributor.author Liyanage, A
dc.contributor.author Ranathunga, S
dc.contributor.author Jayasena, S
dc.contributor.editor Adhikariwatte, W
dc.contributor.editor Rathnayake, M
dc.contributor.editor Hemachandra, K
dc.date.accessioned 2022-10-19T05:39:24Z
dc.date.available 2022-10-19T05:39:24Z
dc.date.issued 2021-07
dc.identifier.citation A. Liyanage, S. Ranathunga and S. Jayasena, "Bilingual Lexical Induction for Sinhala-English using Cross Lingual Embedding Spaces," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 579-584, doi: 10.1109/MERCon52712.2021.9525667. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/19131
dc.description.abstract Bilingual lexicons are an important resource in Natural Language Processing (NLP). Such resources are scarce for low-resource languages (LRLs) such as Sinhala, and research on Bilingual Lexical Induction (BLI) in low-resource settings is limited. This paper presents the first-ever implementation of BLI for the Sinhala-English language pair. Following the recently introduced VecMap model, we map the word vectors of both Sinhala and English into a shared vector space and measure the Cross Lingual (CL) similarity between the words. The closest English word to a given Sinhala word in this CL vector space is taken as its corresponding word. Currently, there is no detailed evaluation of BLI with respect to the size and nature of the dataset used to create the word vectors, the type of evaluation dictionary, or the technique used to create the word vectors. This paper presents a comprehensive analysis of how these factors affect BLI for Sinhala and English and shows that the BLI results depend heavily on them. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9525667 en_US
dc.subject Sinhala en_US
dc.subject Embedding Models en_US
dc.subject Mapped Embedding Spaces en_US
dc.subject Bilingual Lexicon Induction en_US
dc.title Bilingual Lexical Induction for Sinhala-English using Cross Lingual Embedding Spaces en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2021 en_US
dc.identifier.conference Moratuwa Engineering Research Conference 2021 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.pgnos pp. 579-584 en_US
dc.identifier.proceeding Proceedings of Moratuwa Engineering Research Conference 2021 en_US
dc.identifier.doi 10.1109/MERCon52712.2021.9525667 en_US
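
Note on the abstract: the retrieval step it describes (taking the closest English word to a Sinhala word in the shared space) can be illustrated with a minimal sketch. This is not the authors' code and does not perform the VecMap mapping itself; it assumes the vectors have already been mapped into one space, assumes cosine similarity as the closeness measure, and uses made-up three-dimensional toy vectors. The function names `cosine` and `induce_lexicon` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_lexicon(src_vecs, tgt_vecs):
    """For each source word, pick the target word with the highest cosine similarity."""
    lexicon = {}
    for src_word, src_vec in src_vecs.items():
        lexicon[src_word] = max(
            tgt_vecs, key=lambda tgt_word: cosine(src_vec, tgt_vecs[tgt_word])
        )
    return lexicon


# Toy vectors, invented for illustration only (assumed to be already mapped
# into a shared cross-lingual space; real embeddings have hundreds of dimensions).
sinhala = {
    "බල්ලා": [0.9, 0.1, 0.0],
    "පොත": [0.1, 0.8, 0.3],
}
english = {
    "dog": [0.8, 0.2, 0.1],
    "book": [0.0, 0.9, 0.2],
    "river": [0.1, 0.1, 0.9],
}

lexicon = induce_lexicon(sinhala, english)
for src_word, tgt_word in lexicon.items():
    print(src_word, "->", tgt_word)
```

In a real setting the two dictionaries would hold the mapped embeddings for the full vocabularies, and the induced pairs would be scored against an evaluation dictionary.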

