dc.contributor.advisor Ranathunga S
dc.contributor.author Kariyawasam KKR
dc.date.accessioned 2020
dc.date.available 2020
dc.date.issued 2020
dc.identifier.uri http://dl.lib.uom.lk/handle/123/16779
dc.description.abstract Community-based question answering forums are very popular these days. People tend to refer to community forums for opinions in various fields such as electronics, medicine, and automobiles. It is easy and useful to find a good opinion freely, but it is hard to choose the correct one when there are thousands of reviews. There have been several efforts to automate the activities of community-based question answering systems, such as selecting the most relevant answers to a question (question-comment similarity) and identifying previously posted questions that are similar to a new question (question-question similarity). However, fewer attempts have been made to automate duplicate detection in community question answering systems. At the moment, it is the community itself that manually detects duplicates, and existing automation attempts mostly target individual domains. The objective of this research is to implement a mechanism that effectively identifies duplicate questions in a data set consisting of question-answer sets from multiple domains. The proposed solution has two focus areas: classification and retrieval. A neural network composed of two parallel LSTM layers (representing the query and the candidate question), an attention layer, and a gradient reversal layer (based on domain) is proposed as the question pair classifier. Trained on individual domains (without gradient reversal), it achieved better accuracy than the latest baseline research on this dataset for 9 out of 12 domains. For retrieval, the approach was to retrieve 20 candidates using BM25 and re-rank them using the already trained classifiers. This places the duplicate in the top 10 with better mean average precision (MAP) than BM25 alone for 6 out of 12 domains. Another important observation is that, in the retrieval case, the common model built with all the data combined achieved better MAP than the individual models for 7 out of 12 domains. en_US
dc.language.iso en en_US
dc.subject COMPUTER SCIENCE AND ENGINEERING-Dissertations en_US
dc.subject COMPUTER SCIENCE-Dissertations en_US
dc.subject MULTI-DOMAIN DATA en_US
dc.subject SIAMESE NEURAL NETWORKS en_US
dc.subject DOMAIN ADAPTATION en_US
dc.subject INFORMATION RETRIEVAL-Question Pair Classification en_US
dc.subject INFORMATION RETRIEVAL-Duplicate Question Retrieval en_US
dc.title Duplicate detection in multi-domain community question answering en_US
dc.type Thesis-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc in Computer Science en_US
dc.identifier.department Department of Computer Science & Engineering en_US
dc.date.accept 2020
dc.identifier.accno TH4254 en_US

