dc.description.abstract |
This thesis describes a restricted-domain question answering system that can be used to automate the customer helpdesk of a commercial organization. Interest in data-driven methods for achieving more natural human-machine interaction has grown over the past decade. However, such methods require large amounts of manually labeled, representative data on how users converse with a machine, and this requirement is difficult to satisfy in the early phases of system development. In addition, such systems should be maintainable by a domain expert who is less technically skilled than a computer engineer. The knowledge-based approach presented here aims to make maximal use of the user experience held by the organization's customer service representatives (CSRs), and shows how truly representative data can be collected. The approach accounts for syntactic, lexical, and morphological variation, and provides a form of synonym transduction that can vary across the system's knowledge base. The query understanding method combines a statistical classifier, a ranking algorithm based on the Vector Space Model (VSM), and a pattern-writing process. It takes into account the intent, context, and content components of natural language meaning, as well as word order. A method based on a genetic algorithm is presented for finding domain-specific ranking parameters. The approach is evaluated by deploying a system in a real-world enterprise helpdesk in the telecommunications domain. The evaluation shows that the system answers user questions with an accuracy of 94.4%. Furthermore, the CSRs successfully carried out maintenance of the deployed system. |
en_US |