deBas: a Sinhala interactive voice response (IVR) system

Nallathamby, JD; Kariyawasam, KKR; Pullaperuma, HD; Vithana, DC; Jayasena, S

dc.contributor.author	Nallathamby, JD
dc.contributor.author	Kariyawasam, KKR
dc.contributor.author	Pullaperuma, HD
dc.contributor.author	Vithana, DC
dc.contributor.author	Jayasena, S
dc.date.accessioned	2013-10-21T02:12:21Z
dc.date.available	2013-10-21T02:12:21Z
dc.date.issued	2011
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/8061
dc.description.abstract	Although Interactive Voice Response (IVR) systems are widely used in many languages today, no IVR system exists yet for the Sinhala language. This paper presents deBas IVR: a complete Sinhala IVR with automatic speech recognition (ASR) and text-to-speech (TTS) synthesis modules that comply with the Media Resource Control Protocol (MRCP). It discusses background literature, the development process, the overall design and implementation, and future work that can be carried out in this area. In the ASR component, the acoustic model is trained with SphinxTrain and decoding is done with PocketSphinx, both of which are based on Hidden Markov Models (HMMs). The TTS component uses the AMoRA Sinhala TTS knowledge base, which relies on the Festival speech synthesis engine and a female diphonic voice built using the Festvox voice building tools. Asterisk is used as the IVR gateway and dial-plan interpreter. The speech resources were developed according to the MRCPv2 protocol, which uses the Session Initiation Protocol (SIP) to establish controlled connections to external media streaming devices and the Real-time Transport Protocol (RTP) for media delivery. The language model of the ASR component is restricted to the digits 0 to 9, which are commonly used in IVR systems, and the set of words used in our demo application. In our experiments, the Word Error Rate and the Sentence Error Rate of the ASR component were 31.4% and 54% respectively. In addition, we introduce a new intonation model that can be applied to any existing Sinhala diphonic voice.
dc.language	en
dc.title	deBas: a Sinhala interactive voice response (IVR) system
dc.type	Conference-Full-text
dc.identifier.year	2011
dc.identifier.conference	Excellence in Research, Excelling a Nation
dc.identifier.place	Faculty of Engineering, University of Moratuwa
dc.identifier.pgnos	pp. 50-55
dc.identifier.proceeding	17th Annual Research Symposium on Excellence in Research, Excelling a Nation
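
The abstract reports a Word Error Rate and a Sentence Error Rate for the ASR component but does not say how they were computed. The sketch below shows the conventional definitions only, as an assumption about what those figures mean: word errors are counted as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and a sentence counts as wrong if it differs from its reference at all. The function names and the sample transcripts are hypothetical placeholders, not data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs):
    """Return (WER, SER) for a list of (reference, hypothesis) strings."""
    total_words = 0
    total_errors = 0
    wrong_sentences = 0
    for ref, hyp in pairs:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_words += len(ref_tokens)
        if ref_tokens != hyp_tokens:
            wrong_sentences += 1
    return total_errors / total_words, wrong_sentences / len(pairs)


# Hypothetical transcripts (placeholders, not the paper's test set).
samples = [
    ("one two three", "one two three"),   # correct
    ("four five", "four nine"),           # one substitution
    ("six seven eight", "six eight"),     # one deletion
]
wer, ser = error_rates(samples)
print(f"WER = {wer:.1%}, SER = {ser:.1%}")
```

With a small digit-and-command vocabulary, one misrecognized word is enough to make a whole utterance wrong, so a Sentence Error Rate well above the Word Error Rate is expected under these definitions.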