Abstract:
This paper describes a new model for predicting prosodic phrase breaks in the Sinhala language in order to
improve the quality of existing Sinhala Text To Speech (TTS) voices. In a
TTS system, the quality of the synthetic voice
depends mainly on how well its prosodic model is
implemented. The prosodic model adjusts the phrasing and the
pitch of the voice while applying suitable durations and tones
for words and diphones. Of these, phrasing and pitch are
particularly important, since appropriate phrase
breaking helps listeners understand the synthesized voice clearly. In a
real-world scenario, when we speak a sentence, we
automatically divide it into small segments and pause at
the breaks between them. The pitch of the voice also falls near a
break and rises in the other segments automatically.
A TTS system does not have that advantage, so
phrase breaks must be placed precisely.
Otherwise the output conveys wrong meanings and sounds
unnatural. Existing Sinhala TTS systems lack proper
prosody implementations and are hence difficult to understand,
especially when reading long sentences. This issue can be
overcome by applying a suitable phrase breaking technique.