Abstract:
Speech embeddings produced by deep neural networks have yielded promising results in a variety of speech processing applications. However, performance in speech tasks such as automatic speech recognition and speech intent identification can be affected to a great extent when there is a discrepancy between training and testing conditions. This is because, in addition to linguistic information, speech signals carry para-linguistic information including speaker characteristics, emotional states, and accent. Variations in speaker traits and states degrade performance in speech recognition applications that require only linguistic information. Over the years, various approaches have attempted to disentangle the para-linguistic information that accompanies the linguistic information in speech. The commonly used strategy is to integrate speaker representations into speech recognition models to normalise the speaker effects. However, this strategy has received less attention in studies on speech-to-intent classification. Furthermore, these speaker normalisation techniques require large amounts of labeled speech data. Under low-resource settings, when only a limited number of speech samples is available for training, transfer learning strategies can be used. This study presents a speaker-invariant speech intent classification model using i-vector based feature augmentation. We investigate the use of pre-trained acoustic models for transfer learning under low-resource settings. The proposed method is evaluated on a banking-domain speech intent dataset in the Sinhala and Tamil languages, along with the Fluent Speech Commands dataset. Experimental results show the effectiveness of the proposed method in improving the prediction performance of the speech-to-intent classification model.
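The i-vector based feature augmentation described in the abstract can be illustrated with a minimal sketch (not taken from the thesis itself): a fixed-dimensional, utterance-level speaker i-vector is appended to every frame of the acoustic feature matrix before the augmented features are passed to the intent classifier. The feature dimensions and the NumPy-based implementation below are illustrative assumptions only.

```python
import numpy as np

def augment_with_ivector(frames: np.ndarray, ivector: np.ndarray) -> np.ndarray:
    """Append an utterance-level i-vector to every acoustic frame.

    frames:  (T, D) matrix of frame-level features, e.g. 40-dim filterbanks.
    ivector: (K,)   utterance-level speaker i-vector, e.g. 100-dim.
    returns: (T, D + K) speaker-augmented feature matrix.
    """
    tiled = np.tile(ivector, (frames.shape[0], 1))  # repeat the i-vector for each frame
    return np.concatenate([frames, tiled], axis=1)

# Hypothetical usage: 200 frames of 40-dim features and a 100-dim i-vector.
frames = np.random.randn(200, 40)
ivector = np.random.randn(100)
augmented = augment_with_ivector(frames, ivector)
print(augmented.shape)  # (200, 140)
```

In this sketch the augmented features would then feed the intent classifier (which, under the low-resource setting described in the abstract, could reuse a pre-trained acoustic model as its encoder); the i-vector extraction itself is assumed to come from a separately trained i-vector extractor.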
Citation:
Ignatius, A. (2021). Speech embedding with segregation of paralinguistic information for low-resource languages [Master's thesis, University of Moratuwa]. Institutional Repository, University of Moratuwa. http://dl.lib.uom.lk/handle/123/22663