Institutional Repository, University of Moratuwa.

A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling


dc.contributor.author Chamishka, S
dc.contributor.author Madhavi, I
dc.contributor.author Nawaratne, R
dc.contributor.author Alahakoon, D
dc.contributor.author De Silva, D
dc.contributor.author Chilamkurti, N
dc.contributor.author Nanayakkara, V
dc.date.accessioned 2023-06-21T08:02:58Z
dc.date.available 2023-06-21T08:02:58Z
dc.date.issued 2022
dc.identifier.citation Chamishka, S., Madhavi, I., Nawaratne, R., Alahakoon, D., De Silva, D., Chilamkurti, N., & Nanayakkara, V. (2022). A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling. Multimedia Tools and Applications, 81(24), 35173–35194. https://doi.org/10.1007/s11042-022-13363-4 en_US
dc.identifier.issn 1573-7721 en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/21137
dc.description.abstract The advancements of the Internet of Things (IoT) and voice-based multimedia applications have resulted in the generation of big data consisting of patterns, trends and associations capturing and representing many features of human behaviour. The latent representations of many aspects of human behaviour are naturally embedded within the expression of emotions found in human speech. This signifies the importance of mining audio data collected from human conversations to extract human emotions. The ability to capture and represent human emotions will be an important feature of next-generation artificial intelligence, which is expected to interact more closely with humans. Although textual representations of human conversations have shown promising results for the extraction of emotions, acoustic feature-based emotion detection from audio still lags behind in accuracy. This paper proposes a novel feature-extraction approach consisting of Bag-of-Audio-Words (BoAW) based feature embeddings for conversational audio data. A Recurrent Neural Network (RNN) based state-of-the-art emotion detection model is proposed that captures the conversation context and individual party states when making real-time categorical emotion predictions. The performance of the proposed approach and model is evaluated using two benchmark datasets, along with an empirical evaluation of real-time prediction capability. The proposed approach reported 60.87% weighted accuracy and 60.97% unweighted accuracy for six basic emotions on the IEMOCAP dataset, significantly outperforming current state-of-the-art models. en_US
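The minimal Python sketch below illustrates the kind of pipeline the abstract describes: a Bag-of-Audio-Words codebook learned by clustering frame-level acoustic descriptors, per-utterance BoAW histograms, and a GRU that processes utterances in conversation order to predict categorical emotions. The MFCC front end, codebook size, network dimensions, and all function and class names are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical BoAW + RNN sketch; parameters and names are assumptions, not the paper's setup.
import numpy as np
from sklearn.cluster import KMeans
import torch
import torch.nn as nn

def build_codebook(frame_features, n_words=256, seed=0):
    """Cluster low-level frame descriptors (e.g. MFCC frames) into an audio-word codebook."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(frame_features)

def boaw_embedding(utterance_frames, codebook):
    """Quantise each frame to its nearest audio word and return a normalised histogram."""
    words = codebook.predict(utterance_frames)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

class ConversationEmotionRNN(nn.Module):
    """GRU over a sequence of per-utterance BoAW embeddings, emitting emotion logits per utterance."""
    def __init__(self, n_words=256, hidden=128, n_emotions=6):
        super().__init__()
        self.rnn = nn.GRU(n_words, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_emotions)

    def forward(self, boaw_seq):           # boaw_seq: (batch, utterances, n_words)
        out, _ = self.rnn(boaw_seq)        # hidden state carries conversation context forward
        return self.head(out)              # (batch, utterances, n_emotions) logits

# Usage with synthetic stand-in data (random arrays in place of real MFCC frames):
frames = np.random.rand(5000, 39).astype(np.float32)          # pooled training frames
codebook = build_codebook(frames)
conversation = np.stack([boaw_embedding(np.random.rand(120, 39).astype(np.float32), codebook)
                         for _ in range(8)])                   # one conversation, 8 utterances
logits = ConversationEmotionRNN()(torch.from_numpy(conversation)[None].float())
```

In this sketch the GRU's recurrence is what carries conversation-level context between utterances; the paper additionally models individual party states, which is omitted here for brevity.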
dc.language.iso en_US en_US
dc.publisher Springer Netherlands en_US
dc.subject Bag-of-audio-words en_US
dc.subject Machine learning en_US
dc.subject Artificial intelligence en_US
dc.subject Big data en_US
dc.subject Emotion analysis en_US
dc.title A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling en_US
dc.type Article-Full-text en_US
dc.identifier.year 2022 en_US
dc.identifier.journal Multimedia Tools and Applications en_US
dc.identifier.volume 81 en_US
dc.identifier.database Springer Link en_US
dc.identifier.pgnos 35173–35194 en_US
dc.identifier.doi 10.1007/s11042-022-13363-4 en_US

