Activity recognition combined with scene context and action sequence

Ramasinghe, SC

dc.contributor.advisor	Rodrigo, R
dc.contributor.author	Ramasinghe, SC
dc.date.accessioned	2019-01-31T01:27:54Z
dc.date.available	2019-01-31T01:27:54Z
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/13876
dc.description.abstract	In this study, we investigate the problem of automatic action recognition and classification in videos. First, we present a convolutional neural network architecture that takes both motion and static information as inputs in a single stream. We show that the network is able to treat motion and static information as different feature maps and extract features from them, even though they are stacked together. Our results justify the use of optical flow as the raw motion information. We demonstrate that our network is able to surpass state-of-the-art hand-engineered feature methods. Furthermore, the effect of providing static information to the network in the task of action recognition is also studied and compared. Then, a novel pipeline is proposed for recognizing complex actions. A complex activity is a temporal composition of sub-events, and a sub-event typically consists of several low-level micro-actions, such as body movements, performed by different actors. Extracting these micro-actions explicitly is beneficial for complex activity recognition due to actor selectivity, higher discriminative power, and motion clutter suppression. Moreover, considering both static and motion features is vital for activity recognition. However, how to optimally control the contribution from each feature domain remains uninvestigated. In this work, we extract motion features at the micro level, preserving actor identity, to later obtain a high-level motion descriptor using a probabilistic model. Furthermore, we propose two novel schemes for combining static and motion features: one based on the Cholesky transformation and one based on entropy. The former allows the contribution ratio to be controlled precisely, while the latter derives the optimal ratio mathematically. The ratio given by the entropy-based method matches well with the experimental values obtained by the Cholesky-transformation-based method. This analysis also provides the ability to characterize a dataset according to its richness in motion information. Finally, we study the effectiveness of modeling the temporal evolution of sub-events using an LSTM network. Experimental results demonstrate that the proposed technique outperforms the state of the art when tested on two popular datasets.	en_US
dc.language.iso	en	en_US
dc.subject	Human action recognition	en_US
dc.subject	Convolutional Neural Networks (CNN)	en_US
dc.subject	Recurrent Neural Networks (RNN)	en_US
dc.subject	Long Short-Term Memory (LSTM)	en_US
dc.subject	Dense trajectories	en_US
dc.subject	BoVW	en_US
dc.title	Activity recognition combined with scene context and action sequence	en_US
dc.type	Thesis-Full-text	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	Master of Philosophy (MPhil)	en_US
dc.identifier.department	Department of Electronic and Telecommunication Engineering	en_US
dc.date.accept	2017-09
dc.identifier.accno	TH3526	en_US