Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space

Bandara AMRR

URI: http://dl.lib.uom.lk/handle/123/21175

Abstract:

The retrieval of temporal digital visual data, either by a text or visual query, requires automatic interpretation, which includes high-level annotation by object detection and recognition for text query-based retrieval and low-level abstraction for visual query-based retrieval. Both the accuracy and the speed of the interpretation become crucial factors in real-world applications, due to the high density of visual data. This study focuses on efficiently reducing the complexity of visual data by dimensionality reduction techniques, for the detection and recognition of objects in videos, serving both textual annotation and visual query-based video frame retrieval. The study contributes three approaches: a novel visual feature descriptor based on colour dithering, the Salient Dither Pattern Feature (SDPF); a novel object segmentation method based on the proposed feature descriptor, Refining Superpixel and Histogram of oriented optical flow Clustering (RSHC); and a novel self-supervised local descriptor, Network-in-Network with Restricted Boltzmann Machine (NIN-RBM). The experimental results show that SDPF is rotation and scale invariant and computationally efficient, yet achieves object recognition accuracy similar to that of state-of-the-art methods with minimal supervision. The results further show that RSHC successfully uses SDPF to segment individual objects accurately from a very shallow history of motion. Furthermore, according to the results, NIN-RBM achieves state-of-the-art correspondence matching performance relative to existing deep-learned self-supervised binary descriptors, while keeping computation time to a minimum. The overall results support the conclusions that RSHC is capable of accurately segmenting objects in a video, and that SDPF can then be used successfully to recognize the segmented objects. Moreover, NIN-RBM can be used to reliably and rapidly retrieve video frames related to any visual query. Since NIN-RBM is a local descriptor, it can further be used to locate high-level objects and estimate their poses precisely, enriching the semantic detail retrieved from video data.

Citation:

Bandara, A.M.R.R. (2021). Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space [Doctoral dissertation, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21175
