Institutional Repository, University of Moratuwa.

Cross-ViT: cross-attention vision transformer for image duplicate detection

Show simple item record

dc.contributor.author Chandrasiri, MDN
dc.contributor.author Talagala, PD
dc.contributor.editor Piyatilake, ITS
dc.contributor.editor Thalagala, PD
dc.contributor.editor Ganegoda, GU
dc.contributor.editor Thanuja, ALARR
dc.contributor.editor Dharmarathna, P
dc.date.accessioned 2024-02-06T08:36:41Z
dc.date.available 2024-02-06T08:36:41Z
dc.date.issued 2023-12-07
dc.identifier.uri http://dl.lib.uom.lk/handle/123/22194
dc.description.abstract Duplicate detection in image databases has immense significance across diverse domains. Its utility transcends specific applications, adapting seamlessly to a range of use cases, either as a standalone process or as an integrated component within broader workflows. This study explores a cutting-edge vision transformer architecture to advance feature extraction for duplicate image identification. Our proposed framework combines the conventional transformer architecture with a cross-attention layer developed specifically for this study. This cross-attention transformer takes pairs of images as input, enabling cross-attention operations that examine the interconnections and relationships between the distinct features of the two images. Through successive iterations of Cross-ViT, we assess the ranking capability of each version, highlighting the vital role of the cross-attention layer integrated between transformer blocks. Our research culminates in a recommended final model that exploits the synergy between higher-dimensional hidden embeddings and mid-size ViT variants, thereby optimizing image pair ranking. The performance of the proposed framework was assessed through a comprehensive comparative evaluation against baseline CNN models on several benchmark datasets, further underscoring the strength of our approach. In conclusion, this study demonstrates the potential of the vision transformer and its novel cross-attention layer for duplicate image detection. Notably, the contribution of this study lies not in new feature extraction methods but in a novel cross-attention layer between transformer blocks grounded in the scaled dot-product attention mechanism. en_US
dc.language.iso en en_US
dc.publisher Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. en_US
dc.subject Duplicate image detection en_US
dc.subject Vision transformers en_US
dc.subject Attention en_US
dc.title Cross-ViT: cross-attention vision transformer for image duplicate detection en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty IT en_US
dc.identifier.department Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. en_US
dc.identifier.year 2023 en_US
dc.identifier.conference 8th International Conference on Information Technology Research 2023 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.pgnos pp. 1-6 en_US
dc.identifier.proceeding Proceedings of the 8th International Conference on Information Technology Research 2023 en_US
dc.identifier.email dncnawodya@gmail.com en_US
dc.identifier.email priyangad@uom.lk en_US
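
The abstract above attributes the framework's pair-wise comparison to a cross-attention layer placed between transformer blocks and grounded in scaled dot-product attention, in which one image's features attend to the other's. The code below is a minimal sketch of that idea only, not the authors' implementation: it assumes PyTorch, ViT-Base-style dimensions (768-dimensional tokens, 197 tokens per 224x224 image with 16x16 patches), and 8 attention heads; the class name CrossAttentionBlock and every hyperparameter are illustrative assumptions.

import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """Illustrative cross-attention between two images' token embeddings.

    Queries come from image A; keys and values come from image B, so each
    output token of A is a weighted mix of B's tokens, computed with
    standard scaled dot-product attention. (Hypothetical sketch, not the
    paper's exact layer.)
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # batch_first=True -> inputs are shaped (batch, tokens, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens_a: torch.Tensor, tokens_b: torch.Tensor) -> torch.Tensor:
        q = self.norm_a(tokens_a)
        kv = self.norm_b(tokens_b)
        attended, _ = self.attn(query=q, key=kv, value=kv)
        # Residual connection keeps image A's own features in the output.
        return tokens_a + attended


if __name__ == "__main__":
    # Two images, each already encoded by a ViT block into 197 tokens
    # (196 patch tokens + 1 [CLS] token) of width 768 - assumed sizes.
    emb_a = torch.randn(1, 197, 768)
    emb_b = torch.randn(1, 197, 768)
    block = CrossAttentionBlock(dim=768, num_heads=8)
    fused = block(emb_a, emb_b)   # image A attending to image B
    print(fused.shape)            # torch.Size([1, 197, 768])

In this sketch, queries come from one image while keys and values come from the other, so each token of image A is re-expressed as a weighted combination of image B's tokens; a symmetric pass with the roles swapped would let B attend to A as well before the pair is scored for ranking.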


This item appears in the following Collection(s)

  • ICITR - 2023 [47]
    International Conference on Information Technology Research (ICITR)
