dc.contributor.author |
Chandrasiri, MDN |
|
dc.contributor.author |
Talagala, PD |
|
dc.contributor.editor |
Piyatilake, ITS |
|
dc.contributor.editor |
Thalagala, PD |
|
dc.contributor.editor |
Ganegoda, GU |
|
dc.contributor.editor |
Thanuja, ALARR |
|
dc.contributor.editor |
Dharmarathna, P |
|
dc.date.accessioned |
2024-02-06T08:36:41Z |
|
dc.date.available |
2024-02-06T08:36:41Z |
|
dc.date.issued |
2023-12-07 |
|
dc.identifier.uri |
http://dl.lib.uom.lk/handle/123/22194 |
|
dc.description.abstract |
Duplicate detection in image databases is important across diverse
domains: it can serve either as a standalone process or as an
integrated component of broader workflows. This study explores the
vision transformer architecture for feature extraction in the context
of duplicate image identification. Our proposed framework, Cross-ViT,
combines the conventional transformer architecture with a
cross-attention layer developed specifically for this study. This
cross-attention transformer takes pairs of images as input, enabling
cross-attention operations that capture the relationships between the
features of the two images. Through successive iterations of
Cross-ViT, we assess the ranking capability of each variant,
highlighting the role of the cross-attention layer inserted between
transformer blocks. We recommend a final model that combines
higher-dimensional hidden embeddings with mid-size ViT variants,
thereby optimizing image pair ranking. The performance of the proposed
framework was assessed through a comparative evaluation against
baseline CNN models on several benchmark datasets. Notably, the
contribution of this study lies not in new feature extraction methods
but in a novel cross-attention layer between transformer blocks
grounded in the scaled dot-product attention mechanism. |
en_US |
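The cross-attention layer described in the abstract can be sketched as scaled dot-product attention in which queries come from one image's patch embeddings and keys/values from the other's. The following is an illustrative sketch only, not the authors' implementation; the shapes, projection setup, and function names are assumptions.

```python
# Illustrative sketch (not the paper's code): scaled dot-product
# cross-attention between the patch embeddings of two images, as the
# abstract describes for the layer inserted between transformer blocks.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x_a, x_b, d_k):
    """Queries from image A attend to keys/values from image B.

    x_a: (n_a, d) patch embeddings of image A
    x_b: (n_b, d) patch embeddings of image B
    """
    rng = np.random.default_rng(0)  # stand-in for learned weights
    W_q = rng.standard_normal((x_a.shape[-1], d_k)) / np.sqrt(d_k)
    W_k = rng.standard_normal((x_b.shape[-1], d_k)) / np.sqrt(d_k)
    W_v = rng.standard_normal((x_b.shape[-1], d_k)) / np.sqrt(d_k)
    q, k, v = x_a @ W_q, x_b @ W_k, x_b @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (n_a, n_b) attention weights
    return attn @ v                          # A's tokens enriched with B's features

# Toy patch embeddings for an image pair (4 patches, dim 8).
x_a = np.ones((4, 8))
x_b = np.ones((4, 8))
out = cross_attention(x_a, x_b, d_k=8)
print(out.shape)  # (4, 8)
```

In a full Cross-ViT-style model this operation would be applied symmetrically (each image attending to the other) with learned projections, so that the resulting representations reflect inter-image feature relationships before ranking.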
dc.language.iso |
en |
en_US |
dc.publisher |
Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. |
en_US |
dc.subject |
Duplicate image detection |
en_US |
dc.subject |
Vision transformers |
en_US |
dc.subject |
Attention |
en_US |
dc.title |
Cross-ViT: cross-attention vision transformer for image duplicate detection |
en_US |
dc.type |
Conference-Full-text |
en_US |
dc.identifier.faculty |
IT |
en_US |
dc.identifier.department |
Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. |
en_US |
dc.identifier.year |
2023 |
en_US |
dc.identifier.conference |
8th International Conference in Information Technology Research 2023 |
en_US |
dc.identifier.place |
Moratuwa, Sri Lanka |
en_US |
dc.identifier.pgnos |
pp. 1-6 |
en_US |
dc.identifier.proceeding |
Proceedings of the 8th International Conference in Information Technology Research 2023 |
en_US |
dc.identifier.email |
dncnawodya@gmail.com |
en_US |
dc.identifier.email |
priyangad@uom.lk |
en_US |