A robust approach to open vocabulary image retrieval with deep convolutional neural networks and transfer learning

Padmakumar, V.; Ranga, R.; Elluru, S.; Sowmya, Kamath S.

Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/7117

Full metadata record

DC Field	Value	Language
dc.contributor.author	Padmakumar, V.	-
dc.contributor.author	Ranga, R.	-
dc.contributor.author	Elluru, S.	-
dc.contributor.author	Sowmya, Kamath S.	-
dc.date.accessioned	2020-03-30T09:58:31Z	-
dc.date.available	2020-03-30T09:58:31Z	-
dc.date.issued	2018	-
dc.identifier.citation	Proceedings of the 2018 Pacific Neighborhood Consortium Annual Conference and Joint Meetings: Human Rights in Cyberspace, PNC 2018, 2018, pp. 106-112	en_US
dc.identifier.uri	https://idr.nitk.ac.in/jspui/handle/123456789/7117	-
dc.description.abstract	Enabling computer systems to respond to conversational human language is a challenging problem with wide-ranging applications in robotics and human-computer interaction. In image search specifically, humans tend to describe objects in fine-grained detail, such as color or company, for which conventional retrieval algorithms have shown poor performance. In this paper, a novel approach for open vocabulary image retrieval is presented, capable of selecting the correct candidate image from among a set of distractors given a query in natural language form. Our methodology focuses on generating a robust set of image-text projections capable of accurately representing any image, with the objective of achieving high recall. To this end, an ensemble of classifiers is trained to generate category-based projections: on ImageNet for representing high-resolution objects, on CIFAR-100 for lower-resolution images of objects, and on Caltech-256 for challenging views of everyday objects. In addition to category-based projections, we also make use of an image captioning model trained on MS COCO and Google Image Search (GISS) to capture additional semantic/latent information about the candidate images. To facilitate image retrieval, the natural language query and the projection results are converted to a common vector representation using word embeddings, with which query-image similarity is computed. The proposed model, when benchmarked on the RefCOCO dataset, achieved an accuracy of 68.8% while retrieving semantically meaningful candidate images. © 2018 Pacific Neighborhood Consortium (PNC).	en_US
dc.title	A robust approach to open vocabulary image retrieval with deep convolutional neural networks and transfer learning	en_US
dc.type	Book chapter	en_US
Appears in Collections:	2. Conference Papers
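
The retrieval step described in the abstract (mapping the query and each image's text projections into a common word-embedding space, then scoring query-image similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors, the averaging of word vectors, the use of cosine similarity, and the names EMBEDDINGS, embed, cosine and rank_candidates are all assumptions made for illustration. The record does not state which embeddings or similarity measure the paper uses.

```python
import math

# Hypothetical toy word vectors, for illustration only.
# A real system would load pretrained word embeddings instead.
EMBEDDINGS = {
    "red":   [0.9, 0.1, 0.0],
    "car":   [0.1, 0.9, 0.1],
    "truck": [0.2, 0.8, 0.2],
    "dog":   [0.0, 0.1, 0.9],
    "brown": [0.7, 0.0, 0.3],
}


def embed(words):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Rank candidate images by similarity between the query and their projections.

    candidates maps an image id to the words produced for that image
    (for example classifier labels and caption words).
    """
    query_vec = embed(query.lower().split())
    scores = {}
    for image_id, words in candidates.items():
        image_vec = embed(words)
        if query_vec is None or image_vec is None:
            scores[image_id] = 0.0
        else:
            scores[image_id] = cosine(query_vec, image_vec)
    # Highest similarity first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


candidates = {
    "img_a": ["red", "car"],
    "img_b": ["brown", "dog"],
    "img_c": ["truck"],
}
ranking = rank_candidates("red car", candidates)
for image_id, score in ranking:
    print(image_id, round(score, 3))
```

In this toy setup the query "red car" ranks img_a first because its projection words match the query exactly, with the semantically related "truck" image next. Averaging word vectors is only one simple way to obtain a common representation and discards word order.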

Files in This Item:

File	Description	Size	Format
9 A Robust Approach.pdf		2.17 MB	Adobe PDF	View/Open
