Augmenting the Performance of Image Similarity Search through Crowdsourcing

Bahareh Rahmanian
May 2015
The University of Sydney
The world has witnessed incredible advances in information and communication technology during the past decade. Widespread internet access and the evolution of the World Wide Web have provided an excellent platform for communication and have given rise to a new, efficient, on-demand and affordable human workforce, which in turn has contributed to the rise of crowdsourcing. Crowdsourcing is the concept of “outsourcing a task that is traditionally performed by an employee to a large group of people in the form of an open call” (Howe 2006). Many platforms have been designed to support different types of crowdsourcing (e.g. Amazon Mechanical Turk, InnoCentive, Threadless), and studies have shown that the results produced by crowds on these platforms are generally accurate and reliable.

For several years, researchers have studied computational algorithms and developed machine learning methods with the goal of increasing automation and replacing humans with computers, in order to improve the accuracy and performance of diverse systems. Despite these improvements, computers still perform very poorly in some fields of research, and image similarity search is one of them.

Rapid advances in image capture devices and the availability of online photo storage services have led to very large image databases, and these image collections are of limited value without efficient image retrieval systems. An efficient image browsing, searching and retrieval system is required in various domains, including crime prevention, fashion and medicine. Many image retrieval systems have been developed based on two different approaches: text-based and content-based retrieval. Text-based image retrieval systems use textual search methods to provide high-performance image search over fully annotated images.
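The thesis abstract contains no code; as a minimal illustration of how text-based retrieval over annotated images works, the sketch below builds an inverted index from keyword annotations and answers conjunctive keyword queries. All names and the toy data are hypothetical, not part of the thesis.

```python
from collections import defaultdict

def build_index(annotations):
    """Build an inverted index mapping each keyword to the set of
    image ids annotated with it."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for kw in keywords:
            index[kw.lower()].add(image_id)
    return index

def search(index, query):
    """Return the ids of images annotated with every query keyword
    (a simple conjunctive, i.e. AND, query)."""
    terms = [t.lower() for t in query.split()]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for t in terms[1:]:
        results &= index.get(t, set())
    return results
```

As the thesis notes, this approach only performs well when the images are fully annotated: any image missing a keyword is simply invisible to the corresponding query.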
Because collecting accurate annotations for large image databases is an expensive and time-consuming task, researchers started designing a new generation of image retrieval systems in the early 1980s. This new approach uses raw image data and indexes images based on their visual content; it is called content-based image retrieval, or CBIR. The fundamental difference between text-based image retrieval and CBIR is that the former requires human interaction to provide meta-data (e.g. keywords, annotations), whereas the latter searches over image content rather than meta-data. The lack of human interaction, and the absence of a direct link between humans’ high-level concepts and the low-level features used by CBIR systems, have resulted in image similarity search systems with very low performance.

Crowdsourcing provides a fast and efficient way to use the power of human computation to solve problems that are difficult for machines. Among the several microtask crowdsourcing platforms available, we decided to perform our study using Amazon Mechanical Turk. In the context of our research, we studied the effect of user interface design, and its corresponding cognitive load, on the performance of crowd-produced results. Our results highlighted the importance of a well-designed user interface for crowdsourcing performance.

Using crowdsourcing platforms such as Amazon Mechanical Turk, we can enlist humans to solve problems that are difficult for computers, such as image similarity search. However, in tasks like image similarity search, it is not feasible to ask crowds to search within a database of millions of images; it is therefore more efficient to design a hybrid human–machine system. Several researchers have studied the design of hybrid human–machine systems to bridge the semantic gap between computational algorithms and human perception.
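The two-stage structure of such a hybrid system can be sketched as follows: a machine stage cheaply ranks the database and keeps only the top candidates, then a crowd stage votes on those candidates. The histogram distance, the worker-accuracy figures and the vote threshold below are all illustrative assumptions, not parameters from the thesis.

```python
import random

def machine_rank(query_hist, database, k=5):
    """Machine stage: a cheap shallow filter. Rank images by L1 distance
    between toy feature histograms (a stand-in for a real CBIR index)
    and keep the k nearest candidates."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return sorted(database, key=lambda item: l1(query_hist, item["hist"]))[:k]

def crowd_refine(candidates, votes_per_image=5, threshold=0.6):
    """Crowd stage: keep only candidates that a qualified majority of
    (simulated) workers judge as truly similar to the query."""
    refined = []
    for item in candidates:
        # Simulated workers answer "similar?" and are right ~90% of the time.
        yes = sum(random.random() < (0.9 if item["relevant"] else 0.1)
                  for _ in range(votes_per_image))
        if yes / votes_per_image >= threshold:
            refined.append(item)
    return refined
```

The division of labour matches the abstract: the machine stage does the heavy search over the whole database, while the crowd only ever sees a short candidate list.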
In the context of our research, we studied the effect of involving the crowd on the performance of an image similarity search system and proposed a hybrid human–machine image similarity search system. Our proposed system uses machine power to perform the heavy computation of searching for similar images within the image dataset, and uses crowdsourcing to refine the results. In other words, our hybrid system is composed of a CBIR retrieval algorithm, which provides recall and shallow filtering, and crowdsourced human input, which provides precision. We designed our CBIR system using the SIFT, SURF, SURF128 and ORB feature detectors/descriptors and compared the performance of the system with each of them. Our experiments confirmed that crowdsourcing can dramatically improve CBIR system performance.
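A key difference between these descriptor families is how they are compared: binary descriptors such as ORB are matched by Hamming distance over bit strings, while float descriptors such as SIFT and SURF are matched by Euclidean distance. The toy brute-force matcher below illustrates this; it is a sketch under those assumptions, not the thesis implementation.

```python
import math

def hamming(d1, d2):
    """Distance for binary descriptors (e.g. ORB): count differing bits."""
    return bin(d1 ^ d2).count("1")

def euclidean(v1, v2):
    """Distance for float-vector descriptors (e.g. SIFT, SURF)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def nearest(query, descriptors, dist):
    """Brute-force nearest-neighbour match: return the index of the
    database descriptor closest to the query under the given metric."""
    return min(range(len(descriptors)), key=lambda i: dist(query, descriptors[i]))
```

For example, `nearest(0b1011, [0b0000, 0b1100, 0b1010], hamming)` picks the descriptor differing in the fewest bits, while `nearest([1.0, 1.0], [[0, 0], [1, 2], [5, 5]], euclidean)` picks the closest float vector.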