Repository logo

Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets

Loading...
Thumbnail Image

Date

Authors

Zhang, Hao, author

Beveridge, Ross, advisor

Draper, Bruce, advisor

Anderson, Chuck, committee member

Zhou, Yongcheng, committee member

Journal Title

Journal ISSN

Volume Title

Abstract

Nearest neighbor search is an important operation whose goal is to find items in the dataset that are similar to a given query. It has a number of applications such as content based image retrieval (CBIR), near duplicate image detection and recommender systems. With the rapid development of the Internet and digital devices, it becomes easy to share and collect data. Taking a modern social network as an example, Facebook was reported in 2012 to be collecting more than 500 terabytes of text, images and videos each day. Conventional nearest neighbor search using linear scan becomes prohibitive when dealing with large-scale datasets like this. This thesis proposed a new quantization-based binary code learning algorithm, called Unit Query and Location Sensitive Hashing (UnitQLSH), to solve the problem of approximate nearest neighbor search for large-scale, unsupervised and unit-length data. UnitQLSH maps each high dimensional data sample to a binary code constrained to be residing on the unit-sphere. This constraint is very helpful in improving the retrieval performance. Also, UnitQLSH takes advantage of the approximate linearity of local neighborhoods of data to further improve performance. Moreover, given a query, a weight vector is computed based on it, indicating the significance of different bits. The Hamming distances are weighed by this vector to provide much more accurate retrievals than traditional approaches without any weighting schemes. Compared to existing state-of-the-art approaches, the proposed algorithm outperforms them significantly.

Description

Rights Access

Subject

hashing

nearest neighbor search

unsupervised

large-scale dataset

binary code

quantization

Citation

Endorsement

Review

Supplemented By

Referenced By