Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets

Zhang, Hao, author; Beveridge, Ross, advisor; Draper, Bruce, advisor; Anderson, Chuck, committee member; Zhou, Yongcheng, committee member


dc.contributor.author	Zhang, Hao, author
dc.contributor.author	Beveridge, Ross, advisor
dc.contributor.author	Draper, Bruce, advisor
dc.contributor.author	Anderson, Chuck, committee member
dc.contributor.author	Zhou, Yongcheng, committee member
dc.date.accessioned	2016-07-12T23:03:09Z
dc.date.available	2016-07-12T23:03:09Z
dc.date.issued	2016
dc.description.abstract	Nearest neighbor search is an important operation whose goal is to find items in a dataset that are similar to a given query. It has a number of applications, such as content-based image retrieval (CBIR), near-duplicate image detection and recommender systems. With the rapid development of the Internet and digital devices, sharing and collecting data has become easy. Taking a modern social network as an example, Facebook was reported in 2012 to be collecting more than 500 terabytes of text, images and videos each day. Conventional nearest neighbor search using a linear scan becomes prohibitive for datasets of this scale. This thesis proposes a new quantization-based binary code learning algorithm, called Unit Query and Location Sensitive Hashing (UnitQLSH), to solve the problem of approximate nearest neighbor search for large-scale, unsupervised and unit-length data. UnitQLSH maps each high-dimensional data sample to a binary code constrained to reside on the unit sphere, and this constraint considerably improves retrieval performance. UnitQLSH also takes advantage of the approximate linearity of local neighborhoods in the data to further improve performance. Moreover, given a query, UnitQLSH computes a weight vector from it that indicates the significance of each bit. The Hamming distances are weighted by this vector, yielding much more accurate retrievals than traditional approaches that use no weighting scheme. The proposed algorithm significantly outperforms existing state-of-the-art approaches.
dc.format.medium	born digital
dc.format.medium	doctoral dissertations
dc.identifier	Zhang_colostate_0053A_13441.pdf
dc.identifier.uri	http://hdl.handle.net/10217/173342
dc.identifier.uri	https://doi.org/10.25675/3.024126
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2000-2019
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	hashing
dc.subject	nearest neighbor search
dc.subject	unsupervised
dc.subject	large-scale dataset
dc.subject	binary code
dc.subject	quantization
dc.title	Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (Ph.D.)
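The abstract does not specify how UnitQLSH learns its codes or computes its per-query bit weights. The sketch below is only a rough illustration of the two ideas it names: binary codes that lie on the unit sphere, and a query-weighted Hamming distance. It swaps in plain random-hyperplane (sign) hashing to produce the codes, which is a different and simpler technique than UnitQLSH.

Everything else here is a hypothetical assumption and not the thesis's method. That includes the projection matrix `R`, the helper names `encode_unit_codes`, `weighted_hamming` and `search`, and the weighting rule (the query's absolute projection onto each hyperplane).

```python
import numpy as np


def encode_unit_codes(X: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Sign-of-projection codes scaled onto the unit sphere.

    X: (n, d) data; R: (d, k) projection matrix (random here, an assumption).
    Each row of the result has entries in {-1/sqrt(k), +1/sqrt(k)}, so its
    L2 norm is exactly 1. For two such codes differing in h bits, the squared
    Euclidean distance is 4*h/k, so Hamming distance tracks distance on the sphere.
    """
    k = R.shape[1]
    signs = np.where(X @ R >= 0, 1.0, -1.0)
    return signs / np.sqrt(k)


def weighted_hamming(query_code: np.ndarray, db_codes: np.ndarray,
                     weights: np.ndarray) -> np.ndarray:
    """Sum of per-bit weights over the bits where each database code disagrees
    with the query code. With all weights equal to 1 this is plain Hamming distance."""
    mismatch = np.sign(db_codes) != np.sign(query_code)  # (n, k) boolean
    return mismatch.astype(float) @ weights               # (n,)


def search(X: np.ndarray, q: np.ndarray, R: np.ndarray, top: int = 3):
    """Rank database items by a query-weighted Hamming distance."""
    codes = encode_unit_codes(X, R)
    q_code = encode_unit_codes(q[None, :], R)[0]
    # Hypothetical weighting rule: bits for which the query lies far from the
    # hyperplane are treated as more reliable. This is NOT UnitQLSH's scheme.
    w = np.abs(q @ R)
    d = weighted_hamming(q_code, codes, w)
    order = np.argsort(d, kind="stable")  # ties broken by index
    return order[:top], d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 64))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-length data
    R = rng.standard_normal((64, 32))
    idx, d = search(X, X[0], R)
    print(idx, d[idx])
```

Every score here is computed against the compact codes rather than the raw vectors, which is the kind of cost binary codes are meant to cut relative to an exact linear scan. A practical system would also pack the bits into machine words.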

Files

Original bundle


Name:: Zhang_colostate_0053A_13441.pdf
Size:: 2.33 MB
Format:: Adobe Portable Document Format


Collections

2000-2019
Theses and Dissertations