Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets
dc.contributor.author | Zhang, Hao, author | |
dc.contributor.author | Beveridge, Ross, advisor | |
dc.contributor.author | Draper, Bruce, advisor | |
dc.contributor.author | Anderson, Chuck, committee member | |
dc.contributor.author | Zhou, Yongcheng, committee member | |
dc.date.accessioned | 2016-07-12T23:03:09Z | |
dc.date.available | 2016-07-12T23:03:09Z | |
dc.date.issued | 2016 | |
dc.description.abstract | Nearest neighbor search is an important operation whose goal is to find items in a dataset that are similar to a given query. It has a number of applications, such as content-based image retrieval (CBIR), near-duplicate image detection and recommender systems. With the rapid development of the Internet and digital devices, it has become easy to share and collect data. Taking a modern social network as an example, Facebook was reported in 2012 to be collecting more than 500 terabytes of text, images and videos each day. Conventional nearest neighbor search using linear scan becomes prohibitively expensive when dealing with large-scale datasets like this. This thesis proposes a new quantization-based binary code learning algorithm, called Unit Query and Location Sensitive Hashing (UnitQLSH), to solve the problem of approximate nearest neighbor search in the unsupervised setting for large-scale, unit-length data. UnitQLSH maps each high-dimensional data sample to a binary code constrained to lie on the unit sphere, a constraint that improves retrieval performance. UnitQLSH also exploits the approximate linearity of local neighborhoods of the data to further improve performance. Moreover, given a query, a query-dependent weight vector is computed that indicates the significance of each bit. Hamming distances are weighted by this vector, yielding much more accurate retrieval than traditional approaches that use no weighting scheme. The proposed algorithm significantly outperforms existing state-of-the-art approaches. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Zhang_colostate_0053A_13441.pdf | |
dc.identifier.uri | http://hdl.handle.net/10217/173342 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2000-2019 | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | hashing | |
dc.subject | nearest neighbor search | |
dc.subject | unsupervised | |
dc.subject | large-scale dataset | |
dc.subject | binary code | |
dc.subject | quantization | |
dc.title | Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
Files
Original bundle
- Name: Zhang_colostate_0053A_13441.pdf
- Size: 2.33 MB
- Format: Adobe Portable Document Format