
Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets

dc.contributor.author: Zhang, Hao, author
dc.contributor.author: Beveridge, Ross, advisor
dc.contributor.author: Draper, Bruce, advisor
dc.contributor.author: Anderson, Chuck, committee member
dc.contributor.author: Zhou, Yongcheng, committee member
dc.date.accessioned: 2016-07-12T23:03:09Z
dc.date.available: 2016-07-12T23:03:09Z
dc.date.issued: 2016
dc.description.abstract: Nearest neighbor search is an important operation whose goal is to find items in a dataset that are similar to a given query. It has a number of applications, such as content-based image retrieval (CBIR), near-duplicate image detection and recommender systems. With the rapid development of the Internet and digital devices, it has become easy to share and collect data. Taking a modern social network as an example, Facebook was reported in 2012 to be collecting more than 500 terabytes of text, images and videos each day. Conventional nearest neighbor search using a linear scan becomes prohibitively expensive on large-scale datasets like this. This thesis proposes a new quantization-based binary code learning algorithm, called Unit Query and Location Sensitive Hashing (UnitQLSH), to solve the problem of approximate nearest neighbor search for large-scale, unsupervised and unit-length data. UnitQLSH maps each high-dimensional data sample to a binary code constrained to lie on the unit sphere. This constraint substantially improves retrieval performance. UnitQLSH also takes advantage of the approximate linearity of local neighborhoods of the data to further improve performance. Moreover, for each query, a weight vector is computed from the query itself, indicating the significance of the different bits. The Hamming distances are weighted by this vector, which yields much more accurate retrievals than traditional approaches that use no weighting scheme. The proposed algorithm significantly outperforms existing state-of-the-art approaches.
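The query-weighted Hamming ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the UnitQLSH algorithm from the thesis: the binary codes here come from random hyperplanes rather than learned quantization, and the weight rule (the absolute value of the query's projection onto each hyperplane) is an assumption chosen only to show how a per-query weight vector changes the ranking. All function names are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (the abstract assumes unit-length data)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes: an assumed stand-in for a learned encoder."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(v, planes):
    """Signed projection of v onto each hyperplane normal."""
    return [sum(p_i * v_i for p_i, v_i in zip(p, v)) for p in planes]


def encode(v, planes):
    """One bit per hyperplane: 1 if the projection is non-negative, else 0."""
    return [1 if s >= 0 else 0 for s in project(v, planes)]


def query_weights(q, planes):
    """Assumed weighting: a bit counts more when the query lies far from
    that hyperplane, since such a bit is less likely to flip by chance."""
    return [abs(s) for s in project(q, planes)]


def weighted_hamming(code_a, code_b, weights):
    """Sum the weights of the bit positions where the two codes differ."""
    return sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)


def search(query, database_codes, planes, k=3):
    """Return indices of the k database codes closest to the query
    under the query-dependent weighted Hamming distance."""
    q = normalize(query)
    q_code = encode(q, planes)
    weights = query_weights(q, planes)
    order = sorted(
        range(len(database_codes)),
        key=lambda i: weighted_hamming(q_code, database_codes[i], weights),
    )
    return order[:k]
```

With the database encoded once via `encode`, only the query's code and weight vector are computed at search time. Two database items at the same plain Hamming distance from the query can then be ranked differently, depending on which bits they disagree on.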
dc.format.medium: born digital
dc.format.medium: doctoral dissertations
dc.identifier: Zhang_colostate_0053A_13441.pdf
dc.identifier.uri: http://hdl.handle.net/10217/173342
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: hashing
dc.subject: nearest neighbor search
dc.subject: unsupervised
dc.subject: large-scale dataset
dc.subject: binary code
dc.subject: quantization
dc.title: Unsupervised binary code learning for approximate nearest neighbor search in large-scale datasets
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle
Name: Zhang_colostate_0053A_13441.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format