Evaluating cluster quality for visual data
dc.contributor.author | Wigness, Maggie, author | |
dc.contributor.author | Draper, Bruce, advisor | |
dc.contributor.author | Beveridge, Ross, committee member | |
dc.contributor.author | Howe, Adele, committee member | |
dc.contributor.author | Peterson, Chris, committee member | |
dc.date.accessioned | 2007-01-03T05:34:08Z | |
dc.date.available | 2007-01-03T05:34:08Z | |
dc.date.issued | 2013 | |
dc.description.abstract | Digital video cameras have made it easy to collect large amounts of unlabeled data that can be used to learn to recognize objects and actions. Collecting ground-truth labels for this data, however, is a much more time-consuming task that requires human intervention. One approach to training on this data, while keeping the human workload to a minimum, is to cluster the unlabeled samples, evaluate the quality of the clusters, and then ask a human annotator to label only the clusters believed to be dominated by a single object/action class. This thesis addresses the task of evaluating the quality of unlabeled image clusters. We compare four cluster quality measures (and a baseline method) using real-world and synthetic data sets. Three of these measures can be found in the existing data mining literature: Dunn Index, Davies-Bouldin Index and Silhouette Width. We introduce a novel cluster quality measure as the fourth measure, derived from recent advances in approximate nearest neighbor algorithms from the computer vision literature, called Proximity Forest Connectivity (PFC). Experiments on real-world data show that no cluster quality measure performs "best" on all data sets; however, our novel PFC measure is always competitive and results in more top performances than any of the other measures. Results from synthetic data experiments show that while the data mining measures are susceptible to the over-clustering typically required of visual data, PFC is much more robust. Further synthetic data experiments modeling features of visual data show that Davies-Bouldin is most robust to large amounts of class-specific noise. However, Davies-Bouldin, Silhouette and PFC all perform well in the presence of data with small amounts of class-specific noise, whereas Dunn struggles to perform better than random. | |
dc.format.medium | born digital | |
dc.format.medium | masters theses | |
dc.identifier | Wigness_colostate_0053N_11728.pdf | |
dc.identifier.uri | http://hdl.handle.net/10217/79204 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2000-2019 | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | cluster quality measures | |
dc.subject | image clustering | |
dc.subject | computer vision | |
dc.title | Evaluating cluster quality for visual data | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science (M.S.) | |
Files
Original bundle
- Name: Wigness_colostate_0053N_11728.pdf
- Size: 5.29 MB
- Format: Adobe Portable Document Format