A geometric data analysis approach to dimension reduction in machine learning and data mining in medical and biological sensing
dc.contributor.author | Emerson, Tegan Halley, author | |
dc.contributor.author | Kirby, Michael, advisor | |
dc.contributor.author | Peterson, Chris, advisor | |
dc.contributor.author | Nyborg, Jennifer, committee member | |
dc.contributor.author | Chenney, Margaret, committee member | |
dc.date.accessioned | 2017-09-14T16:05:02Z | |
dc.date.available | 2017-09-14T16:05:02Z | |
dc.date.issued | 2017 | |
dc.description.abstract | Geometric data analysis seeks to uncover and leverage structure in data for tasks in machine learning when data is visualized as points in some dimensional, abstract space. This dissertation considers data which is high dimensional with respect to varied notions of dimension. Algorithms developed herein seek to reduce or estimate dimension while preserving the ability to perform a specific task in detection, identification, or classification. In some of the applications, the only property considered important to be preserved under dimension reduction is the ability to perform the indicated machine learning task, while in others strictly geometric relationships between data points are required to be preserved or minimized. First presented is the development of a numerical representation of images of rare circulating cells in immunofluorescent images. This representation is paired with a support vector machine and is able to identify differentiating cell structure between cell populations under consideration. Moreover, this differentiating information can be visualized through inversion of the representation and was found to be consistent with classification criteria used by clinically trained pathologists. Considered second is the task of identification and tracking of aerosolized bioagents via a multispectral lidar system. A nonnegative matrix factorization problem arose out of this data mining task, which can be solved in several ways, including as an ℓ1-norm regularized, convex but nondifferentiable optimization problem. Existing methodologies achieve excellent results when the internal matrix factor dimension is known but fail or can be computationally prohibitive when this dimension is not known. A modified optimization problem is proposed that may help reveal the appropriate internal factoring dimension based on the sparsity of averages of nonnegative values.
Third, we present an algorithmic framework for reducing dimension in the linear mixing model. The mean squared error of a statistical estimator of a component of the linear mixing model can be considered as a function of the rank of different estimating matrices. Minimizing mean squared error as a function of the rank of the appropriate estimating matrix yields order determination rules and improved results, relative to full-rank counterparts, in applications in matched subspace detection and generalized modal analysis. Finally, the culminating work of this dissertation explores the existence of nearly isometric, dimension-reducing mappings between special manifolds characterized by different dimensions. Understanding the analogous problem between Euclidean spaces provides insights into potential challenges and pitfalls one could encounter in proving the existence of such mappings. The most significant contribution is the statement and proof of a theorem establishing a connection between packing problems on Grassmannian manifolds and nearly isometric mappings between Grassmannians. The frameworks and algorithms constructed and developed in this doctoral research consider multiple manifestations of the notion of dimension. Across applications arising from varied areas of medical and biological sensing, we have shown there to be great benefits to taking a geometric perspective on challenges in machine learning and data mining. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Emerson_colostate_0053A_14308.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/183941 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2000-2019 | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.subject | dimension reduction | |
dc.subject | Grassmannian manifold | |
dc.subject | data mining | |
dc.subject | machine learning | |
dc.subject | geometric data analysis | |
dc.title | A geometric data analysis approach to dimension reduction in machine learning and data mining in medical and biological sensing | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Mathematics | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
Files
Original bundle
- Name: Emerson_colostate_0053A_14308.pdf
- Size: 18.52 MB
- Format: Adobe Portable Document Format