Browsing by Author "Kirby, Michael, advisor"
Now showing 1 - 20 of 29
Item (Open Access)
A constrained optimization model for partitioning students into cooperative learning groups (Colorado State University. Libraries, 2016)
Heine, Matthew Alan, author; Kirby, Michael, advisor; Pinaud, Olivier, committee member; Henry, Kimberly, committee member
The problem of the constrained partitioning of a set using quantitative relationships amongst the elements is considered. An approach based on constrained integer programming is proposed that permits a group objective function to be optimized subject to group quality constraints. A motivation for this problem is the partitioning of students, e.g., in middle school, into groups that target educational objectives. The method is compared to another grouping algorithm in the literature on a data set collected in the Poudre School District.

Item (Open Access)
A geometric data analysis approach to dimension reduction in machine learning and data mining in medical and biological sensing (Colorado State University. Libraries, 2017)
Emerson, Tegan Halley, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Nyborg, Jennifer, committee member; Cheney, Margaret, committee member
Geometric data analysis seeks to uncover and leverage structure in data for tasks in machine learning when data is visualized as points in some finite-dimensional, abstract space. This dissertation considers data which is high dimensional with respect to varied notions of dimension. Algorithms developed herein seek to reduce or estimate dimension while preserving the ability to perform a specific task in detection, identification, or classification. In some of the applications the only property considered important to be preserved under dimension reduction is the ability to perform the indicated machine learning task, while in others strictly geometric relationships between data points are required to be preserved or minimized. First presented is the development of a numerical representation of images of rare circulating cells in immunofluorescent images. This representation is paired with a support vector machine and is able to identify differentiating cell structure between cell populations under consideration. Moreover, this differentiating information can be visualized through inversion of the representation and was found to be consistent with classification criteria used by clinically trained pathologists. Considered second is the task of identification and tracking of aerosolized bioagents via a multispectral lidar system. A nonnegative matrix factorization problem arises out of this data mining task, which can be solved in several ways, including as an ℓ1-norm regularized, convex but nondifferentiable optimization problem. Existing methodologies achieve excellent results when the internal matrix factor dimension is known, but fail or can be computationally prohibitive when this dimension is not known. A modified optimization problem is proposed that may help reveal the appropriate internal factoring dimension based on the sparsity of averages of nonnegative values. Third, we present an algorithmic framework for reducing dimension in the linear mixing model. The mean-squared error of a statistical estimator of a component of the linear mixing model can be considered as a function of the rank of different estimating matrices. We seek to minimize the mean-squared error as a function of the rank of the appropriate estimating matrix; this yields interesting order determination rules and improved results, relative to full-rank counterparts, in applications in matched subspace detection and generalized modal analysis. Finally, the culminating work of this dissertation explores the existence of nearly isometric, dimension-reducing mappings between special manifolds characterized by different dimensions. Understanding the analogous problem between Euclidean spaces provides insights into potential challenges and pitfalls one could encounter in proving the existence of such mappings. Most significant of the contributions is the statement and proof of a theorem establishing a connection between packing problems on Grassmannian manifolds and nearly isometric mappings between Grassmannians. The frameworks and algorithms constructed and developed in this doctoral research consider multiple manifestations of the notion of dimension. Across applications arising from varied areas of medical and biological sensing, we have shown there to be great benefits to taking a geometric perspective on challenges in machine learning and data mining.
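The ℓ1-regularized nonnegative matrix factorization mentioned in the Emerson abstract can be attacked in several ways; the sketch below uses a generic multiplicative-update heuristic with an ℓ1 penalty on one factor, not the convex formulation or the dimension-revealing modification developed in the dissertation. The matrix V, the internal dimension r, and the penalty weight lam are illustrative placeholders.

```python
import numpy as np

def sparse_nmf(V, r, lam=0.5, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V as W @ H while encouraging H to be sparse.

    Heuristically minimizes ||V - W H||_F^2 + lam * sum(H) (an l1 penalty, since H >= 0)
    with standard multiplicative updates. Practical implementations often also
    normalize the columns of W so the penalty on H remains meaningful.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)   # penalized update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)         # standard Lee-Seung update for W
    return W, H

# Toy usage: a nonnegative matrix whose true internal dimension is 3,
# factored with a deliberately generous guess r = 10.
rng = np.random.default_rng(1)
V = rng.random((50, 3)) @ rng.random((3, 40))
W, H = sparse_nmf(V, r=10)
print("residual:", np.linalg.norm(V - W @ H))
print("fraction of near-zero entries in H:", np.mean(H < 1e-3))
```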
Item (Open Access)
Algorithms for feature selection and pattern recognition on Grassmann manifolds (Colorado State University. Libraries, 2015)
Chepushtanova, Sofya, author; Kirby, Michael, advisor; Peterson, Chris, committee member; Bates, Dan, committee member; Ben-Hur, Asa, committee member
This dissertation presents three distinct application-driven research projects united by ideas and topics from geometric data analysis, optimization, computational topology, and machine learning. We first consider the hyperspectral band selection problem, solved by using sparse support vector machines (SSVMs). A supervised embedded approach is proposed using the property of SSVMs to exhibit a model structure that includes a clearly identifiable gap between zero and non-zero feature vector weights that permits important bands to be definitively selected in conjunction with the classification problem. An SSVM is trained using bootstrap aggregating to obtain a sample of SSVM models to reduce variability in the band selection process. This preliminary sample approach for band selection is followed by a secondary band selection which involves retraining the SSVM to further reduce the set of bands retained. We propose and compare three adaptations of the SSVM band selection algorithm for the multiclass problem. We illustrate the performance of these methods on two benchmark hyperspectral data sets. Second, we propose an approach for capturing the signal variability in data using the framework of the Grassmann manifold (Grassmannian). Labeled points from each class are sampled and used to form abstract points on the Grassmannian. The resulting points have representations as orthonormal matrices and as such do not reside in Euclidean space in the usual sense. There are a variety of metrics which allow us to determine distance matrices that can be used to realize the Grassmannian as an embedding in Euclidean space. Multidimensional scaling (MDS) determines a low dimensional Euclidean embedding of the manifold, preserving or approximating the Grassmannian geometry based on the distance measure. We illustrate that we can achieve an isometric embedding of the Grassmann manifold using the chordal metric, while this is not the case with other distances. However, non-isometric embeddings generated by using the smallest principal angle pseudometric on the Grassmannian lead to the best classification results: we observe that as the dimension of the Grassmannian grows, the accuracy of the classification grows to 100% in binary classification experiments. To build a classification model, we use SSVMs to perform simultaneous dimension selection. The resulting classifier selects a subset of dimensions of the embedding without loss in classification performance. Lastly, we present an application of persistent homology to the detection of chemical plumes in hyperspectral movies. The pixels of the raw hyperspectral data cubes are mapped to the geometric framework of the Grassmann manifold where they are analyzed, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows the time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmannian. This motivates the search for topological structure, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the manifold. The proposed framework affords the processing of large data sets, such as the hyperspectral movies explored in this investigation, while retaining valuable discriminative information. For a particular choice of a distance metric on the Grassmannian, it is possible to generate topological signals that capture changes in the scene after a chemical release.

Item (Open Access)
An adaptation of K-means-type algorithms to the Grassmann manifold (Colorado State University. Libraries, 2019)
Stiverson, Shannon J., author; Kirby, Michael, advisor; Adams, Henry, committee member; Ben-Hur, Asa, committee member
The Grassmann manifold provides a robust framework for analysis of high-dimensional data through the use of subspaces. Treating data as subspaces allows for separability between data classes that is not otherwise achieved in Euclidean space, particularly with the use of the smallest principal angle pseudometric. Clustering algorithms focus on identifying similarities within data and highlighting the underlying structure. To exploit the properties of the Grassmannian for unsupervised data analysis, two variations of the popular K-means algorithm are adapted to perform clustering directly on the manifold. We provide the theoretical foundations needed for computations on the Grassmann manifold and detailed derivations of the key equations. Both algorithms are then thoroughly tested on toy data and two benchmark data sets from machine learning: the MNIST handwritten digit database and the AVIRIS Indian Pines hyperspectral data. Performance of the algorithms is tested on manifolds of varying dimension. Unsupervised classification results on the benchmark data are compared to those currently found in the literature.
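Several of the entries above (Chepushtanova, Stiverson) build distance matrices on the Grassmannian from principal angles, which then feed MDS embeddings or K-means-type clustering. A minimal numpy sketch of those distance computations, assuming each data cloud has already been reduced to an orthonormal basis for a q-dimensional subspace (for example via a thin SVD); the embedding and clustering steps themselves are not reproduced here.

```python
import numpy as np

def orthonormal_basis(X, q):
    """Columns of X are data points; return an orthonormal basis (n x q)
    spanning the top-q left singular vectors."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :q]

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the subspaces spanned
    by the orthonormal columns of A and B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)   # singular values = cosines
    return np.arccos(np.clip(s, -1.0, 1.0))

def chordal_distance(A, B):
    return np.linalg.norm(np.sin(principal_angles(A, B)))

def min_angle_pseudometric(A, B):
    """Smallest-principal-angle pseudometric used for class separability."""
    return principal_angles(A, B)[0]

# Toy usage: two clouds of 20 points in R^50, each represented as a point
# on the Grassmannian Gr(5, 50).
rng = np.random.default_rng(0)
A = orthonormal_basis(rng.normal(size=(50, 20)), q=5)
B = orthonormal_basis(rng.normal(size=(50, 20)), q=5)
print(chordal_distance(A, B), min_angle_pseudometric(A, B))
```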
Item (Open Access)
Anomaly detection in terrestrial hyperspectral video using variants of the RX algorithm (Colorado State University. Libraries, 2012)
Schwickerath, Anthony N., author; Kirby, Michael, advisor; Peterson, Christopher, committee member; Anderson, Charles, committee member
There is currently interest in detecting the use of chemical and biological weapons using hyperspectral sensors. Much of the research in this area assumes the spectral signature of the weapon is known in advance. Unfortunately, this may not always be the case. To obviate the reliance on a library of known target signatures, we instead view this as an anomaly detection problem. In this thesis, the RX algorithm, a benchmark anomaly detection algorithm for multi- and hyper-spectral data, is reviewed, as are some standard extensions. This class of likelihood ratio test-based algorithms is generally applied to aerial imagery for the identification of man-made artifacts. As such, the model assumes that the scale is relatively consistent and that the targets (roads, cars) also have fixed sizes. We apply these methods to terrestrial video of biological and chemical aerosol plumes, where the background scale and target size both vary, and compare preliminary results. To explore the impact of parameter choice on algorithm performance, we also present an empirical study of the standard RX algorithm applied to synthetic targets of varying sizes over a range of settings.
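At its core, the RX statistic reviewed in the Schwickerath (2012) entry is a Mahalanobis distance of each pixel spectrum from a background mean and covariance. A minimal global-RX sketch in numpy follows; the thesis studies local and windowed variants and parameter choices beyond this simple version.

```python
import numpy as np

def rx_scores(cube):
    """Global RX anomaly scores for a hyperspectral cube.

    cube: array of shape (rows, cols, bands).
    Returns an array of shape (rows, cols) holding the Mahalanobis distance
    of each pixel spectrum from the global background mean and covariance.
    """
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    # Slight regularization in case some bands are degenerate.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(bands)
    Xc = X - mu
    scores = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov), Xc)
    return scores.reshape(rows, cols)

# Toy usage: background noise plus one synthetic anomalous spectrum.
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 10))
cube[16, 16] += 5.0
scores = rx_scores(cube)
print(np.unravel_index(scores.argmax(), scores.shape))  # expected: (16, 16)
```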
Item (Open Access)
Automated detection of circulating cells using low level features (Colorado State University. Libraries, 2013)
Emerson, Tegan Halley, author; Kirby, Michael, advisor; Peterson, Chris, committee member; Nyborg, Jennifer, committee member
This thesis addresses the problem of detection of high definition circulating tumor cells using data driven feature selection. We propose techniques in pattern analysis and computer vision to achieve this goal. Specifically, we determine a set of low level features which can structurally differentiate between different cell types of interest to contribute to the treatment and monitoring of patients. We have implemented three image representation techniques on a curated data set. The curated data set consists of digitized images of 1000 single cells: 500 of which are high definition circulating tumor cells or other cells of high interest, and 500 of which are white blood cells. None of the three image representation techniques have been previously applied to this data set. One image representation is a novel contribution and is based on the characterization of a cell in terms of its concentric Fourier rings. The Fourier Ring Descriptors (FRDs) exploit the size variations and morphological differences between events of high and low interest while being rotationally invariant. Using the low level descriptors, FRDs, as a representation with a linear support vector machine decision tree classifier, we have been able to average 99.34% accuracy on the curated data set and 99.53% on non-curated data. FRDs exhibit robustness to rotation and segmentation error. We discuss the applications of the results to clinical use in the context of data provided by The Kuhn Laboratory at The Scripps Research Institute.

Item (Open Access)
Classification on the Grassmannians: theory and applications (Colorado State University. Libraries, 2008)
Chang, Jen-Mei, author; Kirby, Michael, advisor
This dissertation consists of four parts. It introduces a novel geometric framework for the general classification problem and presents empirical results obtained from applying the proposed method on some popular classification problems. An analysis of the robustness of the method is provided using matrix perturbation theory, which in turn motivates an optimization problem to improve the robustness of the classifier. Lastly, we illustrate the use of compressed data representations based on the Karcher mean.

Item (Open Access)
Comparing sets of data sets on the Grassmann and flag manifolds with applications to data analysis in high and low dimensions (Colorado State University. Libraries, 2020)
Ma, Xiaofeng, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Chong, Edwin, committee member; Scharf, Louis, committee member; Shonkwiler, Clayton, committee member
This dissertation develops numerical algorithms for comparing sets of data sets utilizing shape and orientation of data clouds. Two key components for "comparing" are the distance measure between data sets and, correspondingly, the geodesic path in between. Both components play a core role that connects the two parts of this dissertation, namely data analysis on the Grassmann manifold and on the flag manifold. For the first part, we build on the well-known geometric framework for analyzing and optimizing over data on the Grassmann manifold. To be specific, we extend the classical self-organizing mappings to the Grassmann manifold to visualize sets of high dimensional data sets in 2D space. We also propose an optimization problem on the Grassmannian to recover missing data. In the second part, we extend the geometric framework to the flag manifold to encode the variability of nested subspaces. There we propose a numerical algorithm for computing a geodesic path and distance between nested subspaces. We also prove theorems to show how to reduce the dimension of the algorithm for practical computations. The approach is shown to have advantages for analyzing data when the number of data points is larger than the number of features.

Item (Open Access)
Computer vision approach to classification of circulating tumor cells (Colorado State University. Libraries, 2013)
Hopkins, David, author; Kirby, Michael, advisor; Peterson, Chris, committee member; Givens, Geof, committee member
Current research into the detection and characterization of circulating tumor cells (CTCs) in the bloodstream can be used to assess the threat to a potential cancer victim. We have determined specific goals to further the understanding of these cells: 1) full automation of an algorithm to overcome the current method's challenges of being labor-intensive and time-consuming; 2) detection of single CTC cells amongst several million white blood cells given digital imagery of a panel of blood; and 3) objective classification of white blood cells, CTCs, and potential sub-types. We demonstrate in this paper the developed theory, code, and implementation necessary for addressing these goals using mathematics and computer vision techniques. These include: 1) formation of a completely data-driven methodology; 2) use of the Bag of Features computer vision technique coupled with custom-built pixel-centric feature descriptors; and 3) use of clustering techniques such as K-means and hierarchical clustering as a robust classification method to glean insights into cell characteristics. To objectively determine the adequacy of our approach, we test our algorithm against three benchmarks: sensitivity/specificity in classification, nontrivial event detection, and rotational invariance. The algorithm performed well with the first two, and we provide possible modifications to improve performance on the third. The results of the sensitivity and specificity benchmark are important. The unfiltered data we used to test our algorithm were images of blood panels containing 44,914 WBCs and 39 CTCs. The algorithm classified 67.5 percent of CTCs into an outlier cluster containing only 300 cells. A simple modification brought the classification rate up to 80 percent of total CTCs. This modification brings the cluster count to only 400 cells. This is a significant reduction in the cells a pathologist would sort through, as it is only 0.9 percent of the total data.

Item (Open Access)
Convex and non-convex optimization using centroid-encoding for visualization, classification, and feature selection (Colorado State University. Libraries, 2022)
Ghosh, Tomojit, author; Kirby, Michael, advisor; Anderson, Charles, committee member; Ben-Hur, Asa, committee member; Adams, Henry, committee member
Classification, visualization, and feature selection are the three essential tasks of machine learning. This Ph.D. dissertation presents convex and non-convex models suitable for these three tasks. We propose Centroid-Encoder (CE), an autoencoder-based supervised tool for visualizing complex, potentially large (e.g., SUSY with 5 million samples), and high-dimensional (e.g., GSE73072 clinical challenge data) datasets. Unlike an autoencoder, which maps a point to itself, a centroid-encoder has a modified target, i.e., the class centroid in the ambient space. We present a detailed comparative analysis of the method using various data sets and state-of-the-art techniques. We have proposed a variation of the centroid-encoder, Bottleneck Centroid-Encoder (BCE), where additional constraints are imposed at the bottleneck layer to improve generalization performance in the reduced space. We further developed a sparse optimization problem for the non-linear mapping of the centroid-encoder, called Sparse Centroid-Encoder (SCE), to determine the set of discriminative features between two or more classes. The sparse model selects variables using the ℓ1-norm applied to the input feature space. SCE extracts discriminative features from multi-modal data sets, i.e., data whose classes appear to have multiple clusters, by using several centers per class. This approach seems to have advantages over models which use a one-hot-encoding vector. We also provide a feature selection framework that first ranks each feature by its occurrence, and the optimal number of features is chosen using a validation set. CE and SCE are models based on neural network architectures and require the solution of non-convex optimization problems. Motivated by the CE algorithm, we have developed a convex optimization for the supervised dimensionality reduction technique called Centroid Component Retrieval (CCR). The CCR model optimizes a multi-objective cost by balancing two complementary terms. The first term pulls the samples of a class towards its centroid by minimizing a sample's distance from its class centroid in low-dimensional space. The second term pushes the classes apart by maximizing the scattering volume of the ellipsoid formed by the class centroids in the embedded space. Although the design principle of CCR is similar to that of LDA, our experimental results show that CCR exhibits performance advantages over LDA, especially on high-dimensional data sets, e.g., Yale Faces, ORL, and COIL20. Finally, we present a linear formulation of Centroid-Encoder with orthogonality constraints, called Principal Centroid Component Analysis (PCCA). This formulation is similar to PCA, except the class labels are used to formulate the objective, resulting in a form of supervised PCA. We show classification and visualization experimental results with this new linear tool.
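The defining idea of the centroid-encoder described in the Ghosh entry is that the reconstruction target of each sample is its class centroid rather than the sample itself. A minimal sketch of that loss, assuming PyTorch; the architecture, hyperparameters, and the bottleneck/sparse variants shown here are illustrative placeholders, not the configurations used in the dissertation.

```python
import torch
import torch.nn as nn

def class_centroids(X, y):
    """Return a tensor in which each row is replaced by its class centroid."""
    targets = torch.empty_like(X)
    for c in y.unique():
        targets[y == c] = X[y == c].mean(dim=0)
    return targets

# Toy data: two Gaussian classes in 20 dimensions.
torch.manual_seed(0)
X = torch.cat([torch.randn(100, 20) + 2.0, torch.randn(100, 20) - 2.0])
y = torch.cat([torch.zeros(100, dtype=torch.long), torch.ones(100, dtype=torch.long)])
targets = class_centroids(X, y)

# Encoder-decoder with a 2-dimensional bottleneck for visualization.
model = nn.Sequential(
    nn.Linear(20, 32), nn.ReLU(),
    nn.Linear(32, 2),            # bottleneck / embedding layer
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 20),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), targets)   # map each sample toward its class centroid
    loss.backward()
    opt.step()

# The 2-D visualization is the output of the encoder half of the network.
with torch.no_grad():
    embedding = model[:3](X)
print(embedding.shape)  # torch.Size([200, 2])
```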
Item (Open Access)
Covariance integral invariants of embedded Riemannian manifolds for manifold learning (Colorado State University. Libraries, 2018)
Álvarez Vizoso, Javier, author; Peterson, Christopher, advisor; Kirby, Michael, advisor; Bates, Dan, committee member; Cavalieri, Renzo, committee member; Eykholt, Richard, committee member
This thesis develops an effective theoretical foundation for the integral invariant approach to studying submanifold geometry via the statistics of the underlying point-set, i.e., manifold learning from covariance analysis. We perform Principal Component Analysis over a domain determined by the intersection of an embedded Riemannian manifold with spheres or cylinders of varying scale in ambient space, in order to generalize to arbitrary dimension the relationship between curvature and the eigenvalue decomposition of covariance matrices. In the case of regular curves in general dimension, the covariance eigenvectors converge to the Frenet-Serret frame and the corresponding eigenvalues have ratios that asymptotically determine the generalized curvatures completely, up to a constant that we determine by proving a recursion relation for a certain sequence of Hankel determinants. For hypersurfaces, the eigenvalue decomposition has a series expansion given in terms of the dimension and the principal curvatures, where the eigenvectors converge to the Darboux frame of principal and normal directions. In the most general case of embedded Riemannian manifolds, the eigenvalues and limit eigenvectors of the covariance matrices are found to have asymptotic behavior given in terms of the curvature information encoded by the third fundamental form of the manifold, a classical tensor that we generalize to arbitrary dimension, and which is related to the Weingarten map and Ricci operator. These results provide descriptors at scale for the principal curvatures and, in turn, for the second fundamental form and the Riemann curvature tensor of a submanifold, which can serve to perform multi-scale Geometry Processing and Manifold Learning, making use of the advantages of the integral invariant viewpoint when only a discrete sample of points is available.
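The basic construction behind these covariance integral invariants is PCA over the intersection of a manifold with a small ball. A toy numpy illustration of that construction for a plane curve follows; the asymptotic curvature formulas proved in the dissertation are not reproduced here, only the qualitative behavior (dominant eigenvector tends to the tangent, eigenvalue ratio shrinks with scale).

```python
import numpy as np

# Sample a circle of radius 2 in the plane; its curvature is 1/2 everywhere.
t = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
curve = np.column_stack([2 * np.cos(t), 2 * np.sin(t)])

p = curve[0]                      # base point (2, 0); tangent direction is (0, 1)
for eps in [0.5, 0.25, 0.125]:    # shrinking ball radii
    local = curve[np.linalg.norm(curve - p, axis=1) < eps]   # curve ∩ ball
    cov = np.cov(local, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)        # ascending eigenvalues
    tangent = evecs[:, -1]                    # dominant direction ~ tangent
    ratio = evals[0] / evals[-1]              # shrinks as the ball shrinks
    print(f"eps={eps:5.3f}  tangent~{np.round(tangent, 3)}  eigenvalue ratio={ratio:.2e}")
```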
Item (Open Access)
Cracking open the black box: a geometric and topological analysis of neural networks (Colorado State University. Libraries, 2024)
Cole, Christina, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Cheney, Margaret, committee member; Draper, Bruce, committee member
Deep learning is a subfield of machine learning that has exploded in recent years in terms of publications and commercial consumption. Despite their increasing prevalence in performing high-risk tasks, deep learning algorithms have outpaced our understanding of them. In this work, we home in on neural networks, the backbone of deep learning, and reduce them to their scaffolding defined by polyhedral decompositions. With these decompositions explicitly defined for low-dimensional examples, we utilize novel visualization techniques to build a geometric and topological understanding of them. From there, we develop methods of implicitly accessing neural networks' polyhedral skeletons, which provide substantial computational and memory savings compared to those requiring explicit access. While much of the related work using neural network polyhedral decompositions is limited to toy models and datasets, the savings provided by our method allow us to use state-of-the-art neural networks and datasets in our analyses. Our experiments alone demonstrate the viability of a polyhedral view of neural networks, and our results show its usefulness. More specifically, we show that the geometry that a polyhedral decomposition imposes on its neural network's domain contains signals that distinguish between original and adversarial images. We conclude our work with suggested future directions. Therefore, we (1) contribute toward closing the gap between our use of neural networks and our understanding of them through geometric and topological analyses and (2) outline avenues for extensions upon this work.

Item (Open Access)
Exploiting geometry, topology, and optimization for knowledge discovery in big data (Colorado State University. Libraries, 2013)
Ziegelmeier, Lori Beth, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Liu, Jiangguo (James), committee member; Draper, Bruce, committee member
In this dissertation, we consider several topics that are united by the theme of topological and geometric data analysis. First, we consider an application in landscape ecology using a well-known vector quantization algorithm to characterize and segment the color content of natural imagery. Color information in an image may be viewed naturally as clusters of pixels with similar attributes. The inherent structure and distribution of these clusters serves to quantize the information in the image and provides a basis for classification. A friendly graphical user interface called Biological Landscape Organizer and Semi-supervised Segmenting Machine (BLOSSM) was developed to aid in this classification. We consider four different choices for color space and five different metrics in which to analyze our data, and results are compared. Second, we present a novel topologically driven clustering algorithm that blends Locally Linear Embedding (LLE) and vector quantization by mapping color information to a lower dimensional space, identifying distinct color regions, and classifying pixels together based on both a proximity measure and color content. It is observed that these techniques permit a significant reduction in color resolution while maintaining the visually important features of images. Third, we develop a novel algorithm which we call Sparse LLE that leads to sparse representations in local reconstructions by using a data-weighted 1-norm regularization term in the objective function of an optimization problem. It is observed that this new formulation has proven effective at automatically determining an appropriate number of nearest neighbors for each data point. We explore various optimization techniques, namely Primal Dual Interior Point algorithms, to solve this problem, comparing the computational complexity for each. Fourth, we present a novel algorithm that can be used to determine the boundary of a data set, or the vertices of a convex hull encasing a point cloud of data, in any dimension by solving a quadratic optimization problem. In this problem, each point is written as a linear combination of its nearest neighbors, where the coefficients of this linear combination are penalized if they do not construct a convex combination, revealing those points that cannot be represented in this way: the vertices of the convex hull containing the data. Finally, we exploit the relatively new tool from topological data analysis, persistent homology, and consider the use of vector bundles to re-embed data in order to improve the topological signal of a data set by embedding points sampled from a projective variety into successive Grassmannians.
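The fourth contribution in the Ziegelmeier entry identifies convex hull vertices by asking whether a point can be written as a convex combination of other points. The sketch below replaces the penalized quadratic formulation of the abstract with a plain linear-programming feasibility test over all other points, using scipy; it is only meant to illustrate the underlying idea.

```python
import numpy as np
from scipy.optimize import linprog

def is_hull_vertex(X, i):
    """True if X[i] cannot be written as a convex combination of the other
    points, i.e., it is a vertex of the convex hull of the rows of X."""
    others = np.delete(X, i, axis=0)              # shape (m-1, d)
    # Feasibility LP: find w >= 0 with others.T @ w = X[i] and sum(w) = 1.
    A_eq = np.vstack([others.T, np.ones(len(others))])
    b_eq = np.concatenate([X[i], [1.0]])
    res = linprog(c=np.zeros(len(others)), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return not res.success

# Toy usage: the four corners of a square plus one interior point.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
print([i for i in range(len(X)) if is_hull_vertex(X, i)])  # -> [0, 1, 2, 3]
```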
Item (Open Access)
Grassmann, Flag, and Schubert varieties in applications (Colorado State University. Libraries, 2017)
Marrinan, Timothy P., author; Kirby, Michael, advisor; Peterson, Chris, advisor; Azimi-Sadjadi, Mahmood R., committee member; Bates, Dan, committee member; Draper, Bruce, committee member
This dissertation develops mathematical tools for signal processing and pattern recognition tasks where data with the same identity is assumed to vary linearly. We build on the growing canon of techniques for analyzing and optimizing over data on Grassmann manifolds. Specifically, we expand on a recently developed method referred to as the flag mean that finds an average representation for a collection of data that consists of linear subspaces of possibly different dimensions. When prior knowledge exists about relationships between these data, we show that a point analogous to the flag mean can be found as an element of a Schubert variety that incorporates this theoretical information. This domain restriction relates closely to a recent result regarding point-to-set functions. This restricted average, along with a property of the flag mean that prioritizes weak but common information, leads to practical applications of the flag mean such as chemical plume detection in long-wave infrared hyperspectral videos, and a modification of the well-known diffusion map for adaptively visualizing data relationships in two dimensions.

Item (Open Access)
k-simplex volume optimizing projection algorithms for high-dimensional data sets (Colorado State University. Libraries, 2021)
Stiverson, Shannon J., author; Kirby, Michael, advisor; Peterson, Chris, advisor; Adams, Henry, committee member; Hess, Ann, committee member
Many applications produce data sets that contain hundreds or thousands of features, and consequently sit in very high dimensional space. It is desirable for purposes of analysis to reduce the dimension in a way that preserves certain important properties. Previous work has established conditions necessary for projecting data into lower dimensions while preserving pairwise distances up to some tolerance threshold, and algorithms have been developed to do so optimally. However, although similar criteria for projecting data into lower dimensions while preserving k-simplex volumes have been established, there are currently no algorithms seeking to optimally preserve such embedded volumes. In this work, two new algorithms are developed and tested: one which seeks to optimize the smallest projected k-simplex volume, and another which optimizes the average projected k-simplex volume.
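The projected k-simplex volumes that the Stiverson (2021) algorithms aim to preserve can be computed from a Gram determinant of edge vectors. A small numpy sketch comparing the volume of a simplex before and after a random orthonormal projection; the optimal projections constructed in the thesis are not reproduced here.

```python
import numpy as np
from math import factorial

def simplex_volume(vertices):
    """Volume of the k-simplex whose k+1 vertices are the rows of `vertices`."""
    edges = vertices[1:] - vertices[0]              # k edge vectors
    k = edges.shape[0]
    gram = edges @ edges.T                          # k x k Gram matrix
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(k)

# Toy usage: a random 2-simplex (triangle) living in R^100, projected to R^10
# by a map with orthonormal columns.
rng = np.random.default_rng(0)
verts = rng.normal(size=(3, 100))
Q, _ = np.linalg.qr(rng.normal(size=(100, 10)))     # columns are orthonormal
print("original volume :", simplex_volume(verts))
print("projected volume:", simplex_volume(verts @ Q))
```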
Item (Open Access)
Laplacian Eigenmaps for time series analysis (Colorado State University. Libraries, 2020)
Rosse, Patrick J., author; Kirby, Michael, advisor; Peterson, Chris, committee member; Adams, Henry, committee member; Anderson, Chuck, committee member
With "Big Data" becoming more available in our day-to-day lives, it becomes necessary to make meaning of it. We seek to understand the structure of high-dimensional data that we are unable to easily plot. What shape is it? What points are "related" to each other? The primary goal is to simplify our understanding of the data both numerically and visually. First introduced by M. Belkin and P. Niyogi in 2002, Laplacian Eigenmaps (LE) is a non-linear dimensionality reduction tool that relies on the basic assumption that the raw data lies in a low-dimensional manifold in a high-dimensional space. Once constructed, the graph Laplacian is used to compute a low-dimensional representation of the data set that optimally preserves local neighborhood information. In this thesis, we present a detailed analysis of the method and the optimization problem it solves, and we put it to work on various time series data sets. We show that we are able to extract neighborhood features from a collection of time series, which allows us to cluster specific time series based on noticeable signatures within the raw data.
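A compact numpy/scipy sketch of the Laplacian Eigenmaps construction analyzed in the Rosse thesis: build a nearest-neighbor graph with heat-kernel weights, form the graph Laplacian, and embed with the bottom non-trivial generalized eigenvectors. The neighborhood size k and kernel width sigma below are illustrative choices, not values from the thesis.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_components=2, k=10, sigma=1.0):
    """Laplacian Eigenmaps embedding of the rows of X."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared distances
    W = np.zeros((n, n))
    # Connect each point to its k nearest neighbors with heat-kernel weights.
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                     # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # unnormalized graph Laplacian
    # Generalized eigenproblem L y = lambda D y; skip the constant eigenvector.
    vals, vecs = eigh(L, D)
    return vecs[:, 1:n_components + 1]

# Toy usage: a noisy circle mapped linearly into R^20.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
circle = np.column_stack([np.cos(t), np.sin(t)])
X = circle @ rng.normal(size=(2, 20)) + 0.01 * rng.normal(size=(200, 20))
Y = laplacian_eigenmaps(X, n_components=2, k=10)
print(Y.shape)  # (200, 2)
```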
Item (Open Access)
Linear models, signal detection, and the Grassmann manifold (Colorado State University. Libraries, 2014)
Schwickerath, Anthony Norbert, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Scharf, Louis, committee member; Eykholt, Richard, committee member
Standard approaches to linear signal detection, reconstruction, and model identification problems, such as matched subspace detectors (MF, MDD, MSD, and ACE) and anomaly detectors (RX), are derived in the ambient measurement space using statistical methods (GLRT, regression). While the motivating arguments are statistical in nature, geometric interpretations of the test statistics are sometimes developed after the fact. Given a standard linear model, many of these statistics are invariant under orthogonal transformations, have a constant false alarm rate (CFAR), and some are uniformly most powerful invariant (UMPI). These properties, combined with the simplicity of the tests, have led to their widespread use. In this dissertation, we present a framework for applying real-valued functions on the Grassmann manifold in the context of these same signal processing problems. Specifically, we consider linear subspace models which, given assumptions on the broadband noise, correspond to Schubert varieties on the Grassmann manifold. Beginning with increasing (decreasing) or Schur-convex (-concave) functions of principal angles between pairs of points, of which the geodesic and chordal distances (or probability distribution functions) are examples, we derive the associated point-to-Schubert-variety functions and present signal detection and reconstruction algorithms based upon this framework. As a demonstration of the framework in action, we implement an end-to-end system utilizing our framework and algorithms. We present results of this system processing real hyperspectral images.

Item (Open Access)
Low rank representations of matrices using nuclear norm heuristics (Colorado State University. Libraries, 2014)
Osnaga, Silvia Monica, author; Kirby, Michael, advisor; Peterson, Chris, advisor; Bates, Dan, committee member; Wang, Haonan, committee member
The pursuit of low dimensional structure from high dimensional data leads in many instances to finding the lowest rank matrix among a parameterized family of matrices. In its most general setting, this problem is NP-hard. Different heuristics have been introduced for approaching the problem. Among them is the nuclear norm heuristic for rank minimization. One aspect of this thesis is the application of the nuclear norm heuristic to the Euclidean distance matrix completion problem. As a special case, the approach is applied to the graph embedding problem. More generally, semi-definite programming, convex optimization, and the nuclear norm heuristic are applied to the graph embedding problem in order to extract invariants such as the chromatic number, Rn embeddability, and Borsuk-embeddability. In addition, we apply related techniques to decompose a matrix into components which simultaneously minimize a linear combination of the nuclear norm and the spectral norm. In the case when the Euclidean distance matrix is the distance matrix for a complete k-partite graph, it is shown that the nuclear norm of the associated positive semidefinite matrix can be evaluated in terms of the second elementary symmetric polynomial evaluated at the partition. We prove that for k-partite graphs the maximum value of the nuclear norm of the associated positive semidefinite matrix is attained in the situation when we have an equal number of vertices in each set of the partition. We use this result to determine a lower bound on the chromatic number of the graph. Finally, we describe a convex optimization approach to the decomposition of a matrix into two components using the nuclear norm and spectral norm.
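The nuclear norm heuristic used throughout the Osnaga dissertation replaces the (non-convex) rank with the sum of singular values. A small numpy sketch of one standard algorithm built on that heuristic, SoftImpute-style iterative singular-value soft-thresholding for low-rank matrix completion; the Euclidean-distance-matrix and graph-embedding formulations studied in the dissertation use more specialized semidefinite programs not shown here.

```python
import numpy as np

def soft_impute(M, mask, tau=1.0, n_iter=500):
    """Fill in the unobserved entries of M (mask == True where observed) by
    alternating between imputation and the singular-value soft-thresholding
    operator, which is the proximal map of the nuclear norm."""
    X = np.zeros_like(M)
    for _ in range(n_iter):
        Y = np.where(mask, M, X)                      # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt       # shrink singular values by tau
    return X

# Toy usage: a rank-2 matrix with roughly 60% of its entries observed.
rng = np.random.default_rng(0)
M = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(M.shape) < 0.6
X = soft_impute(M, mask, tau=1.0)
print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```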
Item (Open Access)
Mean variants on matrix manifolds (Colorado State University. Libraries, 2012)
Marks, Justin D., author; Peterson, Chris, advisor; Kirby, Michael, advisor; Bates, Dan, committee member; Anderson, Chuck, committee member
The geometrically elegant Stiefel and Grassmann manifolds have become organizational tools for data applications, such as illumination spaces for faces in digital photography. Modern data analysis involves increasingly large-scale data sets, both in terms of number of samples and number of features per sample. In circumstances such as when large-scale data has been mapped to a Stiefel or Grassmann manifold, the computation of mean representatives for clusters of points on these manifolds is a valuable tool. We derive three algorithms for determining mean representatives for a cluster of points on the Stiefel manifold and the Grassmann manifold. Two algorithms, the normal mean and the projection mean, follow the theme of the Karcher mean, relying upon inversely related maps that operate between the manifold and the tangent bundle. These maps are informed by the geometric definition of the tangent bundle and the normal bundle. From the cluster of points, each algorithm exploits these maps in a predictor/corrector loop until converging, with prescribed tolerance, to a fixed point. The fixed point acts as the normal mean representative, or projection mean representative, respectively, of the cluster. This method shares its principal structural characteristics with the Karcher mean, but utilizes a distinct pair of inversely related maps. The third algorithm, called the flag mean, operates in a context comparable to a generalized Grassmannian. It produces a mean subspace of arbitrary dimension. We provide applications and discuss generalizations of these means to other manifolds.

Item (Open Access)
Methods for network generation and spectral feature selection: especially on gene expression data (Colorado State University. Libraries, 2019)
Mankovich, Nathan, author; Kirby, Michael, advisor; Anderson, Charles, committee member; Peterson, Chris, committee member
Feature selection is an essential step in many data analysis pipelines due to its ability to remove unimportant data. We will describe how to realize a data set as a network using correlation, partial correlation, heat kernel, and random edge generation methods. Then we lay out how to select features from these networks, mainly leveraging the spectrum of the graph Laplacian, adjacency, and supra-adjacency matrices. We frame this work in the context of gene co-expression network analysis and proceed with a brief analysis of a small set of gene expression data for human subjects infected with the flu virus. We are able to distinguish two sets of 14-15 genes which produce two-fold SSVM classification accuracies, at certain times, that are at least as high as classification accuracies obtained with more than 12,000 genes.
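A compact numpy sketch of the kind of pipeline the Mankovich entry describes: build a correlation network over features (genes), then score features using the spectrum of a matrix associated with the graph. The scoring rule below (leading eigenvector of the adjacency matrix, i.e., eigenvector centrality) is only one simple choice among the spectral criteria the thesis considers; the correlation threshold and toy data are illustrative.

```python
import numpy as np

def correlation_network(X, threshold=0.5):
    """Adjacency matrix over features: |Pearson correlation| above a threshold."""
    C = np.corrcoef(X, rowvar=False)                 # features are the columns of X
    A = (np.abs(C) > threshold).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def adjacency_scores(A):
    """Score each feature by its weight in the leading eigenvector of the
    adjacency matrix; the thesis also uses Laplacian and supra-adjacency spectra."""
    vals, vecs = np.linalg.eigh(A)
    return np.abs(vecs[:, -1])

# Toy usage: 80 samples by 200 "genes", with genes 0-9 forming a co-expressed module.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))
module = rng.normal(size=(80, 1))
X[:, :10] += 3.0 * module                            # strongly correlated gene module
scores = adjacency_scores(correlation_network(X))
print("top-ranked genes:", sorted(np.argsort(scores)[::-1][:10]))  # the module genes 0-9
```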