Repository logo

Algorithms for feature selection and pattern recognition on Grassmann manifolds


This dissertation presents three distinct application-driven research projects united by ideas and topics from geometric data analysis, optimization, computational topology, and machine learning. We first consider hyperspectral band selection problem solved by using sparse support vector machines (SSVMs). A supervised embedded approach is proposed using the property of SSVMs to exhibit a model structure that includes a clearly identifiable gap between zero and non-zero feature vector weights that permits important bands to be definitively selected in conjunction with the classification problem. An SSVM is trained using bootstrap aggregating to obtain a sample of SSVM models to reduce variability in the band selection process. This preliminary sample approach for band selection is followed by a secondary band selection which involves retraining the SSVM to further reduce the set of bands retained. We propose and compare three adaptations of the SSVM band selection algorithm for the multiclass problem. We illustrate the performance of these methods on two benchmark hyperspectral data sets. Second, we propose an approach for capturing the signal variability in data using the framework of the Grassmann manifold (Grassmannian). Labeled points from each class are sampled and used to form abstract points on the Grassmannian. The resulting points have representations as orthonormal matrices and as such do not reside in Euclidean space in the usual sense. There are a variety of metrics which allow us to determine distance matrices that can be used to realize the Grassmannian as an embedding in Euclidean space. Multidimensional scaling (MDS) determines a low dimensional Euclidean embedding of the manifold, preserving or approximating the Grassmannian geometry based on the distance measure. We illustrate that we can achieve an isometric embedding of the Grassmann manifold using the chordal metric while this is not the case with other distances. However, non-isometric embeddings generated by using the smallest principal angle pseudometric on the Grassmannian lead to the best classification results: we observe that as the dimension of the Grassmannian grows, the accuracy of the classification grows to 100% in binary classification experiments. To build a classification model, we use SSVMs to perform simultaneous dimension selection. The resulting classifier selects a subset of dimensions of the embedding without loss in classification performance. Lastly, we present an application of persistent homology to the detection of chemical plumes in hyperspectral movies. The pixels of the raw hyperspectral data cubes are mapped to the geometric framework of the Grassmann manifold where they are analyzed, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows the time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmannian. This motivates the search for topological structure, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the manifold. The proposed framework affords the processing of large data sets, such as the hyperspectral movies explored in this investigation, while retaining valuable discriminative information. For a particular choice of a distance metric on the Grassmannian, it is possible to generate topological signals that capture changes in the scene after a chemical release.


Rights Access


Grassmann manifold
pattern recognition
sparse support vector machines
hyperspectral imagery
feature selection
persistent homology


Associated Publications