Browsing by Author "Adams, Henry, committee member"
Now showing 1 - 20 of 25
Item Open Access
A framework for resource efficient profiling of spatial model performance (Colorado State University. Libraries, 2022) Carlson, Caleb, author; Pallickara, Shrideep, advisor; Pallickara, Sangmi Lee, advisor; Adams, Henry, committee member
We design models to understand phenomena, make predictions, and/or inform decision-making. This study targets models that encapsulate spatially evolving phenomena. Given a model M, our objective is to identify how well the model predicts across all geospatial extents. A modeler may expect these validations to occur at varying spatial resolutions (e.g., states, counties, towns, census tracts). Assessing a model with all available ground-truth data is infeasible due to the data volumes involved. We propose a framework to assess the performance of models at scale over diverse spatial data collections. Our methodology ensures orchestration of validation workloads while reducing memory strain, alleviating contention, enabling concurrency, and ensuring high throughput. We introduce the notion of a validation budget that represents an upper bound on the total number of observations used to assess the performance of models across spatial extents. The validation budget attempts to capture the distribution characteristics of observations and is informed by multiple sampling strategies. Our design decouples the validation from the underlying model-fitting libraries so that it can interoperate with models built using different libraries and analytical engines; our research prototype currently supports Scikit-learn, PyTorch, and TensorFlow. We have conducted extensive benchmarks that demonstrate the suitability of our methodology.

Item Open Access
A study of the feasibility of detecting primordial microscopic black hole remnants with the NOvA far detector (Colorado State University. Libraries, 2024) Wrobel, Megan, author; Buchanan, Norm, advisor; Berger, Josh, committee member; Adams, Henry, committee member
Several papers have argued that microscopic black holes may be stable against complete evaporation and may be a viable dark matter candidate [1-3]. This paper assesses the practicality of detecting these objects using long-baseline neutrino facilities, such as the NuMI Off-Axis νe Appearance (NOvA) experiment and the Deep Underground Neutrino Experiment (DUNE). The origin, stability, properties, and energy loss mechanism of such objects are examined. The signals produced in the detectors should allow for discrimination between these microscopic black holes and other particles traversing the detector. Potential challenges that could arise and next steps are also identified and considered.

Item Open Access
An adaptation of K-means-type algorithms to the Grassmann manifold (Colorado State University. Libraries, 2019) Stiverson, Shannon J., author; Kirby, Michael, advisor; Adams, Henry, committee member; Ben-Hur, Asa, committee member
The Grassmann manifold provides a robust framework for analysis of high-dimensional data through the use of subspaces. Treating data as subspaces allows for separability between data classes that is not otherwise achieved in Euclidean space, particularly with the use of the smallest principal angle pseudometric. Clustering algorithms focus on identifying similarities within data and highlighting the underlying structure. To exploit the properties of the Grassmannian for unsupervised data analysis, two variations of the popular K-means algorithm are adapted to perform clustering directly on the manifold.
We provide the theoretical foundations needed for computations on the Grassmann manifold and detailed derivations of the key equations. Both algorithms are then thoroughly tested on toy data and two benchmark data sets from machine learning: the MNIST handwritten digit database and the AVIRIS Indian Pines hyperspectral data. Performance of the algorithms is tested on manifolds of varying dimension. Unsupervised classification results on the benchmark data are compared to those currently found in the literature.
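The thesis contains the full derivations; as a rough illustration only, the sketch below shows one natural way a Grassmannian K-means could look in NumPy, with assignment by the smallest principal angle pseudometric named in the abstract. The function names and the center update (top-q eigenvectors of a cluster's mean projection matrix) are illustrative assumptions, not the thesis's algorithms.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between the subspaces spanned by the
    orthonormal columns of A and B, via the SVD of A^T B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_kmeans(subspaces, k, iters=20, seed=0):
    """K-means-type clustering of q-dimensional subspaces of R^n.
    Assignment uses the smallest principal angle; each center is
    refit as the top-q eigenvectors of the cluster's mean
    projection matrix (an illustrative choice of 'centroid')."""
    rng = np.random.default_rng(seed)
    q = subspaces[0].shape[1]
    centers = [subspaces[i] for i in rng.choice(len(subspaces), size=k, replace=False)]
    labels = np.zeros(len(subspaces), dtype=int)
    for _ in range(iters):
        labels = np.array([
            np.argmin([principal_angles(X, C).min() for C in centers])
            for X in subspaces])
        for j in range(k):
            members = [X for X, lab in zip(subspaces, labels) if lab == j]
            if members:
                mean_proj = sum(X @ X.T for X in members) / len(members)
                _, V = np.linalg.eigh(mean_proj)  # eigenvalues ascending
                centers[j] = V[:, -q:]            # top-q eigenvectors
    return labels, centers
```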
Item Open Access
Applications of topological data analysis to natural language processing and computer vision (Colorado State University. Libraries, 2022) Garcia, Jason S., author; Krishnaswamy, Nikhil, advisor; Adams, Henry, committee member; Beveridge, Ross, committee member
Topological Data Analysis (TDA) uses ideas from topology to study the "shape" of data. It provides a set of tools to extract features, such as holes, voids, and connected components, from complex high-dimensional data. This thesis presents an introductory exposition of the mathematics underlying the two main tools of TDA: Persistent Homology and the MAPPER algorithm. Persistent Homology detects topological features that persist over a range of resolutions, capturing both local and global geometric information. The MAPPER algorithm is a visualization tool that provides a type of dimension reduction that preserves topological properties of the data by projecting them onto lower-dimensional simplicial complexes. Furthermore, this thesis explores recent applications of these tools to natural language processing and computer vision. These applications are divided into two main approaches. In the first approach, TDA is used to extract features from data that are then used as input for a variety of machine learning tasks, like image classification or visualizing the semantic structure of text documents. The second approach applies the tools of TDA to the machine learning algorithms themselves, for example using MAPPER to study how structure emerges in the weights of a trained neural network. Finally, the results of several experiments are presented. These include using Persistent Homology for image classification and using MAPPER to visualize the global structure of these data sets. Most notably, the MAPPER algorithm is used to visualize vector representations of contextualized word embeddings as they move through the encoding layers of the BERT-base transformer model.
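As a hedged illustration of the first approach (TDA-derived features fed to a downstream classifier), the sketch below computes persistence diagrams of a point cloud with the ripser package and summarizes them with simple statistics. The choice of summary statistics is an assumption for illustration; the thesis's experiments may use other vectorizations.

```python
import numpy as np
from ripser import ripser  # assumes the ripser.py package is installed

def persistence_features(points, maxdim=1):
    """Compute persistence diagrams of a point cloud and summarize
    each homological dimension by simple statistics."""
    dgms = ripser(points, maxdim=maxdim)['dgms']
    feats = []
    for dgm in dgms:
        finite = dgm[np.isfinite(dgm[:, 1])]
        pers = finite[:, 1] - finite[:, 0]           # lifetimes
        feats += [len(finite),
                  pers.sum() if len(pers) else 0.0,  # total persistence
                  pers.max() if len(pers) else 0.0]  # longest bar
    return np.array(feats)

# e.g., feed these fixed-length vectors to any off-the-shelf classifier
X = np.random.rand(100, 2)
print(persistence_features(X))
```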
Item Open Access
Asymptotic enumeration of matrix groups (Colorado State University. Libraries, 2018) Tyburski, Brady A., author; Wilson, James B., advisor; Adams, Henry, committee member; Pries, Rachel, committee member; Wilson, Jesse W., committee member
We prove that the general linear group GL_d(p^e) has between p^(d^4 e/64 - O(d^2)) and p^(d^4 e^2 log_2 p) distinct isomorphism types of subgroups. The upper bound is obtained by elementary counting methods, whereas the lower bound is found by counting the number of isomorphism types of subgroups of the generalized Heisenberg group. To count these subgroups, we use nuclei of a bilinear map alongside versor products, a division analog of the tensor product.

Item Open Access
Combinatorial structures of hyperelliptic Hodge integrals (Colorado State University. Libraries, 2021) Afandi, Adam, author; Cavalieri, Renzo, advisor; Shoemaker, Mark, advisor; Adams, Henry, committee member; Prasad, Ashok, committee member
This dissertation explores the combinatorial structures that underlie hyperelliptic Hodge integrals.
In order to compute hyperelliptic Hodge integrals, we use Atiyah-Bott (torus) localization on a stack of stable maps to [P^1/Z_2] = P^1 × BZ_2. The dissertation culminates in two results: a closed-form expression for hyperelliptic Hodge integrals with one λ-class insertion, and a structure theorem (polynomiality) for Hodge integrals with an arbitrary number of λ-class insertions.

Item Open Access
Commutative algebra in the graded category with applications to equivariant cohomology rings (Colorado State University. Libraries, 2018) Blumstein, Mark, author; Duflot, Jeanne, advisor; Adams, Henry, committee member; Bacon, Joel, committee member; Shonkwiler, Clayton, committee member
To view the abstract, please see the full text of the document.

Item Open Access
Convex and non-convex optimization using centroid-encoding for visualization, classification, and feature selection (Colorado State University. Libraries, 2022) Ghosh, Tomojit, author; Kirby, Michael, advisor; Anderson, Charles, committee member; Ben-Hur, Asa, committee member; Adams, Henry, committee member
Classification, visualization, and feature selection are the three essential tasks of machine learning. This Ph.D. dissertation presents convex and non-convex models suitable for these three tasks. We propose Centroid-Encoder (CE), an autoencoder-based supervised tool for visualizing complex, potentially large (e.g., SUSY, with 5 million samples), and high-dimensional (e.g., the GSE73072 clinical challenge data) data sets. Unlike an autoencoder, which maps a point to itself, a centroid-encoder has a modified target, i.e., the class centroid in the ambient space. We present a detailed comparative analysis of the method using various data sets and state-of-the-art techniques. We have proposed a variation of the centroid-encoder, Bottleneck Centroid-Encoder (BCE), where additional constraints are imposed at the bottleneck layer to improve generalization performance in the reduced space. We further developed a sparse optimization problem for the non-linear mapping of the centroid-encoder, called Sparse Centroid-Encoder (SCE), to determine the set of discriminative features between two or more classes. The sparse model selects variables using the ℓ1-norm applied to the input feature space. SCE extracts discriminative features from multi-modal data sets, i.e., data whose classes appear to have multiple clusters, by using several centers per class. This approach seems to have advantages over models which use a one-hot encoding vector. We also provide a feature selection framework that first ranks each feature by its occurrence, after which the optimal number of features is chosen using a validation set. CE and SCE are models based on neural network architectures and require the solution of non-convex optimization problems. Motivated by the CE algorithm, we have developed a convex optimization for the supervised dimensionality reduction technique called Centroid Component Retrieval (CCR). The CCR model optimizes a multi-objective cost by balancing two complementary terms. The first term pulls the samples of a class towards its centroid by minimizing a sample's distance from its class centroid in low-dimensional space. The second term pushes the classes apart by maximizing the scattering volume of the ellipsoid formed by the class centroids in embedded space. Although the design principle of CCR is similar to LDA, our experimental results show that CCR exhibits performance advantages over LDA, especially on high-dimensional data sets, e.g., Yale Faces, ORL, and COIL20.
Finally, we present a linear formulation of Centroid-Encoder with orthogonality constraints, called Principal Centroid Component Analysis (PCCA). This formulation is similar to PCA, except that the class labels are used to formulate the objective, resulting in a form of supervised PCA. We present classification and visualization results obtained with this new linear tool.
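A minimal PyTorch sketch of the centroid-encoder idea follows: the network has autoencoder shape, but each sample's target is its class centroid in the ambient space rather than the sample itself. Layer sizes, the optimizer, the training loop, and the assumption that labels are indices 0..C-1 are all illustrative choices, not the dissertation's implementation.

```python
import torch
import torch.nn as nn

class CentroidEncoder(nn.Module):
    """Autoencoder-shaped network; trained targets are class centroids."""
    def __init__(self, dim_in, dim_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(),
                                     nn.Linear(64, dim_latent))
        self.decoder = nn.Sequential(nn.Linear(dim_latent, 64), nn.ReLU(),
                                     nn.Linear(64, dim_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_ce(model, X, y, epochs=200, lr=1e-3):
    # precompute class centroids in the ambient (input) space;
    # assumes y holds consecutive integer labels 0..C-1
    centroids = torch.stack([X[y == c].mean(dim=0) for c in y.unique()])
    targets = centroids[y]            # each sample mapped to its centroid
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), targets)
        loss.backward()
        opt.step()
    return model
```

After training, the encoder output gives the low-dimensional visualization, with same-class points drawn toward a common target.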
Item Open Access
COVID-19 misinformation on Twitter: the role of deceptive support (Colorado State University. Libraries, 2022) Hashemi Chaleshtori, Fateme, author; Ray, Indrakshi, advisor; Anderson, Charles W., committee member; Malaiya, Yashwant K., committee member; Adams, Henry, committee member
Social media platforms like Twitter are a major dissemination point for information, and the COVID-19 pandemic is no exception. But not all of the information comes from reliable sources, which raises doubts about its validity. In social media posts, writers reference news articles to gain credibility by leveraging the trust readers have in reputable news outlets. However, there is not always a positive correlation between the cited article and the social media posting. Targeting the Twitter platform, this study presents a novel pipeline to determine whether a Tweet is indeed supported by the news article it refers to. The approach follows two general objectives: to develop a model capable of detecting Tweets containing claims that are worthy of fact-checking, and then to assess whether the claims made in a given Tweet are supported by the news article it cites. In the event that a Tweet is found to be check-worthy, we extract its claim via a sequence labeling approach. In doing so, we seek to reduce the noise and highlight the informative parts of a Tweet. Instead of detecting erroneous and invalid information by analyzing propagation patterns or by examining Tweets against already proven statements, this study aims to identify reliable support (or the lack thereof) before misinformation spreads. Our research reveals that 14.5% of the Tweets are not factual and therefore not worth checking. An effective filter like this is especially useful when looking at a platform such as Twitter, where hundreds of thousands of posts are created every day. Further, our analysis indicates that among the Tweets which refer to a news article as evidence of a factual claim, at least 1% of those Tweets are not substantiated by the article and therefore mislead the reader.
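The abstract does not specify the models used, so the following is only a generic baseline for the first stage of such a pipeline (filtering check-worthy Tweets), using scikit-learn TF-IDF features and logistic regression. The toy Tweets and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy data: 1 = contains a claim worth fact-checking, 0 = does not
tweets = ["Masks reduce transmission by 80%, new study finds",
          "ugh mondays amirite"]
labels = [1, 0]

# TF-IDF unigrams/bigrams feeding a linear classifier
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["New variant detected in 12 countries"]))
```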
Item Open Access
Determining synchronization of certain classes of primitive groups of affine type (Colorado State University. Libraries, 2022) Story, Dustin, author; Hulpke, Alexander, advisor; Adams, Henry, committee member; Buchanan, Norm, committee member; Gillespie, Maria, committee member
The class of permutation groups includes 2-homogeneous groups, synchronizing groups, and primitive groups. Moreover, 2-homogeneous implies synchronizing, and synchronizing in turn implies primitivity. A complete classification of synchronizing groups remains an open problem. Our search takes place amongst the primitive groups, looking for examples of synchronizing and non-synchronizing groups. Using a case distinction from Aschbacher classes, our main results are constructive proofs showing that three classes of primitive affine groups are non-synchronizing.

Item Open Access
Eventuality-based interval semantics and Free Logic: what if there, like, is no future, man? (Colorado State University. Libraries, 2019) Smith, Nathan L., author; Tucker, Dustin, advisor; Kasser, Jeff, committee member; Adams, Henry, committee member
Future contingent propositions have famously been a source of trouble for philosophers and logicians committed to any variety of indeterminism on which facts about the future are not yet fixed. One possible answer to the problem involves presupposition: namely, that propositions lack truth-value when other propositions that they presuppose are false. This paper explores the plausibility of such an answer, beginning with a brief discussion of the problem of future contingent propositions and presupposition. From there, an in-depth discussion of Free Logic lays the groundwork of logical tools for the project, exploring the motivation for Free Logic's development and examples of Free Logic semantics. Subsequently, this paper discusses the history and usefulness of events-based semantics in analyzing English sentences. Using the tools of events-based semantics and formal logic, this paper formally models this approach to sentences in English by defining a semantics which can capture both tense and aspect of such sentences and which allows for truth-valueless future contingent propositions while preserving logical truths like the law of excluded middle.

Item Open Access
Generalizations of persistent homology (Colorado State University. Libraries, 2021) McCleary, Alexander J., author; Patel, Amit, advisor; Adams, Henry, committee member; Ben Hur, Asa, committee member; Peterson, Chris, committee member
Persistent homology typically starts with a filtered chain complex and produces an invariant called the persistence diagram. This invariant summarizes where holes are born and die in the filtration. In the traditional setting, the filtered chain complex is a chain complex of vector spaces filtered over a totally ordered set. There are two natural directions in which to generalize the persistence diagram: we can consider filtrations of more general chain complexes, and filtrations over more general partially ordered sets. In this dissertation we develop both of these generalizations by defining persistence diagrams for chain complexes in an essentially small abelian category, filtered over any finite lattice.

Item Open Access
Hodge and Gelfand theory in Clifford analysis and tomography (Colorado State University. Libraries, 2022) Roberts, Colin, author; Shonkwiler, Clayton, advisor; Adams, Henry, committee member; Bangerth, Wolfgang, committee member; Roberts, Jacob, committee member
There is an interesting inverse boundary value problem for Riemannian manifolds, called the Calderón problem, which asks whether it is possible to determine a manifold and its metric from the Dirichlet-to-Neumann (DN) operator. Work on this problem has been dominated by complex analysis and Hodge theory, and Clifford analysis is a natural synthesis of the two. Clifford analysis analyzes multivector fields, their even-graded (spinor) components, and the vector-valued Hodge-Dirac operator, whose square is the Laplace-Beltrami operator. Elements in the kernel of the Hodge-Dirac operator are called monogenic, and since multivectors are multi-graded, we are able to capture the harmonic fields of Hodge theory and copies of complex holomorphic functions inside the space of monogenic fields simultaneously. We show that the space of multivector fields has a Hodge-Morrey-like decomposition into monogenic fields and the image of the Hodge-Dirac operator.
Using the multivector formulation of electromagnetism, we generalize the electric and magnetic DN operators and find that they extract the absolute and relative cohomologies. Furthermore, those operators are the scalar components of the spinor DN operator, whose kernel consists of the boundary traces of monogenic fields. We define a higher-dimensional version of the Gelfand spectrum, called the spinor spectrum, which may be used in a higher-dimensional version of the boundary control method. For compact regions of Euclidean space, the spinor spectrum is homeomorphic to the region itself. Lastly, we show that the monogenic fields form a sheaf that is locally homeomorphic to the underlying manifold, which makes them a prime candidate for solving the Calderón problem using analytic continuation.
Item Open Access
Imprimitively generated designs (Colorado State University. Libraries, 2022) Lear, Aaron, author; Betten, Anton, advisor; Adams, Henry, committee member; Nielsen, Aaron, committee member
Designs are a type of combinatorial object which uniformly cover all pairs in a base set V with subsets of V known as blocks. One important class of designs are those generated by a permutation group G acting on V and a single initial block b ⊆ V. The most atomic examples of these designs would be generated by a primitive G. This thesis focuses on the less atomic case where G is imprimitive. Imprimitive permutation groups can be rearranged into a subset of easily understood groups which are derived from G and generate very symmetrical designs. This creates combinatorial restrictions on which group and block combinations can generate a design, turning a question about the existence of combinatorial objects into one more directly involving group theory. Specifically, the existence of imprimitively generated designs turns into a question about the existence of pair orbits of an appropriate size for smaller permutation groups. This thesis introduces two restrictions on combinations of G and b which can generate designs, and discusses how they could be used to more efficiently enumerate imprimitively generated designs.

Item Open Access
Improved stick number upper bounds (Colorado State University. Libraries, 2019) Eddy, Thomas D., author; Shonkwiler, Clayton, advisor; Adams, Henry, committee member; Chitsaz, Hamid, committee member
A stick knot is a mathematical knot formed by a chain of straight line segments. For a knot K, define the stick number of K, denoted stick(K), to be the minimum number of straight edges necessary to form a stick knot which is equivalent to K. Stick number is a knot invariant whose precise value is unknown for the large majority of knots, although theoretical and observed bounds exist. There is a natural correspondence between stick knots and polygons in R^3. Previous research has attempted to improve observed stick number upper bounds by computationally generating such polygons and identifying the knots that they form. This thesis presents a new variation on this method which generates equilateral polygons in tight confinement, thereby increasing the incidence of polygons forming complex knots. Our generation strategy is to sample from the space of confined polygons by leveraging the toric symplectic structure of this space. An efficient sampling algorithm based on this structure is described. This method was used to discover the precise stick number of the knots 9_35, 9_39, 9_43, 9_45, and 9_48. In addition, the best-known stick number upper bounds were improved for 60 other knots with crossing number ten and below.
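The toric symplectic sampler itself is beyond a short sketch, but the classical "crankshaft" Markov move below illustrates the general idea of random walks on the space of closed equilateral polygons (without the confinement used in the thesis): it rotates an arc of the polygon about a chord between two vertices, which preserves closure and every edge length.

```python
import numpy as np

def crankshaft(poly, rng):
    """One crankshaft move on a closed polygon (n x 3 array of vertices):
    rotate the vertices strictly between two chosen vertices about the
    chord through them. Edge lengths and closure are preserved."""
    n = len(poly)
    i, j = sorted(rng.choice(n, size=2, replace=False))
    if j - i < 2:               # no interior vertices to rotate
        return poly
    axis = poly[j] - poly[i]
    axis /= np.linalg.norm(axis)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    K = np.array([[0, -axis[2], axis[1]],       # Rodrigues rotation
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
    out = poly.copy()
    out[i + 1:j] = poly[i] + (poly[i + 1:j] - poly[i]) @ R.T
    return out

# start from a regular planar n-gon (closed, equilateral) and walk
n = 12
t = 2 * np.pi * np.arange(n) / n
poly = np.column_stack([np.cos(t), np.sin(t), np.zeros(n)])
rng = np.random.default_rng(1)
for _ in range(1000):
    poly = crankshaft(poly, rng)
```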
Item Open Access
Independence complexes of finite groups (Colorado State University. Libraries, 2021) Pinckney, Casey M., author; Hulpke, Alexander, advisor; Peterson, Chris, advisor; Adams, Henry, committee member; Neilson, James, committee member
Understanding generating sets for finite groups has been explored previously via the generating graph of a group, where vertices are group elements and edges are given by pairs of group elements that generate the group. We generalize this idea by considering minimal generating sets (with respect to inclusion) for subgroups of finite groups. These form a simplicial complex, which we call the independence complex. The vertices of the independence complex are non-identity group elements, and the faces of size k correspond to minimal generating sets of size k. We give a complete characterization, via constructive algorithms together with enumeration results, for the independence complexes of cyclic groups whose order is a squarefree product of primes, finite abelian groups whose order is a product of powers of distinct primes, and the nonabelian class of semidirect products C_(p_1 p_3 ⋯ p_(2n-1)) ⋊ C_(p_2 p_4 ⋯ p_(2n)), where p_1, p_2, …, p_(2n) are distinct primes with p_(2i-1) > p_(2i) for all 1 ≤ i ≤ n. In the latter case, we introduce a tool called a combinatorial diagram, which is a multipartite simplicial complex subject to certain numerical and minimal covering conditions. Combinatorial diagrams seem to be an interesting area of study in their own right. We also include GAP and Polymake code which generates the facets of the independence complex of any (small enough) finite group and visualizes these complexes in small dimensions.

Item Open Access
k-simplex volume optimizing projection algorithms for high-dimensional data sets (Colorado State University. Libraries, 2021) Stiverson, Shannon J., author; Kirby, Michael, advisor; Peterson, Chris, advisor; Adams, Henry, committee member; Hess, Ann, committee member
Many applications produce data sets that contain hundreds or thousands of features and consequently sit in very high-dimensional space. It is desirable, for purposes of analysis, to reduce the dimension in a way that preserves certain important properties. Previous work has established conditions necessary for projecting data into lower dimensions while preserving pairwise distances up to some tolerance threshold, and algorithms have been developed to do so optimally. However, although similar criteria for projecting data into lower dimensions while preserving k-simplex volumes have been established, there are currently no algorithms seeking to optimally preserve such embedded volumes. In this work, two new algorithms are developed and tested: one which seeks to optimize the smallest projected k-simplex volume, and another which optimizes the average projected k-simplex volume.
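For concreteness, the projected k-simplex volumes being optimized can be computed from a Gram determinant of edge vectors. The brute-force sketch below (an illustration only, exponential in the number of points, and not the thesis's algorithms) evaluates the smallest projected volume for a given projection matrix P.

```python
import numpy as np
from math import factorial
from itertools import combinations

def simplex_volume(vertices):
    """Volume of the k-simplex spanned by k+1 points (rows of
    `vertices`), via the Gram determinant of edge vectors."""
    E = vertices[1:] - vertices[0]           # k x d edge matrix
    gram = E @ E.T
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / factorial(len(E))

def min_projected_volume(X, P, k=2):
    """Smallest projected k-simplex volume over all (k+1)-subsets of
    the rows of X under the linear projection P (m x d)."""
    Y = X @ P.T                              # project each data point
    return min(simplex_volume(Y[list(idx)])
               for idx in combinations(range(len(Y)), k + 1))
```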
Item Open Access
Laplacian Eigenmaps for time series analysis (Colorado State University. Libraries, 2020) Rosse, Patrick J., author; Kirby, Michael, advisor; Peterson, Chris, committee member; Adams, Henry, committee member; Anderson, Chuck, committee member
With "Big Data" becoming more available in our day-to-day lives, it becomes necessary to make meaning of it. We seek to understand the structure of high-dimensional data that we are unable to easily plot. What shape is it? What points are "related" to each other? The primary goal is to simplify our understanding of the data, both numerically and visually. First introduced by M. Belkin and P. Niyogi in 2002, Laplacian Eigenmaps (LE) is a non-linear dimensionality reduction tool that relies on the basic assumption that the raw data lies on a low-dimensional manifold in a high-dimensional space. Once constructed, the graph Laplacian is used to compute a low-dimensional representation of the data set that optimally preserves local neighborhood information. In this thesis, we present a detailed analysis of the method and the optimization problem it solves, and we put it to work on various time series data sets. We show that we are able to extract neighborhood features from a collection of time series, which allows us to cluster specific time series based on noticeable signatures within the raw data.
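A minimal sketch of the standard LE pipeline follows, with illustrative parameter choices (heat-kernel weights, k-nearest-neighbor sparsification, and the generalized eigenproblem L y = λ D y); production code would typically reach for scikit-learn's SpectralEmbedding instead.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_components=2, k=10, sigma=1.0):
    """Basic Laplacian Eigenmaps: kNN graph with heat-kernel weights,
    then the bottom nontrivial generalized eigenvectors of (L, D)."""
    D2 = squareform(pdist(X, 'sqeuclidean'))
    W = np.exp(-D2 / (2 * sigma ** 2))
    # keep only each point's k nearest neighbours, then symmetrize
    far = np.argsort(D2, axis=1)[:, k + 1:]
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # skip index 0: the trivial constant eigenvector (eigenvalue 0)
    _, vecs = eigh(L, D, subset_by_index=[1, n_components])
    return vecs  # n x n_components embedding

embedding = laplacian_eigenmaps(np.random.rand(200, 50))
```

The generalized problem (rather than the plain eigenproblem for L) is the normalization Belkin and Niyogi propose, which keeps the embedding scale-invariant to node degree.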
Item Open Access
Modeling the upper tail of the distribution of facial recognition non-match scores (Colorado State University. Libraries, 2016) Hunter, Brett D., author; Cooley, Dan, advisor; Givens, Geof, advisor; Kokoszka, Piotr, committee member; Fosdick, Bailey, committee member; Adams, Henry, committee member
In facial recognition applications, the upper tail of the distribution of non-match scores is of interest because existing algorithms classify a pair of images as a match if their score exceeds some high quantile of the non-match distribution. I construct a general model for the distribution above the (1-τ)th quantile, borrowing ideas from extreme value theory. The resulting distribution can be viewed as a reparameterized generalized Pareto distribution (GPD), but it differs from the traditional GPD in that τ is treated as fixed. Inference for both the (1-τ)th quantile u_τ and the GPD scale and shape parameters is performed via M-estimation, where my objective function is a combination of the quantile regression loss function and reparameterized GPD densities. By parameterizing u_τ and the GPD parameters in terms of available covariates, understanding of these covariates' influence on the tail of the distribution of non-match scores is attained. A simulation study shows that my method is able to estimate both the set of parameters describing the covariates' influence and high quantiles of the non-match distribution. The simulation study also shows that my model is competitive with quantile regression in estimating high quantiles and that it outperforms quantile regression for extremely high quantiles. I apply my method to a data set of non-match scores and find that covariates such as gender, use of glasses, and age difference have a strong influence on the tail of the non-match distribution.

Item Open Access
New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces (Colorado State University. Libraries, 2022) Fout, Alex M., author; Fosdick, Bailey, advisor; Kaplan, Andee, committee member; Cooley, Daniel, committee member; Adams, Henry, committee member
Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach, exploring two alternate approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to like networks which share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumeration of all such matrices is often infeasible for even moderately sized networks with 20-50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo methods to sample from the space non-uniformly, allowing flexibility in the case that some networks are more likely than others. We show that non-uniform sampling could impede the MCMC process, but that in certain special cases it is still valid. Critically, we illustrate the differential conclusions that could be drawn from uniform vs. non-uniform sampling. Second, we develop a generalized divide-and-conquer approach which recursively divides matrices into smaller subproblems that are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces lacking a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests have needed reconceptualization in terms of only distances in the metric space. We consider the multivariate setting, where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object. We define the notion of Fréchet covariance to measure dependence between two metric spaces and establish consistency for the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and compare their performance on scenarios involving random probability distributions and networks with node covariates.
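As a concrete reference point for the sampling problem, the classical uniform "checkerboard swap" chain below preserves every row and column margin of a binary matrix. It is only a baseline illustration: the dissertation's contribution extends such chains to non-uniform sampling, and this sketch ignores the structural-zero diagonal of a simple directed network's adjacency matrix.

```python
import numpy as np

def checkerboard_step(A, rng):
    """One uniform checkerboard swap: pick two rows and two columns;
    if the 2x2 submatrix is [[1,0],[0,1]] or [[0,1],[1,0]], flip it.
    Either flip preserves all row and column sums."""
    r = rng.choice(A.shape[0], size=2, replace=False)
    c = rng.choice(A.shape[1], size=2, replace=False)
    sub = A[np.ix_(r, c)]
    if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
        A[np.ix_(r, c)] = 1 - sub
    return A

rng = np.random.default_rng(0)
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])
rows, cols = A.sum(axis=1).copy(), A.sum(axis=0).copy()
for _ in range(10000):
    A = checkerboard_step(A, rng)
assert (A.sum(axis=1) == rows).all() and (A.sum(axis=0) == cols).all()
```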