Browsing by Author "Kaplan, Andee, committee member"
Now showing 1 - 5 of 5
Advances in Bayesian spatial statistics for ecology and environmental science (Colorado State University. Libraries, 2024) [Open Access]
Wright, Wilson J., author; Hooten, Mevin B., advisor; Cooley, Daniel S., advisor; Keller, Kayleigh P., committee member; Kaplan, Andee, committee member; Ross, Matthew R. V., committee member

In this dissertation, I develop new Bayesian methods for analyzing spatial data from applications in ecology and environmental science, focusing on mechanistic spatial models and binary spatial processes. I first consider the distribution of heavy metal pollution from a mining road in Cape Krusenstern, Alaska, USA. I develop a mechanistic spatial model that uses the physical process of atmospheric dispersion to characterize the spatial structure in these data. This approach directly incorporates scientific knowledge about how pollutants spread and provides inferences about the dispersion process itself. To assess how the heavy metal pollution affects the vegetation community in Cape Krusenstern, I also develop a new model that represents plant cover for multiple species using clipped Gaussian processes. This approach applies to multiscale and multivariate binary processes observed at point locations, including multispecies plant cover data collected using the point intercept method. By analyzing the point-level data directly, instead of aggregating observations to the plot level, this model allows for inferences about both large-scale and small-scale spatial dependence in plant cover. It also incorporates dependence among species at the small spatial scale. The third model I develop is motivated by ecological studies of wildlife occupancy. Like plant cover, species occurrence can be modeled as a binary spatial process. However, occupancy data are inherently measured at areal survey units.
I develop a continuous-space occupancy model that accounts for the change of spatial support between the occurrence process and the observed data. All of these models are implemented using Bayesian methods, and I present computationally efficient methods for fitting them, including a new surrogate data slice sampler for models with latent nearest neighbor Gaussian processes.

Bayesian methods for spatio-temporal ecological processes using imagery data (Colorado State University. Libraries, 2021) [Open Access]
Lu, Xinyi, author; Hooten, Mevin, advisor; Kaplan, Andee, committee member; Fosdick, Bailey, committee member; Koons, David, committee member

In this dissertation, I present novel Bayesian hierarchical models to statistically characterize spatio-temporal ecological processes. I am motivated by the volatility of Alaskan ecosystems in the face of global climate change, and I demonstrate methods for the imagery data that are emerging as survey technologies advance. For the nearshore marine ecosystem, I developed a model that combines ecological diffusion and logistic growth to quantify the colonization dynamics of a population that establishes a long-term equilibrium over a heterogeneous environment. I also unified modeling concepts from entity resolution and capture-recapture to identify unique individuals of the population from overlapping images and to infer total abundance. For the terrestrial ecosystem, I developed a stochastic state-space model to quantify the impact of climate change on the structural transformation of land cover types. The methods presented in this dissertation provide interpretable inference and use statistical computing strategies to achieve scalability.

Causality and clustering in complex settings (Colorado State University. Libraries, 2023) [Open Access]
Gibbs, Connor P., author; Keller, Kayleigh, advisor; Fosdick, Bailey, advisor; Koslovsky, Matthew, committee member; Kaplan, Andee, committee member; Anderson, Brooke, committee member

Causality and clustering are at the forefront of many problems in statistics. In this dissertation, we present new methods and approaches for drawing causal inference with temporally dependent units and for clustering nodes in heterogeneous networks. To begin, we investigate the causal effect of calling a timeout on stopping an opposing team's run in the National Basketball Association (NBA). After formalizing the notion of a run in the NBA, and in light of the temporal dependence among runs, we define the units under study with careful consideration of the stable unit treatment value assumption pertinent to the Rubin causal model. After introducing a novel, interpretable outcome based on the score difference, we conclude that while comebacks frequently occur after a run, it is slightly disadvantageous to call a timeout during a run by the opposing team. Further, we demonstrate that the magnitude of this effect varies by franchise, lending clarity to a topic often debated among sports fans. Next, we represent the known relationships among and between genetic variants and phenotypic abnormalities as a heterogeneous network and introduce a novel analytic pipeline to identify clusters containing undiscovered gene to phenotype relations (ICCUR) from the network. ICCUR identifies, scores, and ranks small heterogeneous clusters according to their potential for future discovery in a large temporal biological network. We train an ensemble model of boosted regression trees to predict clusters' potential for future discovery from observable cluster features, and we show that the resulting clusters contain significantly more undiscovered gene to phenotype relations than expected by chance.
To demonstrate its use as a diagnostic aid, we apply the results of the ICCUR pipeline to real, undiagnosed patients with rare diseases, identifying clusters that contain patients' co-occurring yet otherwise unconnected genotypic and phenotypic information; some of these connections have since been validated by human curation. Motivated by ICCUR and its application, we introduce a novel method called ECoHeN (pronounced "eco-hen") to extract communities from heterogeneous networks in a statistically meaningful way. Using a heterogeneous configuration model as a reference distribution, ECoHeN identifies communities that are significantly more densely connected than expected given the node types and connectivity of their members, without imposing constraints on the type composition of the extracted communities. The ECoHeN algorithm identifies communities one at a time through a dynamic set of iterative updating rules and is guaranteed to converge. To our knowledge, this is the first discovery method that distinguishes and identifies both homogeneous and heterogeneous, possibly overlapping, community structure in a network. We demonstrate the performance of ECoHeN through simulation and in an application to a political blogs network, identifying collections of blogs that reference one another more than expected given the ideology of their members. In addition to small partisan communities, ECoHeN identifies a large bipartisan community that canonical community detection methods cannot detect and that is denser than the communities found by modern competing methods.

Integrated statistical models in ecology (Colorado State University. Libraries, 2023) [Open Access]
Van Ee, Justin, author; Hooten, Mevin, advisor; Koslovsky, Matthew, advisor; Keller, Kayleigh, committee member; Kaplan, Andee, committee member; Bailey, Larissa, committee member

The number of endangered and vulnerable species continues to grow globally as a result of habitat destruction, overharvesting, invasive species, and climate change. Understanding the drivers of population decline is pivotal for informing species conservation. Many of the datasets collected cover only a limited portion of the species range, may not include observations of other organisms in the community, or lack temporal breadth. When these datasets are analyzed independently, the resulting analyses often overlook drivers of population decline, muddle community responses to ecological threats, and poorly predict population trajectories. Over the last decade, thanks to efforts such as the Long Term Ecological Research Network and the National Ecological Observatory Network, citizen science surveys, and technological advances, ecological datasets that provide insights about collections of organisms or about multiple characteristics of the same organism have become prevalent. Taken together, these datasets have the potential to provide novel insights, improve predictive performance, and disentangle the contributions of confounded factors, but specifying joint models that assimilate all the available data sources is both intellectually daunting and computationally prohibitive. I develop methodology for specifying computationally efficient integrated models. I discuss datasets frequently collected in ecology, objectives common to many analyses, and the methodological challenges associated with specifying joint models in these contexts. I introduce a suite of model building and computational techniques that I used to facilitate inference in three applied analyses of ecological data.
In a case study of the joint mammalian response to the bark beetle epidemic in Colorado, I describe a restricted regression approach to deconfounding the effects of environmental factors and community structure on species distributions. I highlight that fitting certain joint species distribution models in a restricted parameterization improves sampling efficiency. To improve abundance estimates for a federally protected species, I specify an integrated model for analyzing independent aerial and ground surveys. I use a Markov melding approach to facilitate posterior inference and to construct the joint distribution implied by the prior information, assumptions, and data expressed across a chain of submodels. I extend the integrated model by assimilating additional demographic surveys of the species that allow abundance estimates to be linked to annual variability in population vital rates. To reduce computation time, both models are fit using a multi-stage Markov chain Monte Carlo algorithm with parallelization. In each applied analysis, I uncover associations that would have been overlooked had the datasets been analyzed independently, and I improve predictive performance relative to models fit to individual datasets.

New methods for fixed-margin binary matrix sampling, Fréchet covariance, and MANOVA tests for random objects in multiple metric spaces (Colorado State University. Libraries, 2022) [Open Access]
Fout, Alex M., author; Fosdick, Bailey, advisor; Kaplan, Andee, committee member; Cooley, Daniel, committee member; Adams, Henry, committee member

Many approaches to the analysis of network data essentially view the data as Euclidean and apply standard multivariate techniques. In this dissertation, we refrain from this approach and explore two alternative approaches to the analysis of networks and other structured data. The first approach seeks to determine how unique an observed simple, directed network is by comparing it to similar networks that share its degree distribution. Generating networks for comparison requires sampling from the space of all binary matrices with the prescribed row and column margins, since enumerating all such matrices is often infeasible even for moderately sized networks with 20 to 50 nodes. We propose two new sampling methods for this problem. First, we extend two Markov chain Monte Carlo (MCMC) methods to sample from the space non-uniformly, allowing flexibility when some networks are more likely than others. We show that non-uniform sampling can impede the MCMC process but is still valid in certain special cases. Critically, we illustrate the different conclusions that can be drawn from uniform versus non-uniform sampling. Second, we develop a generalized divide-and-conquer approach that recursively divides matrices into smaller subproblems that are much easier to count and sample. Each division step reveals interesting mathematics involving the enumeration of integer partitions and points in convex lattice polytopes. The second broad approach we explore is comparing random objects in metric spaces that lack a coordinate system. Traditional definitions of the mean and variance no longer apply, and standard statistical tests must be reformulated in terms of distances in the metric space alone. We consider the multivariate setting where random objects exist in multiple metric spaces, which can be thought of as distinct views of the random object. We define the notion of Fréchet covariance to measure dependence between two metric spaces and establish consistency of the sample estimator. We then propose several tests for differences in means and covariance matrices among two or more groups in multiple metric spaces, and we compare their performance in scenarios involving random probability distributions and networks with node covariates.
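The fixed-margin sampling problem described in the last abstract can be made concrete with the classic checkerboard-swap Markov chain. The Python sketch below is a minimal illustration under stated assumptions and is not one of the dissertation's methods: the abstract does not name the two MCMC methods it extends, and the helper names `margins` and `checkerboard_swap_chain` are hypothetical. Each step picks two rows and two columns. If the 2-by-2 submatrix they define is [[1,0],[0,1]] or [[0,1],[1,0]], the chain flips it, which leaves every row and column sum unchanged. Each move is its own inverse, and the proposal is symmetric, so the uniform distribution over matrices with the given margins is stationary. By Ryser's interchange result, such swaps connect all of these matrices. The sketch places no constraint on the diagonal, so it targets general binary matrices rather than adjacency matrices of networks without self-loops.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def margins(m: Matrix) -> Tuple[List[int], List[int]]:
    """Return (row sums, column sums) of a binary matrix (hypothetical helper)."""
    return [sum(row) for row in m], [sum(col) for col in zip(*m)]


def checkerboard_swap_chain(m: Matrix, n_steps: int, seed: Optional[int] = None) -> Matrix:
    """Run a checkerboard-swap Markov chain for n_steps, starting from m.

    Hypothetical helper for illustration only. Every accepted move flips a
    2-by-2 checkerboard submatrix, so row and column sums never change.
    The diagonal is unconstrained. The input matrix is not modified.
    """
    n_rows, n_cols = len(m), len(m[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least two rows and two columns")
    rng = random.Random(seed)
    x = [row[:] for row in m]  # work on a copy of the starting matrix
    for _ in range(n_steps):
        i1, i2 = rng.sample(range(n_rows), 2)  # two distinct rows
        j1, j2 = rng.sample(range(n_cols), 2)  # two distinct columns
        # Checkerboard: the diagonal entries match, the anti-diagonal entries match,
        # and the two values differ, i.e. [[1,0],[0,1]] or [[0,1],[1,0]].
        if (x[i1][j1] == x[i2][j2] and x[i1][j2] == x[i2][j1]
                and x[i1][j1] != x[i1][j2]):
            for r, c in ((i1, j1), (i1, j2), (i2, j1), (i2, j2)):
                x[r][c] = 1 - x[r][c]  # flip 1 <-> 0
        # Otherwise the chain stays where it is (a self-loop).
    return x


if __name__ == "__main__":
    start = [[1, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
    draw = checkerboard_swap_chain(start, n_steps=10_000, seed=1)
    print(draw)
    print(margins(draw) == margins(start))  # True: margins are preserved
```

In practice, a chain like this is run for many steps between retained draws so that successive samples are close to independent. How many steps are enough depends on the margins and is not addressed by this sketch.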