Department of Statistics

Permanent URI for this community: https://hdl.handle.net/10217/100518

These digital collections include theses, dissertations, and datasets from the Department of Statistics. Due to departmental name changes, materials from the following historical department are also included here: Mathematics and Statistics.

Browse

Embargo
Bayesian tree based methods for longitudinally assessed environmental mixtures
(Colorado State University. Libraries, 2024) Im, Seongwon, author; Wilson, Ander, advisor; Keller, Kayleigh, committee member; Koslovsky, Matt, committee member; Neophytou, Andreas, committee member
In various fields, there is interest in estimating the lagged association between an exposure and an outcome. This is particularly common in environmental health studies, where exposure to an environmental chemical is measured repeatedly during gestation to assess its lagged effects on a birth outcome. The relationship between longitudinally assessed environmental mixtures and a health outcome is also of great interest. For a single exposure, a distributed lag model (DLM) is a widely used method that provides an appropriate temporal structure for estimating the time-varying effects. For mixture exposures, a distributed lag mixture model is used to address the main effect of each exposure and lagged interactions among exposures. The main inferential goals include estimating the lag-specific effects and identifying a window of susceptibility, during which a fetus is particularly vulnerable. In this dissertation, we propose novel statistical methods for estimating exposure effects of longitudinally assessed environmental mixtures in various scenarios. First, we propose a method that can estimate a linear exposure-time-response function between mixture exposures and a count outcome that may be zero-inflated and overdispersed. To achieve this, we employ Bayesian Pólya-Gamma data augmentation with a treed distributed lag mixture model framework. We apply the method to estimate the relationship of weekly average fine particulate matter (PM2.5) and temperature with pregnancy loss, using a live-birth-identified conception time-series design with administrative data from Colorado. Second, we propose a tree triplet structure to allow for heterogeneity in exposure effects in an environmental mixture exposure setting. Our method accommodates modifier and exposure selection, which allows for personalized and subgroup-specific effect estimation and identification of windows of susceptibility. We apply the method to Colorado administrative birth data to examine the heterogeneous relationship of PM2.5 and temperature with birth weight. Finally, we introduce an R package, dlmtree, that integrates tree-structured DLM methods into convenient software. We provide an overview of the embedded tree-structured DLMs and use simulated data to demonstrate the model fitting process, statistical inference, and visualization.
Open Access
Dataset associated with "Laboratory evaluation of low-cost PurpleAir PM monitors and in-field correction using co-located portable filter samplers"
(Colorado State University. Libraries, 2019) Tryner, Jessica; L'Orange, Christian; Mehaffy, John; Miller-Lionberg, Daniel; Hofstetter, Josephine C.; Wilson, Ander; Volckens, John
Low-cost aerosol monitors can provide more spatially and temporally resolved data on ambient fine particulate matter (PM2.5) concentrations than are typically available from regulatory monitoring networks; however, low-cost monitors, which do not measure PM2.5 mass directly and tend to be sensitive to variations in particle size and refractive index, sometimes produce inaccurate concentration estimates. We investigated laboratory- and field-based approaches for calibrating low-cost PurpleAir monitors against gravimetric filter samples. First, we investigated the linearity of the PurpleAir response to NIST Urban PM and derived a laboratory-based gravimetric correction factor. Then, we co-located PurpleAir monitors with portable filter samplers at 15 outdoor sites spanning a 3×3-km area in Fort Collins, CO, USA. We evaluated whether PM2.5 correction factors derived from periodic co-locations with portable filter samplers improved the accuracy of PurpleAir monitors (relative to reference filter samplers operated at 16.7 L/min). We also compared 72-hour average PM2.5 concentrations measured using portable and reference filter samplers. Both before and after field deployment, the coefficient of determination for a linear model relating NIST Urban PM concentrations measured by a tapered element oscillating microbalance and the PurpleAir monitors (PM2.5 ATM) was 0.99; however, an F-test identified a significant lack of fit between the model and the data. The laboratory-based correction factor did not translate to the field. Correction factors derived in the field from monthly, weekly, semi-weekly, and concurrent co-locations with portable filter samplers increased the fraction of 72-hour average PurpleAir PM2.5 concentrations that were within 20% of the reference concentrations from 15% (for uncorrected measurements) to 45%, 59%, 56%, and 70%, respectively. Furthermore, 72-hour average PM2.5 concentrations measured using portable and reference filter samplers agreed (bias ≤ 20% for 71% of samples). These results demonstrate that periodic co-location with portable filter samplers can improve the accuracy of 72-hour average PM2.5 concentrations reported by PurpleAir monitors.
Open Access
Joint tail modeling via regular variation with applications in climate and environmental studies
(Colorado State University. Libraries, 2013) Weller, Grant B., author; Cooley, Dan, advisor; Breidt, F. Jay, committee member; Estep, Donald, committee member; Schumacher, Russ, committee member
This dissertation presents applied, theoretical, and methodological advances in the statistical analysis of multivariate extreme values, employing the underlying mathematical framework of multivariate regular variation. Existing theory is applied in two studies in climatology; these investigations represent novel applications of the regular variation framework in this field. Motivated by applications in environmental studies, a theoretical development in the analysis of extremes is introduced, along with novel statistical methodology. This work first details a novel study that employs the regular variation modeling framework to study uncertainties in a regional climate model's simulation of extreme precipitation events along the west coast of the United States, with a particular focus on the Pineapple Express (PE), a special type of winter storm. We model the tail dependence in past daily precipitation amounts seen in observational data and output of the regional climate model, and we link atmospheric pressure fields to PE events. The fitted dependence model is utilized as a stochastic simulator of future extreme precipitation events, given output from a future-scenario run of the climate model. The simulator and link to pressure fields are used to quantify the uncertainty in a future simulation of extreme precipitation events from the regional climate model, given boundary conditions from a general circulation model. A related study investigates two case studies of extreme precipitation from six regional climate models in the North American Regional Climate Change Assessment Program (NARCCAP). We find that simulated winter season daily precipitation along the Pacific coast exhibits tail dependence with extreme events in the observational record. When considering summer season daily precipitation over a central region of the United States, however, we find almost no correspondence between extremes simulated by NARCCAP and those seen in observations. Furthermore, we discover less consistency among the NARCCAP models in the tail behavior of summer precipitation over this region than that seen in winter precipitation over the west coast region. The analyses in this work indicate that the NARCCAP models are effective at downscaling winter precipitation extremes in the west coast region, but questions remain about their ability to simulate summer-season precipitation extremes in the central region. A deficiency of existing modeling techniques based on the multivariate regular variation framework is the inability to account for hidden regular variation, a feature of many theoretical examples and real data sets. One particular example of this deficiency is the inability to distinguish asymptotic independence from independence in the usual sense. This work develops a novel probabilistic characterization of random vectors possessing hidden regular variation as the sum of independent components. The characterization is shown to be asymptotically valid via a multivariate tail equivalence result, and an example is demonstrated via simulation. The sum characterization is employed to perform inference for the joint tail of random vectors possessing hidden regular variation. This dissertation develops a likelihood-based estimation procedure, employing a novel version of the Monte Carlo expectation-maximization algorithm that has been modified for tail estimation. The methodology is demonstrated on simulated data and applied to a bivariate series of air pollution data from Leeds, UK. We demonstrate the improvement in tail risk estimates offered by the sum representation over approaches that ignore hidden regular variation in the data.
Open Access
Methodology in air pollution epidemiology for large-scale exposure prediction and environmental trials with non-compliance
(Colorado State University. Libraries, 2023) Ryder, Nathan, author; Keller, Kayleigh, advisor; Wilson, Ander, committee member; Cooley, Daniel, committee member; Neophytou, Andreas, committee member
Exposure to airborne pollutants, both long- and short-term, can lead to harmful respiratory, cardiovascular, and cardiometabolic outcomes. Multiple challenges arise in the study of relationships between ambient air pollution and health outcomes. For example, in large observational cohort studies, individual measurements are not feasible, so researchers use small sets of pollutant concentration measurements to predict subject-level exposures. As a second example, inconsistent compliance of subjects with their assigned treatments can affect results from randomized controlled trials of environmental interventions. In this dissertation, we present methods to address these challenges. We develop a penalized regression model that can predict particulate matter exposures in space and time, including penalties to discourage overfitting and encourage smoothness in time. This model is more accurate than spatial-only and spatiotemporal universal kriging (UK) models when the exposures are missing in a regular (semi-daily) pattern. Our penalized regression model is also faster than both UK models, allowing the use of bootstrap methods to account for measurement error bias and monitor site selection in a two-stage health model. We introduce methods to estimate causal effects in a longitudinal setting by latent "at-the-time" principal strata. We implement an array of linear mixed models on data subsets, each with weights derived from principal scores. In addition, we estimate the same stratified causal effects with a Bayesian mixture model. The weighted linear mixed models outperform the Bayesian mixture model and an existing single-measure principal scores method in all simulation scenarios, and are the only method to produce a significant estimate for a causal effect of treatment assignment by strata when applied to a Honduran cookstove intervention study. Finally, we extend the "at-the-time" longitudinal principal stratification framework to a setting where continuous exposure measurements are the post-treatment variable by which the latent strata are defined. We categorize the continuous exposures into a binary variable in order to use our previous method of weighted linear mixed models. We also extend an existing Bayesian approach to the longitudinal setting, which does not require categorization of the exposures. The previous weighted linear mixed model and single-measure principal scores methods are negatively biased when applied to simulated samples, while the Bayesian approach produces the lowest RMSE and bias near zero. The Bayesian approach, when applied to the same Honduran cookstove intervention study as before, does not find a significant estimate for the causal effect of treatment assignment by strata.

Browsing Department of Statistics by Subject "air pollution"