
Methodology in air pollution epidemiology for large-scale exposure prediction and environmental trials with non-compliance


Exposure to airborne pollutants, both long- and short-term, can lead to harmful respiratory, cardiovascular, and cardiometabolic outcomes. Multiple challenges arise in the study of relationships between ambient air pollution and health outcomes. For example, in large observational cohort studies, individual measurements are not feasible, so researchers use small sets of pollutant concentration measurements to predict subject-level exposures. As a second example, subjects' inconsistent compliance with their assigned treatments can affect results from randomized controlled trials of environmental interventions. In this dissertation, we present methods to address these challenges. We develop a penalized regression model that can predict particulate matter exposures in space and time, including penalties to discourage overfitting and encourage smoothness in time. This model is more accurate than spatial-only and spatiotemporal universal kriging (UK) models when the exposures are missing in a regular (semi-daily) pattern. Our penalized regression model is also faster than both UK models, allowing the use of bootstrap methods to account for measurement error bias and monitor site selection in a two-stage health model. We introduce methods to estimate causal effects in a longitudinal setting by latent "at-the-time" principal strata. We implement an array of linear mixed models on data subsets, each with weights derived from principal scores. In addition, we estimate the same stratified causal effects with a Bayesian mixture model. The weighted linear mixed models outperform the Bayesian mixture model and an existing single-measure principal scores method in all simulation scenarios, and are the only method to produce a significant estimate for a causal effect of treatment assignment by strata when applied to a Honduran cookstove intervention study.
Finally, we extend the "at-the-time" longitudinal principal stratification framework to a setting where continuous exposure measurements are the post-treatment variable by which the latent strata are defined. We dichotomize the continuous exposures into a binary variable in order to use our previous method of weighted linear mixed models. We also extend to the longitudinal setting an existing Bayesian approach that does not require categorization of the exposures. The previous weighted linear mixed model and single-measure principal scores methods are negatively biased when applied to simulated samples, while the Bayesian approach produces the lowest RMSE and bias near zero. The Bayesian approach, when applied to the same Honduran cookstove intervention study as before, does not find a significant estimate for the causal effect of treatment assignment by strata.


2023 Summer.
Includes bibliographical references.



particulate matter
principal stratification
principal scores
air pollution

