Browsing by Author "Koslovsky, Matt, committee member"
Item Embargo Bayesian tree based methods for longitudinally assessed environmental mixtures (Colorado State University. Libraries, 2024) Im, Seongwon, author; Wilson, Ander, advisor; Keller, Kayleigh, committee member; Koslovsky, Matt, committee member; Neophytou, Andreas, committee member

In various fields, there is interest in estimating the lagged association between an exposure and an outcome. This is particularly common in environmental health studies, where exposure to an environmental chemical is measured repeatedly during gestation to assess its lagged effects on a birth outcome. The relationship between longitudinally assessed environmental mixtures and a health outcome is also of great interest. For a single exposure, a distributed lag model (DLM) is a widely used method that provides an appropriate temporal structure for estimating the time-varying effects. For mixture exposures, a distributed lag mixture model is used to capture the main effect of each exposure and lagged interactions among exposures. The main inferential goals include estimating the lag-specific effects and identifying a window of susceptibility, during which a fetus is particularly vulnerable. In this dissertation, we propose novel statistical methods for estimating exposure effects of longitudinally assessed environmental mixtures in various scenarios. First, we propose a method that can estimate a linear exposure-time-response function between mixture exposures and a count outcome that may be zero-inflated and overdispersed. To achieve this, we employ Bayesian Pólya-Gamma data augmentation within a treed distributed lag mixture model framework. We apply the method to estimate the relationship of weekly average fine particulate matter (PM2.5) and temperature with pregnancy loss, using a live-birth-identified conception time-series design and administrative data from Colorado.
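For orientation, a generic single-exposure Gaussian DLM can be written as below. This is only the textbook form, not the treed or count-outcome model proposed in the dissertation, and the symbols (y_i, x_{it}, \beta_t, T) are notation assumed here rather than taken from the abstract.

```latex
% Generic distributed lag model (illustrative notation, not the author's):
%   y_i     outcome for individual i
%   x_{it}  exposure of individual i at time point (e.g., gestational week) t
%   \beta_t lag-specific effect at time t
y_i = \alpha + \sum_{t=1}^{T} \beta_t \, x_{it} + \varepsilon_i ,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
% A window of susceptibility is then a set of time points t where \beta_t \neq 0.
```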
Second, we propose a tree triplet structure to allow for heterogeneity in exposure effects in an environmental mixture exposure setting. Our method accommodates modifier and exposure selection, which allows for personalized and subgroup-specific effect estimation and identification of windows of susceptibility. We apply the method to Colorado administrative birth data to examine the heterogeneous relationship of PM2.5 and temperature with birth weight. Finally, we introduce an R package, dlmtree, that integrates tree-structured DLM methods into convenient software. We provide an overview of the embedded tree-structured DLMs and use simulated data to demonstrate model fitting, statistical inference, and visualization.

Item Embargo Functional methods in outlier detection and concurrent regression (Colorado State University. Libraries, 2024) Creutzinger, Michael L., author; Cooley, Daniel, advisor; Sharp, Julia L., advisor; Koslovsky, Matt, committee member; Liebl, Dominik, committee member; Ortega, Francisco, committee member

Functional data are data collected on a curve, or surface, over a continuum. The growing presence of high-resolution data has greatly increased the popularity of using and developing methods in functional data analysis (FDA). Functional data may be defined differently from other data structures, but similar ideas apply for these types of data, including data exploration, modeling and inference, and post-hoc analyses. The methods presented in this dissertation provide a statistical framework that allows a researcher to carry out an analysis of functional data from "start to finish". Even with functional data, there is a need to identify outliers prior to conducting statistical analysis procedures. Existing functional data outlier detection methodology requires the use of a functional data depth measure, functional principal components, and/or an outlyingness measure like Stahel-Donoho.
Although effective, these functional outlier detection methods may not be easily interpreted. In this dissertation, we propose two new functional outlier detection methods. The first method, Practical Outlier Detection (POD), makes use of ordinary summary statistics (e.g., minimum, maximum, mean, variance). The second method, Prediction Band Outlier Detection (PBOD), makes use of parametric, simultaneous prediction bands that meet nominal coverage levels. The two new outlier detection methods were compared to three existing outlier detection methods: MS-Plot, Massive Unsupervised Outlier Detection, and Total Variation Depth. In the simulation results, POD performs as well as, or better than, its counterparts in terms of specificity, sensitivity, accuracy, and precision. Similar results were found for PBOD, except for noticeably smaller values of specificity and accuracy than all other methods. Following data exploration and outlier detection, researchers often model their data. In FDA, functional linear regression uses a functional response Y_i(t) and scalar and/or functional predictors X_i(t). A functional concurrent regression model is estimated by regressing Y_i on X_i pointwise at each sampling point, t. After estimating a regression model (functional or non-functional), it is common to estimate confidence and prediction intervals for parameter(s), including the conditional mean. A common way to obtain confidence/prediction intervals for simultaneous inference across the sampling domain is to use resampling methods (e.g., bootstrapping or permutation). We propose a new method for estimating parametric, simultaneous confidence and prediction bands for a functional concurrent regression model, without the use of resampling. The method uses Kac-Rice formulas for estimation of a critical value function, which is used with a functional pivot to acquire the simultaneous band.
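As a rough illustration of the pointwise estimation described in this abstract, the sketch below fits an ordinary least-squares line of Y_i(t) on X_i(t) separately at each sampling point. It assumes a single functional predictor observed on a shared grid; the function name `concurrent_regression` and the toy curves are hypothetical and are not taken from the dissertation, which goes on to build simultaneous bands that this sketch does not attempt.

```python
def concurrent_regression(X, Y):
    """Pointwise simple linear regression of Y_i(t) on X_i(t).

    X, Y: lists of curves, each a list of values on the same grid.
    Returns (intercepts, slopes), one value per grid point.
    """
    n, m = len(X), len(X[0])
    intercepts, slopes = [], []
    for t in range(m):
        x = [X[i][t] for i in range(n)]
        y = [Y[i][t] for i in range(n)]
        x_bar, y_bar = sum(x) / n, sum(y) / n
        sxx = sum((u - x_bar) ** 2 for u in x)
        sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
        slope = sxy / sxx
        slopes.append(slope)
        intercepts.append(y_bar - slope * x_bar)
    return intercepts, slopes


# Hypothetical noise-free toy data: 4 curves observed at 3 sampling points.
true_b0 = [1.0, 2.0, 3.0]   # assumed intercept function beta_0(t)
true_b1 = [0.5, -1.0, 2.0]  # assumed slope function beta_1(t)
X = [[0.0, 1.0, 2.0], [1.0, 3.0, 1.0], [2.0, 0.0, 4.0], [3.0, 2.0, 3.0]]
Y = [[b0 + b1 * x for b0, b1, x in zip(true_b0, true_b1, curve)] for curve in X]

b0_hat, b1_hat = concurrent_regression(X, Y)
print(b0_hat, b1_hat)
```

Because the toy responses contain no noise, the estimated coefficient functions should match the assumed ones up to floating-point error.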
In the results, the proposed method meets nominal coverage levels for both confidence and prediction bands. The method we propose is also substantially faster to compute than methods that require resampling techniques. In linear regression, researchers may also assess whether there are influential observations that may impact the estimates and results from the fitted model. Studentized difference in fits (DFFITS), studentized difference in regression coefficient estimates (DFBETAS), and/or Cook's Distance (D) can all be used to identify influential observations. For functional concurrent regression, these measures can be easily computed pointwise for each observation. However, the only existing approach uses resampling techniques to estimate a null distribution of the average of each measure. Rather than using the average values and bootstrapping, we propose working with functional DFFITS (DFFITS(t)) directly. We show that if the functional errors are assumed to follow a Gaussian process, DFFITS(t) is distributed uniformly as a scaled Student's t process. Then, we propose using a multivariate Student's t distributional quantile for identifying influential functional observations with DFFITS(t). Our methodology ("Theoretical") is compared against a competing method that uses a parametric bootstrapping technique ("Bootstrapped") for estimating the null distribution of the mean absolute value of DFFITS(t). In the simulation and case study results, we find that the Theoretical method greatly reduces the computation time, without much loss in performance as measured by accuracy (ACC), precision (PPV), and Matthews correlation coefficient (MCC), relative to the Bootstrapped method. Furthermore, the average sensitivity of the Theoretical method is higher in all scenarios than that of the Bootstrapped method.

Item Open Access Impact of various factors on partial least squares model robustness for nondestructive peach fruit quality assessment (Colorado State University. Libraries, 2023) Pott, Jakob, author; Minas, Ioannis, advisor; Eakes, Joe, committee member; Koslovsky, Matt, committee member

Given declining fruit consumption due to poor fruit quality and large amounts of waste, peach growers have continuously suffered from financial loss, and the industry has seen a sharp decline in recent decades. Due to the time-consuming and destructive nature of conventional fruit quality assessment, many peach growers prioritize fruit characteristics conducive to shipping and storage over characteristics that correlate with consumer acceptance. This prioritization has resulted in the poor-quality fruit that consumers have grown to associate with fresh peaches and has contributed to large annual waste. A potential solution is the use of visible and near-infrared spectroscopy (Vis-NIRS) paired with partial least squares (PLS) modeling as a field-deployable method that can measure preharvest internal fruit quality quickly and non-destructively. These qualities offer an answer to declining fruit quality and waste. Although promising, the technology is only as good as the data used to train the models. Quality data are hard to collect because the relationships needed for good model performance can only be captured by accounting for many factors, including the temperature of the sample and the biological variability introduced by seasonal changes, cultivar differences, fruit maturity, and management factors such as crop load, rootstocks, irrigation regimes, and training systems. In tree fruit research, handheld Vis-NIRS devices have been used to predict internal quality parameters such as sweetness (dry matter content, DMC; soluble solids concentration, SSC) and fruit physiological maturity related to chlorophyll content (index of absorbance difference, IAD).
Although accurate, the statistical models used to make such predictions often struggle with robustness across cultivars, growing seasons, and regions due to a lack of biological variability or a lack of representative data for factors like temperature. These challenges have led to slow industry adoption. To address this issue, models for 13 distinct peach cultivars were constructed by combining data from two seasons (2016 and 2021), followed by external validation with data from a third season (2022). The data from 2016 was collected over a range of preharvest factors, fruit development stages, and temperatures, and the inclusion of 2021 data added further biological variability. External validation produced error rates of 0.36 - 0.42%, 0.59 - 0.63%, and 0.05 - 0.04 for DMC, SSC, and IAD, respectively, across the 13 peach cultivars, indicating the models trained in 2021 were robust and performing at an acceptable level to impact grower decision making. It was observed that the additional inclusion of data from different cultivars and growing environments, as well as a third growing season (2017), did not significantly impact model performance. The lack of improvement suggests that the data from each year contain enough covariate variability to cover a broad range of measurements (i.e., input values) that growers and researchers are likely to observe when collecting data to predict peach quality in different orchards or seasons. This insensitivity to various environmental and growing conditions, generally referred to as external factors, which arises from the variability captured in the data used to build the model, is characteristic of a robust model.
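The train-on-two-seasons, validate-on-a-third workflow described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: it fits a one-component PLS1 model in plain Python, scores it with a root-mean-square error (one common error metric; the abstract does not say which metric was used), and runs on hypothetical noise-free toy "spectra". All names (`fit_pls1_one_component`, `make_season`, `LOADING`) are invented for the example.

```python
from math import sqrt


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coef)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores, then the regression of centered y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [q * wj for wj in w]
    return x_mean, y_mean, coef


def predict(model, X):
    x_mean, y_mean, coef = model
    return [y_mean + sum(c * (xj - m) for c, xj, m in zip(coef, row, x_mean))
            for row in X]


def rmse(y_true, y_pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: each "spectrum" is driven by one latent quality value.
LOADING = [0.2, 0.5, 0.3, 0.9]


def make_season(latents):
    X = [[1.0 + l * p for p in LOADING] for l in latents]
    y = [5.0 + 2.0 * l for l in latents]
    return X, y


X_a, y_a = make_season([1.0, 2.0, 3.0])    # stands in for training season 1
X_b, y_b = make_season([2.5, 4.0, 5.0])    # stands in for training season 2
X_val, y_val = make_season([1.5, 3.5, 6.0])  # held-out validation season

model = fit_pls1_one_component(X_a + X_b, y_a + y_b)
rmsep = rmse(y_val, predict(model, X_val))
print(f"external-validation RMSE: {rmsep:.6f}")
```

With real spectra, more than one component would normally be needed, and the held-out error would reflect how well the training seasons cover the conditions seen in the validation season.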