Browsing by Author "Koslovsky, Matt, committee member"
Item Embargo
Functional methods in outlier detection and concurrent regression (Colorado State University. Libraries, 2024)
Creutzinger, Michael L., author; Cooley, Daniel, advisor; Sharp, Julia L., advisor; Koslovsky, Matt, committee member; Liebl, Dominik, committee member; Ortega, Francisco, committee member
Functional data are data collected on a curve, or surface, over a continuum. The growing presence of high-resolution data has greatly increased the popularity of using and developing methods in functional data analysis (FDA). Functional data may be defined differently from other data structures, but similar ideas apply to these types of data, including data exploration, modeling and inference, and post-hoc analyses. The methods presented in this dissertation provide a statistical framework that allows a researcher to carry out an analysis of functional data from "start to finish". Even with functional data, there is a need to identify outliers prior to conducting statistical analysis procedures. Existing functional data outlier detection methodology requires the use of a functional data depth measure, functional principal components, and/or an outlyingness measure like Stahel-Donoho. Although effective, these functional outlier detection methods may not be easily interpreted. In this dissertation, we propose two new functional outlier detection methods. The first method, Practical Outlier Detection (POD), makes use of ordinary summary statistics (e.g., minimum, maximum, mean, and variance). The second method, Prediction Band Outlier Detection (PBOD), makes use of parametric, simultaneous prediction bands that meet nominal coverage levels. The two new outlier detection methods were compared to three existing outlier detection methods: MS-Plot, Massive Unsupervised Outlier Detection, and Total Variation Depth.
In the simulation results, POD performs as well as, or better than, its counterparts in terms of specificity, sensitivity, accuracy, and precision. Similar results were found for PBOD, except for noticeably smaller values of specificity and accuracy than all other methods. Following data exploration and outlier detection, researchers often model their data. In FDA, functional linear regression uses a functional response Yi(t) and scalar and/or functional predictors, Xi(t). A functional concurrent regression model is estimated by regressing Yi on Xi pointwise at each sampling point, t. After estimating a regression model (functional or non-functional), it is common to estimate confidence and prediction intervals for parameter(s), including the conditional mean. A common way to obtain confidence/prediction intervals for simultaneous inference across the sampling domain is to use resampling methods (e.g., bootstrapping or permutation). We propose a new method for estimating parametric, simultaneous confidence and prediction bands for a functional concurrent regression model, without the use of resampling. The method uses Kac-Rice formulas for estimation of a critical value function, which is used with a functional pivot to acquire the simultaneous band. In the results, the proposed method meets nominal coverage levels for both confidence and prediction bands. The method we propose is also substantially faster to compute than methods that require resampling techniques. In linear regression, researchers may also assess whether there are influential observations that may impact the estimates and results from the fitted model. Studentized difference in fits (DFFITS), studentized difference in regression coefficient estimates (DFBETAS), and/or Cook's Distance (D) can all be used to identify influential observations. For functional concurrent regression, these measures can be easily computed pointwise for each observation.
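As a minimal sketch of the pointwise computation described above, the following assumes a single functional predictor observed on a common grid and ordinary least squares at each sampling point. The function names `pointwise_fit` and `concurrent_regression` are illustrative assumptions and are not taken from the dissertation; the DFFITS formula is the standard one from linear regression.

```python
from math import sqrt


def pointwise_fit(x, y):
    """Simple linear regression of y on x at one sampling point.

    Returns (intercept, slope, dffits), where dffits[i] is the
    studentized difference in fits for observation i.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    dffits = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx       # leverage of observation i
        # Leave-one-out residual variance, n - p - 1 degrees of freedom (p = 2).
        s2_loo = (sse - e * e / (1.0 - h)) / (n - 3)
        t = e / sqrt(s2_loo * (1.0 - h))           # externally studentized residual
        dffits.append(t * sqrt(h / (1.0 - h)))
    return intercept, slope, dffits


def concurrent_regression(X, Y):
    """Fit the regression separately at every sampling point.

    X and Y are lists of curves; X[i][t] is curve i at grid index t.
    Returns (beta0, beta1, dffits) with dffits[i][t] for curve i at t.
    """
    n_points = len(X[0])
    fits = [
        pointwise_fit([xc[t] for xc in X], [yc[t] for yc in Y])
        for t in range(n_points)
    ]
    beta0 = [f[0] for f in fits]
    beta1 = [f[1] for f in fits]
    dffits = [[fits[t][2][i] for t in range(n_points)] for i in range(len(X))]
    return beta0, beta1, dffits
```

The sketch needs at least four curves and some residual noise at each sampling point; otherwise the leave-one-out variance is zero or undefined.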
However, the only current development is to use resampling techniques for estimating a null distribution of the average of each measure. Rather than using the average values and bootstrapping, we propose working with functional DFFITS (DFFITS(t)) directly. We show that if the functional errors are assumed to follow a Gaussian process, DFFITS(t) is distributed uniformly as a scaled Student's t process. Then, we propose using a multivariate Student's t distributional quantile for identifying influential functional observations with DFFITS(t). Our methodology ("Theoretical") is compared against a competing method that uses a parametric bootstrapping technique ("Bootstrapped") for estimating the null distribution of the mean absolute value of DFFITS(t). In the simulation and case study results, we find that, compared with the Bootstrapped method, the Theoretical method greatly reduces computation time without much loss in performance as measured by accuracy (ACC), precision (PPV), and Matthews correlation coefficient (MCC). Furthermore, the average sensitivity of the Theoretical method is higher in all scenarios than that of the Bootstrapped method.

Item Open Access
Impact of various factors on partial least squares model robustness for nondestructive peach fruit quality assessment (Colorado State University. Libraries, 2023)
Pott, Jakob, author; Minas, Ioannis, advisor; Eakes, Joe, committee member; Koslovsky, Matt, committee member
Given declining fruit consumption due to poor fruit quality and large amounts of waste, peach growers have continuously suffered from financial loss and the industry has seen a sharp decline in recent decades. Due to the time-consuming and destructive nature of conventional fruit quality assessment, many peach growers prioritize fruit characteristics conducive to shipping and storage over characteristics which correlate with consumer acceptance.
This prioritization has resulted in the poor-quality fruit which consumers have grown to associate with fresh peaches and contributed to large annual waste. A potential solution is the use of visible and near-infrared spectroscopy (Vis-NIRS) paired with partial least squares (PLS) modeling, as a field-deployable method that can be used to measure preharvest internal fruit quality to produce information quickly and non-destructively. These qualities offer an answer to declining fruit quality and waste. Although promising, the technology is only as good as the data used to train the models. Quality data is hard to collect as it requires the consideration of many factors including the temperature of the sample and the inclusion of biological variability impacted by seasonal changes, cultivar differences, fruit maturity, and many management factors such as crop load, rootstocks, irrigation regimes, and training systems to capture the relationships needed for good model performance. In tree fruit research, handheld Vis-NIRS devices have been used to predict internal quality parameters such as sweetness (dry matter content, DMC; soluble solids concentration, SSC) and fruit physiological maturity related to chlorophyll content (index of absorbance difference, IAD). Although accurate, the statistical models used to make such predictions often struggle with robustness across cultivars and growing seasons and regions due to a lack of biological variability, or a lack of representative data from factors like temperature. These challenges have led to slow industry adoption. To address this issue, models for 13 distinct peach cultivars were constructed by combining data from two seasons (2016 and 2021) followed by external validation with data from a third season (2022). The data from 2016 was collected over a range of preharvest factors, fruit development stages and temperatures, and the inclusion of 2021 data added further biological variability.
External validation produced error rates of 0.36 - 0.42%, 0.59 - 0.63%, and 0.05 - 0.04 for DMC, SSC and IAD, respectively, across the 13 peach cultivars, indicating the models trained in 2021 were robust and performing at an acceptable level to impact grower decision making. It was observed that the additional inclusion of data from different cultivars and growing environments, as well as a third growing season (2017), did not significantly impact model performance. The lack of improvement suggests that the data from each year contain enough covariate variability to cover a broad range of measurements (i.e., input values) that growers and researchers are likely to observe when collecting data to predict peach quality in different orchards or seasons. This insensitivity to various environmental and growing conditions, generally referred to as external factors, due to the variability captured in the data used to build the model, is characteristic of a robust model.
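The train-then-validate workflow described in this abstract can be illustrated with a minimal sketch of single-response PLS fitted by the NIPALS algorithm, followed by an error computed on held-out data. Everything here is an assumption for illustration: the function names `pls1_fit`, `pls1_predict`, and `rmse` are hypothetical, root mean squared error is used as a stand-in for the error measure (the abstract does not name one), and no data from the thesis are reproduced.

```python
from math import sqrt


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered predictors
    f = [yi - y_mean for yi in y]                              # centered response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in predictor space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate predictors and response before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, X_new):
    """Predict by applying the same centering and deflation to new rows."""
    x_mean, y_mean, components = model
    preds = []
    for row in X_new:
        e = [xj - mj for xj, mj in zip(row, x_mean)]
        yhat = y_mean
        for w, load, q in components:
            score = sum(ej * wj for ej, wj in zip(e, w))
            yhat += q * score
            e = [ej - score * lj for ej, lj in zip(e, load)]
        preds.append(yhat)
    return preds


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

In an external validation of the kind described, the model would be fitted on spectra and reference measurements from the training seasons, and `rmse` would be evaluated only on the held-out season.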