Browsing by Author "Cooley, Daniel, advisor"

Open Access
Advances in statistical analysis and modeling of extreme values motivated by atmospheric models and data products
(Colorado State University. Libraries, 2018) Fix, Miranda J., author; Cooley, Daniel, advisor; Hoeting, Jennifer, committee member; Wilson, Ander, committee member; Barnes, Elizabeth, committee member
This dissertation presents applied and methodological advances in the statistical analysis and modeling of extreme values. We detail three studies motivated by the types of data found in the atmospheric sciences, such as deterministic model output and observational products. The first two investigations represent novel applications and extensions of extremes methodology to climate and atmospheric studies. The third investigation proposes a new model for areal extremes and develops methods for estimation and inference from the proposed model. We first detail a study which leverages two initial condition ensembles of a global climate model to compare future precipitation extremes under two climate change scenarios. We fit non-stationary generalized extreme value (GEV) models to annual maximum daily precipitation output and compare impacts under the RCP8.5 and RCP4.5 scenarios. A methodological contribution of this work is to demonstrate the potential of a "pattern scaling" approach for extremes, in which we produce predictive GEV distributions of annual precipitation maxima under RCP4.5 given only global mean temperatures for this scenario. We compare results from this less computationally intensive method to those obtained from our GEV model fitted directly to the RCP4.5 output and find that pattern scaling produces reasonable projections. The second study examines, for the first time, the capability of an atmospheric chemistry model to reproduce observed meteorological sensitivities of high and extreme surface ozone (O3). This work develops a novel framework in which we make three types of comparisons between simulated and observational data, comparing (1) tails of the O3 response variable, (2) distributions of meteorological predictor variables, and (3) sensitivities of high and extreme O3 to meteorological predictors. This last comparison is made using quantile regression and a recent tail dependence optimization approach. 
Across all three study locations, we find substantial differences between simulations and observational data in both meteorology and meteorological sensitivities of high and extreme O3. The final study is motivated by the prevalence of large gridded data products in the atmospheric sciences, and presents methodological advances in the (finite-dimensional) spatial setting. Existing models for spatial extremes, such as max-stable process models, tend to be geostatistical in nature as well as very computationally intensive. Instead, we propose a new model for extremes of areal data, with a common-scale extension, that is inspired by the simultaneous autoregressive (SAR) model in classical spatial statistics. The proposed model extends recent work on transformed-linear operations applied to regularly varying random vectors, and is unique among extremes models in being directly analogous to a classical linear model. We specify a sufficient condition on the spatial dependence parameter such that our extreme SAR model has desirable properties. We also describe the limiting angular measure, which is discrete, and corresponding tail pairwise dependence matrix (TPDM) for the model. After examining model properties, we then investigate two approaches to estimation and inference for the common-scale extreme SAR model. First, we consider a censored likelihood approach, implemented using Bayesian MCMC with a data augmentation step, but find that this approach is not robust to model misspecification. As an alternative, we develop a novel estimation method that minimizes the discrepancy between the TPDM for the fitted model and the estimated TPDM, and find that it is able to produce reasonable estimates of extremal dependence even in the case of model misspecification.
Embargo
Functional methods in outlier detection and concurrent regression
(Colorado State University. Libraries, 2024) Creutzinger, Michael L., author; Cooley, Daniel, advisor; Sharp, Julia L., advisor; Koslovsky, Matt, committee member; Liebl, Dominik, committee member; Ortega, Francisco, committee member
Functional data are data collected on a curve, or surface, over a continuum. The growing presence of high-resolution data has greatly increased the popularity of using and developing methods in functional data analysis (FDA). Functional data may be defined differently from other data structures, but similar ideas apply to these types of data, including data exploration, modeling and inference, and post-hoc analyses. The methods presented in this dissertation provide a statistical framework that allows a researcher to carry out an analysis of functional data from "start to finish". Even with functional data, there is a need to identify outliers prior to conducting statistical analysis procedures. Existing functional data outlier detection methodology requires the use of a functional data depth measure, functional principal components, and/or an outlyingness measure like Stahel-Donoho. Although effective, these functional outlier detection methods may not be easily interpreted. In this dissertation, we propose two new functional outlier detection methods. The first method, Practical Outlier Detection (POD), makes use of ordinary summary statistics (e.g., minimum, maximum, mean, variance). The second method, Prediction Band Outlier Detection (PBOD), makes use of parametric, simultaneous prediction bands that meet nominal coverage levels. The two new outlier detection methods were compared to three existing outlier detection methods: MS-Plot, Massive Unsupervised Outlier Detection, and Total Variation Depth. In the simulation results, POD performs as well as, or better than, its counterparts in terms of specificity, sensitivity, accuracy, and precision. Similar results were found for PBOD, except for noticeably smaller values of specificity and accuracy than all other methods. Following data exploration and outlier detection, researchers often model their data. 
In FDA, functional linear regression uses a functional response Yi(t) and scalar and/or functional predictors, Xi(t). A functional concurrent regression model is estimated by regressing Yi on Xi pointwise at each sampling point, t. After estimating a regression model (functional or non-functional), it is common to estimate confidence and prediction intervals for parameter(s), including the conditional mean. A common way to obtain confidence/prediction intervals for simultaneous inference across the sampling domain is to use resampling methods (e.g., bootstrapping or permutation). We propose a new method for estimating parametric, simultaneous confidence and prediction bands for a functional concurrent regression model, without the use of resampling. The method uses Kac-Rice formulas for estimation of a critical value function, which is used with a functional pivot to acquire the simultaneous band. In the results, the proposed method meets nominal coverage levels for both confidence and prediction bands. The method we propose is also substantially faster to compute than methods that require resampling techniques. In linear regression, researchers may also assess if there are influential observations that may impact the estimates and results from the fitted model. Studentized difference in fits (DFFITS), studentized difference in regression coefficient estimates (DFBETAS), and/or Cook's Distance (D) can all be used to identify influential observations. For functional concurrent regression, these measures can be easily computed pointwise for each observation. However, the only current development is to use resampling techniques for estimating a null distribution of the average of each measure. Rather than using the average values and bootstrapping, we propose working with functional DFFITS (DFFITS(t)) directly. We show that if the functional errors are assumed to follow a Gaussian process, DFFITS(t) is distributed uniformly as a scaled Student's t process. 
Then, we propose using a multivariate Student's t distributional quantile for identifying influential functional observations with DFFITS(t). Our methodology ("Theoretical") is compared against a competing method that uses a parametric bootstrapping technique ("Bootstrapped") for estimating the null distribution of the mean absolute value of DFFITS(t). In the simulation and case study results, we find that the Theoretical method greatly reduces computation time relative to the Bootstrapped method, without much loss in performance as measured by accuracy (ACC), precision (PPV), and Matthews correlation coefficient (MCC). Furthermore, the average sensitivity of the Theoretical method is higher than that of the Bootstrapped method in all scenarios.
Open Access
Linear prediction and partial tail correlation for extremes
(Colorado State University. Libraries, 2022) Lee, Jeongjin, author; Cooley, Daniel, advisor; Kokoszka, Piotr, committee member; Breidt, Jay, committee member; Pezeshki, Ali, committee member
This dissertation consists of three main studies for extreme value analyses: linear prediction for extremes, uncertainty quantification for predictions, and investigating conditional relationships between variables at their extreme levels. We employ multivariate regular variation to provide a framework for modeling dependence in the upper tail, which is assumed to be the direction of interest. Cooley and Thibaud [2019] consider transformed-linear operations to define a vector space on the nonnegative orthant and show that regular variation is preserved by these transformed-linear operations. Extending the approach of Cooley and Thibaud [2019], we first consider the problem of performing prediction when observed values are at extreme levels. This linear approach is motivated by the difficulty of fitting traditional extreme value models in high dimensions. We construct an inner product space of nonnegative random variables from transformed-linear combinations of independent regularly varying random variables. Rather than fully characterizing extremal dependence in high dimensions, we summarize tail behavior via a matrix of pairwise tail dependencies. The projection theorem yields the optimal transformed-linear predictor, which has a similar form to the best linear unbiased predictor in non-extreme prediction. We then quantify uncertainty for the prediction of extremes by using information contained in the tail pairwise dependence matrix. We create the 95% prediction interval based on the geometry of regular variation. We show that the prediction intervals have good coverage in a simulation study as well as in two applications: prediction of high NO2 air pollution levels, and prediction of large financial losses. We also compare prediction intervals from the linear approach to those from a parametric approach. Lastly, we develop the novel notion of partial tail correlation via the projection theorem in the inner product space. 
Partial tail correlations are the analogue of partial correlations in non-extreme statistics but focus on extremal dependence. Partial tail correlation can be represented by the inner product of prediction errors associated with the previously defined best transformed-linear prediction for extremes. We find a connection between the partial tail correlation and the inverse matrix of tail pairwise dependencies. We then develop a hypothesis test for zero elements in the inverse extremal matrix. We apply the idea of partial tail correlation to assess flood risk in application to extreme river discharges in the upper Danube River basin. We compare the extremal graph constructed from the idea of the partial tail correlation to physical flow connections on the Danube.
Open Access
Tail dependence: application, exploration, and development of novel methods
(Colorado State University. Libraries, 2025) Wixson, Troy P., author; Cooley, Daniel, advisor; Shaby, Benjamin, advisor; Huang, Dongzhou, committee member; Wang, Tianying, committee member; Barnes, Elizabeth, committee member
The study of multivariate extreme events is largely concerned with modeling the dependence in the tail of the joint distribution. The understanding of extremal dependence and methodology for modeling that dependence has been an active research field over the past few decades, and we contribute to that literature with three projects that are detailed in this dissertation. In the first project we consider the challenge of assessing the changing risk of wildfires. Wildfire risk is greatest during high winds after sustained periods of dry and hot conditions. This chapter is a statistical extreme event risk attribution study which aims to answer whether extreme wildfire seasons are more likely now than under past climate. This requires modeling temporal dependence at extreme levels. We propose the use of transformed-linear time series models, which are constructed similarly to traditional ARMA models while having a dependence structure that is tied to a widely used framework for extremes (regular variation). We fit the models to the extreme values of the seasonally adjusted Fire Weather Index (FWI) time series to capture the dependence in the upper tail for past and present climate. Ten thousand fire seasons are simulated from each fitted model and we compare the proportion of simulated high-risk fire seasons to quantify the increase in risk. Our method suggests that the risk of experiencing an extreme wildfire season in Grand Lake, Colorado under current climate has increased dramatically compared to the risk under the climate of the mid-20th century. Our method also finds some evidence of increased risk of extreme wildfire seasons in Quincy, California, but large uncertainties do not allow us to reject a null hypothesis of no change. In the second project we explore a fundamental characterization of tail dependence and develop a method to classify data into the two regimes. 
Classifying a data set as asymptotically dependent (AD) or asymptotically independent (AI) is a necessary early choice in the modeling of multivariate extremes. These two dependence regimes are defined asymptotically, which complicates inference as practitioners have finite samples. We perform a series of experiments to determine whether a finite sample has enough information for a convolutional neural network to reliably distinguish between these regimes in the bivariate case. Along the way we develop a new classification tool for practitioners which we call nnadic, as it is a Neural Network for Asymptotic Dependence/Independence Classification. This tool accurately classifies 95% of test datasets and is robust to a wide range of sample sizes. The datasets which we are unable to correctly classify tend to either be nearly exactly independent or exhibit near perfect dependence, which are boundary cases for both the AD and AI models used for training. In the third project we consider the challenge of using likelihood methods for models developed for the tail of the distribution. Many multivariate extremes models have intractable likelihoods; thus practitioners must use alternative fitting methods, and likelihood-based methods for uncertainty quantification and model selection are unavailable. We develop a proxy-likelihood estimator for multivariate extremes models. Our method is based on the tail pairwise dependence (TPD), which is a summary measure of the dependence in the tail of any multivariate extremes model. The TPD parameter has a one-to-one relationship with the dependence parameter of the HR distribution. We use the HR distribution as a proxy for the likelihood in a composite likelihood approach. The method is demonstrated using the transformed linear extremes time series (TLETS) models of Mhatre & Cooley (2024).
Open Access
Transformed-linear models for time series extremes
(Colorado State University. Libraries, 2022) Mhatre, Nehali, author; Cooley, Daniel, advisor; Kokoszka, Piotr, committee member; Shaby, Benjamin, committee member; Wang, Tianyang, committee member
In order to capture the dependence in the upper tail of a time series, we develop nonnegative regularly-varying time series models that are constructed similarly to classical non-extreme ARMA models. Rather than fully characterizing tail dependence of the time series, we define the concept of weak tail stationarity, which allows us to describe a regularly-varying time series through the tail pairwise dependence function (TPDF), which is a measure of pairwise extremal dependencies. We state consistency requirements among the finite-dimensional collections of the elements of a regularly-varying time series and show that the TPDF's value does not depend on the dimension being considered. So that our models take nonnegative values, we use transformed-linear operations. We show existence and stationarity of these models, and develop their properties such as the model TPDFs. Motivated by investigating conditions conducive to the spread of wildfires, we fit models to hourly windspeed data using a preliminary estimation method and find that the fitted transformed-linear models produce better estimates of upper tail quantities than traditional ARMA models or classical linear regularly-varying models. The innovations algorithm is a classical recursive algorithm used in time series analysis. We develop an analogous transformed-linear innovations algorithm for our time series models that allows us to perform prediction, which is fundamental to any time series analysis. The transformed-linear innovations algorithm also enables us to estimate parameters of the transformed-linear regularly-varying moving average models, thus providing a tool for modeling. We construct an inner product space of transformed-linear combinations of nonnegative regularly-varying random variables and prove its link to a Hilbert space, which allows us to employ the projection theorem. We develop the transformed-linear innovations algorithm using the properties of the projection theorem. 
Turning our attention to the class of MA(∞) models, we discuss estimation and show that this class of models is dense in the class of possible TPDFs. We also develop an extremes analogue of the classical Wold decomposition. A simulation study shows that our class of models provides adequate models for a GARCH model and for another model outside our class. The transformed-linear innovations algorithm gives us the best prediction, and we also develop prediction intervals based on the geometry of regular variation. A simulation study shows that we obtain good coverage rates for prediction errors. We perform modeling and prediction for the hourly windspeed data by applying the innovations algorithm to the estimated TPDF.