Browsing by Author "Cooley, Daniel, advisor"
Item Open Access
Advances in statistical analysis and modeling of extreme values motivated by atmospheric models and data products (Colorado State University. Libraries, 2018)
Fix, Miranda J., author; Cooley, Daniel, advisor; Hoeting, Jennifer, committee member; Wilson, Ander, committee member; Barnes, Elizabeth, committee member
This dissertation presents applied and methodological advances in the statistical analysis and modeling of extreme values. We detail three studies motivated by the types of data found in the atmospheric sciences, such as deterministic model output and observational products. The first two investigations represent novel applications and extensions of extremes methodology to climate and atmospheric studies. The third investigation proposes a new model for areal extremes and develops methods for estimation and inference from the proposed model. We first detail a study which leverages two initial condition ensembles of a global climate model to compare future precipitation extremes under two climate change scenarios. We fit non-stationary generalized extreme value (GEV) models to annual maximum daily precipitation output and compare impacts under the RCP8.5 and RCP4.5 scenarios. A methodological contribution of this work is to demonstrate the potential of a "pattern scaling" approach for extremes, in which we produce predictive GEV distributions of annual precipitation maxima under RCP4.5 given only global mean temperatures for this scenario. We compare results from this less computationally intensive method to those obtained from our GEV model fitted directly to the RCP4.5 output and find that pattern scaling produces reasonable projections. The second study examines, for the first time, the capability of an atmospheric chemistry model to reproduce observed meteorological sensitivities of high and extreme surface ozone (O3).
This work develops a novel framework in which we make three types of comparisons between simulated and observational data, comparing (1) tails of the O3 response variable, (2) distributions of meteorological predictor variables, and (3) sensitivities of high and extreme O3 to meteorological predictors. This last comparison is made using quantile regression and a recent tail dependence optimization approach. Across all three study locations, we find substantial differences between simulations and observational data in both meteorology and meteorological sensitivities of high and extreme O3. The final study is motivated by the prevalence of large gridded data products in the atmospheric sciences, and presents methodological advances in the (finite-dimensional) spatial setting. Existing models for spatial extremes, such as max-stable process models, tend to be geostatistical in nature as well as very computationally intensive. Instead, we propose a new model for extremes of areal data, with a common-scale extension, that is inspired by the simultaneous autoregressive (SAR) model in classical spatial statistics. The proposed model extends recent work on transformed-linear operations applied to regularly varying random vectors, and is unique among extremes models in being directly analogous to a classical linear model. We specify a sufficient condition on the spatial dependence parameter such that our extreme SAR model has desirable properties. We also describe the limiting angular measure, which is discrete, and corresponding tail pairwise dependence matrix (TPDM) for the model. After examining model properties, we then investigate two approaches to estimation and inference for the common-scale extreme SAR model. First, we consider a censored likelihood approach, implemented using Bayesian MCMC with a data augmentation step, but find that this approach is not robust to model misspecification. 
As an alternative, we develop a novel estimation method that minimizes the discrepancy between the TPDM for the fitted model and the estimated TPDM, and find that it is able to produce reasonable estimates of extremal dependence even in the case of model misspecification.

Item Embargo
Functional methods in outlier detection and concurrent regression (Colorado State University. Libraries, 2024)
Creutzinger, Michael L., author; Cooley, Daniel, advisor; Sharp, Julia L., advisor; Koslovsky, Matt, committee member; Liebl, Dominik, committee member; Ortega, Francisco, committee member
Functional data are data collected on a curve, or surface, over a continuum. The growing presence of high-resolution data has greatly increased the popularity of using and developing methods in functional data analysis (FDA). Functional data may be defined differently from other data structures, but similar ideas apply for these types of data including data exploration, modeling and inference, and post-hoc analyses. The methods presented in this dissertation provide a statistical framework that allows a researcher to carry out an analysis of functional data from "start to finish". Even with functional data, there is a need to identify outliers prior to conducting statistical analysis procedures. Existing functional data outlier detection methodology requires the use of a functional data depth measure, functional principal components, and/or an outlyingness measure like Stahel-Donoho. Although effective, these functional outlier detection methods may not be easily interpreted. In this dissertation, we propose two new functional outlier detection methods. The first method, Practical Outlier Detection (POD), makes use of ordinary summary statistics (e.g., minimum, maximum, mean, and variance). The second method, Prediction Band Outlier Detection (PBOD), makes use of parametric, simultaneous prediction bands that meet nominal coverage levels.
The two new outlier detection methods were compared to three existing outlier detection methods: MS-Plot, Massive Unsupervised Outlier Detection, and Total Variation Depth. In the simulation results, POD performs as well as, or better than, its counterparts in terms of specificity, sensitivity, accuracy, and precision. PBOD performs similarly, except that its specificity and accuracy are noticeably lower than those of all other methods. Following data exploration and outlier detection, researchers often model their data. In FDA, functional linear regression uses a functional response Yi(t) and scalar and/or functional predictors, Xi(t). A functional concurrent regression model is estimated by regressing Yi on Xi pointwise at each sampling point, t. After estimating a regression model (functional or non-functional), it is common to estimate confidence and prediction intervals for parameter(s), including the conditional mean. A common way to obtain confidence/prediction intervals for simultaneous inference across the sampling domain is to use resampling methods (e.g., bootstrapping or permutation). We propose a new method for estimating parametric, simultaneous confidence and prediction bands for a functional concurrent regression model, without the use of resampling. The method uses Kac-Rice formulas for estimation of a critical value function, which is used with a functional pivot to acquire the simultaneous band. In the results, the proposed method meets nominal coverage levels for both confidence and prediction bands. The method we propose is also substantially faster to compute than methods that require resampling techniques. In linear regression, researchers may also assess whether there are influential observations that may impact the estimates and results from the fitted model.
Studentized difference in fits (DFFITS), studentized difference in regression coefficient estimates (DFBETAS), and/or Cook's Distance (D) can all be used to identify influential observations. For functional concurrent regression, these measures can be easily computed pointwise for each observation. However, the only existing approach uses resampling techniques to estimate a null distribution of the average of each measure. Rather than using the average values and bootstrapping, we propose working with functional DFFITS (DFFITS(t)) directly. We show that if the functional errors are assumed to follow a Gaussian process, DFFITS(t) is distributed uniformly as a scaled Student's t process. Then, we propose using a multivariate Student's t distributional quantile for identifying influential functional observations with DFFITS(t). Our methodology ("Theoretical") is compared against a competing method that uses a parametric bootstrapping technique ("Bootstrapped") for estimating the null distribution of the mean absolute value of DFFITS(t). In the simulation and case study results, we find that the Theoretical method greatly reduces computation time relative to the Bootstrapped method, without much loss in performance as measured by accuracy (ACC), precision (PPV), and the Matthews correlation coefficient (MCC). Furthermore, the average sensitivity of the Theoretical method is higher in all scenarios than that of the Bootstrapped method.

Item Open Access
Linear prediction and partial tail correlation for extremes (Colorado State University. Libraries, 2022)
Lee, Jeongjin, author; Cooley, Daniel, advisor; Kokoszka, Piotr, committee member; Breidt, Jay, committee member; Pezeshki, Ali, committee member
This dissertation consists of three main studies for extreme value analyses: linear prediction for extremes, uncertainty quantification for predictions, and investigating conditional relationships between variables at their extreme levels.
We employ multivariate regular variation to provide a framework for modeling dependence in the upper tail, which is assumed to be a direction of interest. Cooley and Thibaud [2019] consider transformed-linear operations to define a vector space on the nonnegative orthant and show regular variation is preserved by these transformed-linear operations. Extending the approach of Cooley and Thibaud [2019], we first consider the problem of performing prediction when observed values are at extreme levels. This linear approach is motivated by the difficulty of fitting traditional extreme value models in high dimensions. We construct an inner product space of nonnegative random variables from transformed-linear combinations of independent regularly varying random variables. Rather than fully characterizing extremal dependence in high dimensions, we summarize tail behavior via a matrix of pairwise tail dependencies. The projection theorem yields the optimal transformed-linear predictor, which has a similar form to the best linear unbiased predictor in non-extreme prediction. We then quantify uncertainty for the prediction of extremes by using information contained in the tail pairwise dependence matrix. We create a 95% prediction interval based on the geometry of regular variation. We show that the prediction intervals have good coverage in a simulation study as well as in two applications: prediction of high NO2 air pollution levels, and prediction of large financial losses. We also compare prediction intervals from the linear approach to those from a parametric approach. Lastly, we develop the novel notion of partial tail correlation via the projection theorem in the inner product space. Partial tail correlations are the analogue of partial correlations in non-extreme statistics but focus on extremal dependence.
Partial tail correlation can be represented by the inner product of prediction errors associated with the previously defined best transformed-linear prediction for extremes. We find a connection between the partial tail correlation and the inverse matrix of tail pairwise dependencies. We then develop a hypothesis test for zero elements in the inverse extremal matrix. We apply partial tail correlation to assess flood risk using extreme river discharges in the upper Danube River basin. We compare the extremal graph constructed from the partial tail correlations to physical flow connections on the Danube.

Item Open Access
Transformed-linear models for time series extremes (Colorado State University. Libraries, 2022)
Mhatre, Nehali, author; Cooley, Daniel, advisor; Kokoszka, Piotr, committee member; Shaby, Benjamin, committee member; Wang, Tianyang, committee member
In order to capture the dependence in the upper tail of a time series, we develop nonnegative regularly-varying time series models that are constructed similarly to classical non-extreme ARMA models. Rather than fully characterizing tail dependence of the time series, we define the concept of weak tail stationarity, which allows us to describe a regularly-varying time series through the tail pairwise dependence function (TPDF), a measure of pairwise extremal dependencies. We state consistency requirements among the finite-dimensional collections of the elements of a regularly-varying time series and show that the TPDF's value does not depend on the dimension being considered. So that our models take nonnegative values, we use transformed-linear operations. We show existence and stationarity of these models, and develop their properties such as the model TPDFs.
Motivated by investigating conditions conducive to the spread of wildfires, we fit models to hourly windspeed data using a preliminary estimation method and find that the fitted transformed-linear models produce better estimates of upper tail quantities than traditional ARMA models or classical linear regularly-varying models. The innovations algorithm is a classical recursive algorithm used in time series analysis. We develop an analogous transformed-linear innovations algorithm for our time series models that allows us to perform prediction, which is fundamental to any time series analysis. The transformed-linear innovations algorithm also enables us to estimate parameters of the transformed-linear regularly-varying moving average models, thus providing a tool for modeling. We construct an inner product space of transformed-linear combinations of nonnegative regularly-varying random variables and prove its link to a Hilbert space, which allows us to employ the projection theorem. We develop the transformed-linear innovations algorithm using the properties of the projection theorem. Turning our attention to the class of MA(∞) models, we discuss estimation and also show that this class of models is dense in the class of possible TPDFs. We also develop an extremes analogue of the classical Wold decomposition. A simulation study shows that our class of models provides adequate models for GARCH and for another model outside our class of models. The transformed-linear innovations algorithm gives us the best prediction, and we also develop prediction intervals based on the geometry of regular variation. A simulation study shows that we obtain good coverage rates for prediction errors. We perform modeling and prediction for the hourly windspeed data by applying the innovations algorithm to the estimated TPDF.
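
The transformed-linear operations mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which transform is used, so this assumes the softplus map t(y) = log(1 + exp(y)) commonly associated with Cooley and Thibaud [2019]. The function names (`t`, `t_inv`, `tadd`, `tmul`, `simulate_tl_ma1`), the Fréchet noise with tail index 2, and the MA(1) form are all illustrative assumptions, not the dissertation's fitted model.

```python
import numpy as np

def t(y):
    """Assumed transform: softplus, mapping the real line to (0, inf)."""
    return np.logaddexp(0.0, y)

def t_inv(x):
    """Inverse softplus, log(exp(x) - 1), written to avoid overflow for large x."""
    x = np.asarray(x, dtype=float)
    return x + np.log(-np.expm1(-x))

def tadd(x1, x2):
    """Transformed-linear addition: add on the real line, then map back."""
    return t(t_inv(x1) + t_inv(x2))

def tmul(a, x):
    """Transformed-linear scalar multiplication by a real scalar a."""
    return t(a * t_inv(x))

def simulate_tl_ma1(n, theta, alpha=2.0, seed=0):
    """Hypothetical transformed-linear MA(1): X_t = Z_t (+) theta (o) Z_{t-1}."""
    rng = np.random.default_rng(seed)
    # If E ~ Exp(1), then E**(-1/alpha) is Frechet with tail index alpha.
    z = rng.standard_exponential(n + 1) ** (-1.0 / alpha)
    return tadd(z[1:], tmul(theta, z[:-1]))

series = simulate_tl_ma1(500, theta=0.5)
```

For large arguments the softplus is nearly the identity, so `tadd` and `tmul` behave like ordinary addition and scaling in the upper tail while keeping every value positive.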