
Understanding extreme behavior by optimizing tail dependence with application to ground level ozone via data mining and spatial modeling




Russell, Brook T., author
Cooley, Daniel S., advisor
Hoeting, Jennifer, committee member
Wang, Haonan, committee member
Schumacher, Russ, committee member



This dissertation presents novel work in statistical methods for extremes. Our underlying modeling procedure identifies the linear combination of covariates that is associated with extreme values of a response variable, and is based on the framework of bivariate regular variation. We propose a data mining strategy suitable for an analysis of ground level ozone, and we spatially model the primary drivers of extreme ozone over a large study region. In this dissertation, we first review statistical methods for univariate and multivariate extremes. We then discuss tail dependence parameters and their estimators and introduce γ, a tail dependence metric that is better suited for optimization than existing metrics. We also introduce tail dependence estimators that use a smooth threshold rather than the 'hard' threshold common in extremes. A smooth threshold is necessary to perform the optimization, and optimizing tail dependence has not previously been considered in studies of extremes. We also show consistency of estimators with smooth thresholds. Subsequently, we outline our procedure for optimizing tail dependence and discuss parameter estimation. We also propose a model selection procedure based on cross-validation. We then present a simulation study that demonstrates our method's ability to detect complicated conditions that lead to extreme behavior, and we compare our approach with competing methods. Next, we propose a data mining procedure that finds the set of covariates whose linear combination has the highest degree of tail dependence with a response variable. This data mining procedure is a model selection exercise in which the model space is too large to be searched exhaustively, so we use an automated model search procedure based on simulated annealing. We apply our data mining procedure to ground level ozone data from Atlanta, Georgia and Charlotte, North Carolina.
We discuss how our method can be modified to deal with non-continuous covariates such as precipitation. Lastly, we seek to model how a set of primary drivers varies spatially over a study region. We utilize data from 160 EPA stations in 13 US states plus the District of Columbia. We model the parameters in our extreme value procedure spatially using a hierarchical modeling technique. For inference, we utilize a two-step procedure.
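The abstract notes that optimizing tail dependence requires a smooth threshold in place of the usual hard one. The sketch below illustrates why, under stated assumptions. A hard threshold enters an estimator through the indicator 1{x > u}, a step function, so an objective built from it is piecewise constant and gives an optimizer no slope to follow, whereas a smooth ramp varies continuously. The logistic ramp with bandwidth `h`, the function names, and the chi-type ratio estimating P(Y > v | X > u) are all illustrative assumptions; they are not the dissertation's γ or its smooth-threshold estimator, which this abstract does not define.

```python
import numpy as np

def hard_weight(x, u):
    """Hard threshold: 1.0 where x exceeds u, else 0.0 (a step function)."""
    return (x > u).astype(float)

def smooth_weight(x, u, h):
    """Smooth threshold (assumed logistic ramp centred at u, bandwidth h).

    Uses the identity logistic(z) = 0.5 * (1 + tanh(z / 2)) to avoid overflow.
    As h -> 0 the ramp approaches hard_weight."""
    return 0.5 * (1.0 + np.tanh((x - u) / (2.0 * h)))

def exceedance_ratio(x, y, u, v, weight, **kw):
    """Illustrative chi-type estimate of P(Y > v | X > u), with the indicators
    replaced by `weight`. This is NOT the dissertation's gamma."""
    wx = weight(x, u, **kw)
    wy = weight(y, v, **kw)
    return np.sum(wx * wy) / np.sum(wx)

# Toy data: X and Y share a common component, so their upper tails co-occur.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
x = z + 0.5 * rng.standard_normal(n)
y = z + 0.5 * rng.standard_normal(n)
u, v = np.quantile(x, 0.95), np.quantile(y, 0.95)

hard = exceedance_ratio(x, y, u, v, hard_weight)
smooth = exceedance_ratio(x, y, u, v, smooth_weight, h=1e-3)
print(f"hard: {hard:.3f}  smooth (h=1e-3): {smooth:.3f}")
```

Shrinking `h` moves the smooth estimate toward the hard one; a larger `h` trades some bias for an objective that changes continuously with the threshold and with any parameters used to build `x`, such as the weights of a linear combination of covariates.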
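The data mining step searches a space of covariate subsets that is too large to enumerate, using simulated annealing. The following is a generic simulated-annealing sketch over subsets, assuming a user-supplied `score` function; in the dissertation the score would be a tail dependence criterion tied to its cross-validation procedure, which is not reproduced here. The function `anneal_subset`, the single-flip move, the geometric cooling schedule, and the toy score are hypothetical choices for illustration only.

```python
import math
import random

def anneal_subset(p, score, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize score(subset) over non-empty subsets of {0, ..., p-1}.

    Each proposal flips the inclusion of one randomly chosen covariate.
    Improvements are always accepted; a worse proposal is accepted with
    probability exp(delta / t), where the temperature t decays geometrically."""
    rng = random.Random(seed)
    current = frozenset([rng.randrange(p)])
    cur_score = score(current)
    best, best_score = current, cur_score
    t = t0
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(p)}  # flip one covariate in or out
        t *= cooling
        if not proposal:  # keep at least one covariate in the model
            continue
        new_score = score(proposal)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score

# Hypothetical toy score: the "true" drivers are covariates 1 and 3, and the
# score penalizes every covariate that is wrongly included or excluded.
target = {1, 3}

def toy_score(subset):
    return -len(subset ^ target)

best, best_score = anneal_subset(p=8, score=toy_score)
print(sorted(best), best_score)
```

A single-flip neighborhood keeps each proposal close to the current model, and accepting some worse moves while the temperature is high lets the search escape local optima before it settles into greedy improvement.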




multivariate regular variation
cross validation
tail dependence

