Theses and Dissertations
Browsing Theses and Dissertations by Author "Breidt, Jay, committee member"
Now showing 1 - 5 of 5
Linear prediction and partial tail correlation for extremes
(Colorado State University. Libraries, 2022; Open Access)
Lee, Jeongjin, author; Cooley, Daniel, advisor; Kokoszka, Piotr, committee member; Breidt, Jay, committee member; Pezeshki, Ali, committee member

This dissertation consists of three main studies in extreme value analysis: linear prediction for extremes, uncertainty quantification for these predictions, and conditional relationships between variables at extreme levels. We employ multivariate regular variation as a framework for modeling dependence in the upper tail, which is assumed to be the direction of interest. Cooley and Thibaud [2019] consider transformed-linear operations that define a vector space on the nonnegative orthant and show that regular variation is preserved by these operations. Extending the approach of Cooley and Thibaud [2019], we first consider the problem of prediction when observed values are at extreme levels. This linear approach is motivated by the difficulty of fitting traditional extreme value models in high dimensions. We construct an inner product space of nonnegative random variables from transformed-linear combinations of independent regularly varying random variables. Rather than fully characterizing extremal dependence in high dimensions, we summarize tail behavior via a matrix of pairwise tail dependencies. The projection theorem yields the optimal transformed-linear predictor, which has a form similar to the best linear unbiased predictor in non-extreme prediction. We then quantify uncertainty in the prediction of extremes using information contained in the tail pairwise dependence matrix, and we construct 95% prediction intervals based on the geometry of regular variation.
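The transformed-linear operations of Cooley and Thibaud [2019] can be sketched in a few lines. This sketch assumes the softplus transform t(y) = log(1 + exp(y)), which maps the real line onto (0, ∞), with addition a ⊕ b = t(t⁻¹(a) + t⁻¹(b)) and scalar multiplication λ ∘ a = t(λ t⁻¹(a)). For prediction it assumes, by analogy with the best linear unbiased predictor mentioned above, weights b = Σ₁₁⁻¹σ₁₂ taken from a partitioned tail pairwise dependence matrix (TPDM). The TPDM values are made up, and the function names are hypothetical rather than taken from the dissertation.

```python
import numpy as np

def t(y):
    """Softplus transform t(y) = log(1 + exp(y)), mapping R onto (0, inf)."""
    return np.logaddexp(0.0, y)

def t_inv(x):
    """Inverse transform t^{-1}(x) = log(exp(x) - 1) for x > 0, written stably."""
    x = np.asarray(x, dtype=float)
    return x + np.log(-np.expm1(-x))

def tl_add(a, b):
    """Transformed-linear addition: a (+) b = t(t^{-1}(a) + t^{-1}(b))."""
    return t(t_inv(a) + t_inv(b))

def tl_scale(lam, a):
    """Transformed-linear scalar multiplication: lam (*) a = t(lam * t^{-1}(a))."""
    return t(lam * t_inv(a))

def tl_predict(tpdm, x_obs):
    """Predict the last component from the first p as a transformed-linear combination.

    Assumption (hypothetical, BLUP-style): weights b = Sigma_11^{-1} sigma_12,
    where tpdm is the (p + 1) x (p + 1) tail pairwise dependence matrix.
    """
    p = len(x_obs)
    sigma_11 = tpdm[:p, :p]
    sigma_12 = tpdm[:p, p]
    b = np.linalg.solve(sigma_11, sigma_12)
    # b^T (*) x = t(sum_j b_j * t^{-1}(x_j))
    return t(b @ t_inv(x_obs)), b

# Hypothetical TPDM for (X1, X2, X3); predict X3 from large observed X1, X2.
tpdm = np.array([[1.0, 0.5, 0.6],
                 [0.5, 1.0, 0.4],
                 [0.6, 0.4, 1.0]])
x_hat, weights = tl_predict(tpdm, np.array([25.0, 12.0]))
print(weights, x_hat)
```

Because t(y) ≈ y for large y, these operations behave almost like ordinary linear algebra in the upper tail, while t still maps negative combinations back into the positive orthant. With the made-up TPDM above, the weights are about (0.533, 0.133) and the prediction is about 14.93.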
We show that the prediction intervals have good coverage in a simulation study and in two applications: prediction of high NO2 air pollution levels and prediction of large financial losses. We also compare prediction intervals from the linear approach with those from a parametric approach. Lastly, we develop the novel notion of partial tail correlation via the projection theorem in the inner product space. Partial tail correlations are the analogue of partial correlations in non-extreme statistics but focus on extremal dependence. A partial tail correlation can be represented as the inner product of the prediction errors associated with the best transformed-linear prediction for extremes defined above. We establish a connection between partial tail correlation and the inverse of the tail pairwise dependence matrix, and we develop a hypothesis test for zero elements of this inverse matrix. We apply partial tail correlation to assess flood risk from extreme river discharges in the upper Danube River basin, and we compare the extremal graph constructed from partial tail correlations with the physical flow connections on the Danube.

Non-asymptotic properties of spectral decomposition of large gram-type matrices with applications to high-dimensional inference
(Colorado State University. Libraries, 2020; Open Access)
Zhang, Lyuou, author; Zhou, Wen, advisor; Wang, Haonan, advisor; Breidt, Jay, committee member; Meyer, Mary, committee member; Yang, Liuqing, committee member

Jointly modeling a large and possibly divergent number of temporally evolving subjects arises ubiquitously in statistics, econometrics, finance, biology, and environmental sciences. To circumvent the challenges due to high dimensionality as well as temporal and/or contemporaneous dependence, the factor model and its variants have been widely employed.
In general, these models represent large-scale, temporally dependent data through low-dimensional structures that capture variation shared across dimensions. In this dissertation, we investigate the non-asymptotic properties of the spectral decomposition of high-dimensional Gram-type matrices based on factor models. Specifically, we derive exponential tail bounds for the first and second moments of the deviation between the empirical and population eigenvectors of the right Gram matrix, as well as a Berry-Esseen type bound characterizing the Gaussian approximation of these deviations. We also obtain a non-asymptotic tail bound for the ratio between the eigenvalues of the left Gram matrix, namely the sample covariance matrix, and their population counterparts, regardless of the size of the data matrix. These non-asymptotic properties are further demonstrated in a suite of applications, including the non-asymptotic characterization of the estimated number of latent factors in factor models and related machine learning problems, the estimation and forecasting of high-dimensional time series, spectral properties of large sample covariance matrices such as perturbation bounds and inference on spectral projectors, and low-rank matrix denoising from temporally dependent data. Next, we consider estimation and inference for a flexible subject-specific heteroskedasticity model for large-scale panel data, which employs a latent semiparametric factor structure to simultaneously account for heteroskedasticity across subjects and contemporaneous and/or serial correlations. Specifically, the subject-specific heteroskedasticity is modeled as the product of an unobserved factor process and a subject-specific covariate effect. Serving as the loading, the covariate effect is further modeled via additive models. We propose a two-step estimation procedure and establish its theoretical validity.
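The spectral quantities discussed in this abstract can be illustrated on simulated data. The sketch below assumes a p × n data matrix X = LF + E (variables in rows, time points in columns), so that XXᵀ/n plays the role of the left Gram matrix (the sample covariance) and XᵀX the right Gram matrix. It estimates the number of factors with the eigenvalue-ratio rule argmax_k λ_k/λ_{k+1}, a common estimator swapped in here for illustration; the abstract does not say which estimator the dissertation analyzes. All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_factor_model(p, n, r, noise_sd=1.0, seed=0):
    """Simulate a p x n data matrix X = L F + E with r latent factors."""
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, r))           # L: p x r
    factors = rng.normal(size=(r, n))            # F: r x n
    noise = noise_sd * rng.normal(size=(p, n))   # E: p x n
    return loadings @ factors + noise

def eigen_ratio_num_factors(x, k_max):
    """Estimate r as argmax_{k <= k_max} lambda_k / lambda_{k+1} of the left Gram matrix."""
    n = x.shape[1]
    left_gram = x @ x.T / n                         # p x p sample covariance (data assumed mean zero)
    eigvals = np.linalg.eigvalsh(left_gram)[::-1]   # eigenvalues in descending order
    ratios = eigvals[:k_max] / eigvals[1:k_max + 1]
    return int(np.argmax(ratios)) + 1, eigvals

def estimate_factors(x, r):
    """Estimate the factor process from the top-r eigenvectors of the right Gram matrix."""
    n = x.shape[1]
    right_gram = x.T @ x                            # n x n
    _, eigvecs = np.linalg.eigh(right_gram)         # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :r]                   # leading r eigenvectors, n x r
    return np.sqrt(n) * top.T                       # r x n, normalized so F F^T / n = I_r

x = simulate_factor_model(p=100, n=200, r=3)
r_hat, eigvals = eigen_ratio_num_factors(x, k_max=10)
f_hat = estimate_factors(x, r_hat)
print(r_hat, eigvals[:5])
```

Factors in this model are identified only up to rotation, so `f_hat` should be compared with the simulated factors through the subspace they span rather than entry by entry.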
By carefully examining the non-asymptotic rates for recovering the latent factor process and its loading, we show the consistency and asymptotic efficiency of our regression coefficient estimator in addition to its asymptotic normality. This yields a more efficient confidence set for the regression coefficient. A comprehensive simulation study demonstrates the finite-sample performance of our procedure, and the numerical results corroborate the theoretical findings. Finally, we consider factor model-assisted variable clustering for temporally dependent data. The population-level clusters are characterized by the latent factors of the model. We combine the approximate factor model with population-level clusters to obtain an integrative group factor model, which serves as the background model for variable clustering. In this model, variables are loaded on latent factors, and the factors are the same for variables in a common cluster and different for variables in different clusters. The commonality among clusters is modeled by common factors, and the clustering structure is modeled by the unique factors of each cluster. We quantify the difficulty of clustering data generated from the integrative group factor model in terms of a permutation-invariant clustering error. We develop an algorithm to recover cluster assignments and study its minimax optimality. The analysis of the integrative group factor model and the proposed algorithm partitions a two-dimensional phase space into three regions, showing how the parameters affect both the possibility of clustering under the model and the statistical guarantee of the algorithm. We also obtain a non-asymptotic characterization of the estimated number of latent factors. The model can be extended to a diverging number of clusters with similar results.

Outlier discordancy tests based on saddlepoint approximations
(Colorado State University. Libraries, 2019; Open Access)
Sleeper, Andrew D., author; Scharf, Louis, advisor; Boes, Duane, committee member; Breidt, Jay, committee member; Jayasumana, Anura, committee member

When testing for the discordancy of a single observed value, a test based on large values of the maximum absolute studentized residual (MASR) or maximum squared studentized residual (MSSR) is known to be optimal: it maximizes the probability of correctly identifying an outlying value while controlling the risk of a false identification at α. The exact distributions of MASR and MSSR are not known. In their place, the first Bonferroni bound on the distribution of these statistics is commonly used as an outlier test; see Grubbs (1950). We present new approximations to the distributions of MASR and MSSR, based on saddlepoint approximations of the density of statistics calculated from truncated normal random variables. These approximations are developed in three settings: the one-sample case, univariate regression, and multivariate regression. In comparisons with three versions of Bonferroni bounds and with Monte Carlo simulation, the saddlepoint approximations are shown to perform well in a wide range of situations, especially at larger sample sizes. The saddlepoint approximations are also faster to compute than the improved versions of the Bonferroni bounds.

Some topics in high-dimensional robust inference and graphical modeling
(Colorado State University. Libraries, 2021; Open Access)
Song, Youngseok, author; Zhou, Wen, advisor; Breidt, Jay, committee member; Cooley, Dan, committee member; Hoke, Kim, committee member

In this dissertation, we focus on large-scale robust inference and high-dimensional graphical modeling. Specifically, we study three problems: large-scale inference via tail-robust regression, model specification tests for the dependence structure of Gaussian Markov random fields, and robust Gaussian graph estimation.
First, we consider the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. A new challenge that comes with an overwhelmingly large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason conventional least-squares-based methods fail. The new testing procedure is built on data-adaptive Huber regression and a new covariance estimator of the regression estimate. Under mild conditions, we show that the proposed methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments, along with an empirical study in quantitative linguistics, demonstrate the advantage of our proposal over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions. In the next chapter, we focus on Gaussian Markov random fields (GMRFs) and, using the connection between GMRFs and precision matrices, propose an easily implemented procedure to assess the spatial structures modeled by GMRFs based on spatio-temporal observations. The procedure can assess a variety of structures, including isotropic and directional dependence as well as the Matérn class. A comprehensive simulation study demonstrates the finite-sample performance of the procedure. Motivated by efforts to model the spread of flu across the United States, we also apply our method to the Google Flu Trends data and report several epidemiological findings. Finally, we propose a high-dimensional precision matrix estimation method via nodewise distributionally robust regressions. Distributionally robust regression with an ambiguity set defined by a Wasserstein-2 ball has a computationally tractable dual formulation, which is linked to square-root regression.
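The nodewise idea behind this last problem can be sketched with ordinary least squares standing in for the distributionally robust (square-root) regressions described above: regressing each variable on all the others recovers one column of the precision matrix, and a zero entry corresponds to conditional independence in a Gaussian graph. This is a low-dimensional, non-robust illustration only; the function names are hypothetical, the data are assumed centered, and with n > p the assembled matrix coincides with the inverse sample covariance.

```python
import numpy as np

def nodewise_precision(x):
    """Assemble a precision-matrix estimate from p nodewise least-squares regressions.

    x is an n x p data matrix with centered columns; each column j is
    regressed on all remaining columns.
    """
    n, p = x.shape
    s = x.T @ x / n                                   # sample covariance
    theta = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # Least-squares coefficients of column j on the other columns
        beta = np.linalg.solve(s[np.ix_(others, others)], s[others, j])
        # Residual variance of the j-th nodewise regression
        resid_var = s[j, j] - s[j, others] @ beta
        theta[j, j] = 1.0 / resid_var
        theta[others, j] = -beta / resid_var
    return theta

def partial_correlations(theta):
    """Partial correlations rho_jk = -theta_jk / sqrt(theta_jj * theta_kk)."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 4))
x -= x.mean(axis=0)                                   # center the columns
theta_hat = nodewise_precision(x)
rho_hat = partial_correlations(theta_hat)
print(np.round(rho_hat, 3))
```

Replacing the least-squares solve with a penalized or distributionally robust regression is what makes the nodewise approach usable when p exceeds n or the data are contaminated; the assembled matrix is then generally not symmetric and is typically symmetrized afterwards.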
We propose an iterative algorithm with a substantial advantage in computation time. Extensive numerical experiments examine the performance of the proposed method under various precision matrix structures and contamination models.

Statistical upscaling of stochastic forcing in multiscale, multiphysics modeling
(Colorado State University. Libraries, 2019; Open Access)
Vollmer, Charles T., author; Estep, Don, advisor; Tavener, Simon, committee member; Breidt, Jay, committee member; Cooley, Dan, committee member

Modeling nuclear radiation damage necessarily involves multiple scales in both time and space, where molecular-level models rest on drastically different assumptions and phenomena than continuum-level models. In this thesis, we propose a novel approach to explicitly coupling these multiple scales of radiation-induced microstructural damage in materials. Our proposed stochastic process is a statistical upscaling from physical first principles that explicitly couples the micro, meso, and macro scales of materials under irradiation.