Some topics in high-dimensional robust inference and graphical modeling
Date
2021
Authors
Song, Youngseok, author
Zhou, Wen, advisor
Breidt, Jay, committee member
Cooley, Dan, committee member
Hoke, Kim, committee member
Abstract
In this dissertation, we focus on large-scale robust inference and high-dimensional graphical modeling. Specifically, we study three problems: a large-scale inference method based on tail-robust regression, model specification tests for the dependence structure of Gaussian Markov random fields, and robust Gaussian graph estimation. First, we consider the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. A challenge that accompanies the overwhelmingly large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason conventional least-squares-based methods fail. The new testing procedure is built on data-adaptive Huber regression and a new covariance estimator of the regression estimate. Under mild conditions, we show that the proposed methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments, along with an empirical study on quantitative linguistics, demonstrate the advantage of our proposal over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions.
In the next chapter, we focus on Gaussian Markov random fields (GMRFs). By utilizing the connection between GMRFs and precision matrices, we propose an easily implemented procedure to assess the spatial structures modeled by GMRFs based on spatio-temporal observations. The new procedure is flexible enough to assess a variety of structures, including isotropic and directional dependence as well as the Matérn class. A comprehensive simulation study demonstrates the finite-sample performance of the procedure. Motivated by efforts to model the spread of flu across the United States, we also apply our method to the Google Flu Trends data and report the resulting epidemiological findings.
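The connection between GMRFs and precision matrices mentioned above can be illustrated with a small sketch. It is not the dissertation's test procedure. It only builds the precision matrix of a first-order isotropic GMRF on a regular lattice and shows that non-neighbouring sites have a zero precision entry even though their covariance is nonzero. The function name `lattice_precision` and the parameter `kappa` are assumptions made for this illustration.

```python
import numpy as np

def lattice_precision(n_rows, n_cols, kappa=0.5):
    """Precision matrix of a first-order isotropic GMRF on a regular lattice.

    Hypothetical helper for illustration: off-diagonal entries are -1 for
    the four nearest neighbours and 0 otherwise.
    """
    n = n_rows * n_cols
    Q = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Visit the right and down neighbours; symmetry fills in left and up.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    j = rr * n_cols + cc
                    Q[i, j] = Q[j, i] = -1.0
    # Diagonal = number of neighbours + kappa, so Q is strictly diagonally
    # dominant and therefore positive definite.
    Q[np.diag_indices(n)] = -Q.sum(axis=1) + kappa
    return Q

Q = lattice_precision(3, 3)
Sigma = np.linalg.inv(Q)

# Sites 0 and 8 are opposite corners of the 3 x 3 grid: not neighbours.
print("precision entry:", Q[0, 8])
print("covariance entry:", Sigma[0, 8])
```

The zero pattern of `Q` encodes conditional independence given all other sites, which is why a hypothesized spatial structure can be phrased as a statement about the precision matrix.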
Finally, we propose a high-dimensional precision matrix estimation method via nodewise distributionally robust regressions. The distributionally robust regression with an ambiguity set defined by a Wasserstein-2 ball has a computationally tractable dual formulation, which is linked to square-root regressions. We propose an iterative algorithm that has a substantial advantage in terms of computation time. Extensive numerical experiments study the performance of the proposed method under various precision matrix structures and contamination models.
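As a rough illustration of the nodewise idea, the sketch below regresses each variable on all the others and assembles the precision matrix from the coefficients and residual variances. It deliberately swaps in ordinary least squares for the distributionally robust (square-root type) regressions described above, so it only makes sense when the sample size exceeds the dimension. The function name `nodewise_precision` and the simulated data are assumptions made for this example.

```python
import numpy as np

def nodewise_precision(X):
    """Assemble a precision matrix estimate from p nodewise regressions.

    Hypothetical helper: uses plain least squares, not the robust
    regressions of the dissertation.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centre columns so no intercept is needed
    Theta = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # Regress variable j on the remaining variables.
        beta, *_ = np.linalg.lstsq(Xc[:, others], Xc[:, j], rcond=None)
        resid = Xc[:, j] - Xc[:, others] @ beta
        tau2 = resid @ resid / n  # residual variance
        Theta[j, j] = 1.0 / tau2
        Theta[j, others] = -beta / tau2
    return Theta

# Simulated data with correlated columns (illustrative only).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.0, 1.0, 0.5, 0.0],
                [0.0, 0.0, 1.0, 0.5],
                [0.0, 0.0, 0.0, 1.0]])
X = rng.standard_normal((200, 4)) @ mix
Theta_hat = nodewise_precision(X)
```

With least squares, this construction reproduces the inverse of the sample covariance matrix. In high dimensions the least squares step would be replaced by a regularized or robust regression, and the rows are typically symmetrized afterwards.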