Causal inference using observational data - case studies in climate science
Date
2020
Authors
Samarasinghe, Savini M., author
Ebert-Uphoff, Imme, advisor
Anderson, Chuck, committee member
Chong, Edwin, committee member
Kirby, Michael, committee member
Abstract
We are in an era where atmospheric science is data-rich, both in observations (e.g., satellite and sensor data) and in model output. Our goal with causal discovery is to apply suitable data science approaches to climate data in order to make inferences about the cause-effect relationships between climate variables. In this research, we focus on observational studies, an approach that does not rely on controlled experiments, to infer cause-effect relationships. Because of factors such as latent (unobserved) variables, these observational studies do not allow us to prove causal relationships. Nevertheless, they provide data-driven hypotheses about the interactions, which can give insight into the salient interactions as well as the timescales at which they occur. Even though many causal inference frameworks and methods rely on observational studies, these approaches have not found widespread use within the climate or Earth science communities. To date, the most commonly used observational approaches are lagged correlation/regression analysis and the bivariate Granger causality approach. We attribute this lack of popularity to two main reasons. The first is the inherent difficulty of inferring cause-effect in climate. Complex climate processes interact with each other across varying timescales. These interactions can be nonlinear, the distributions of relevant climate variables can be non-Gaussian, and the processes can be chaotic. A researcher interested in these causal inference problems faces many challenges, ranging from identifying suitable variables, data, preprocessing steps, and inference methods to setting up the inference problem in a physically meaningful way. The second reason is the limited exposure to, and accessibility of, modern causal inference approaches within the climate science community.
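As a point of reference for the bivariate Granger causality approach named above, the following is a minimal sketch, assuming NumPy and ordinary least squares. It compares a restricted autoregression of `y` on its own past with an unrestricted one that also includes lags of `x`, and returns the classical F statistic. The function names `lagged_design` and `granger_f_stat` and the default `max_lag=2` are hypothetical illustrations, not code from the dissertation.

```python
import numpy as np

def lagged_design(series_list, max_lag):
    """Build a design matrix with an intercept and lags 1..max_lag of each series."""
    n = len(series_list[0])
    cols = [np.ones(n - max_lag)]
    for s in series_list:
        for k in range(1, max_lag + 1):
            # Row t (for t = max_lag..n-1) holds s[t - k].
            cols.append(s[max_lag - k : n - k])
    return np.column_stack(cols)

def granger_f_stat(x, y, max_lag=2):
    """F statistic for H0: past values of x do not improve the prediction of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[max_lag:]
    X_r = lagged_design([y], max_lag)      # restricted model: past of y only
    X_u = lagged_design([y, x], max_lag)   # unrestricted model: past of y and x
    beta_r = np.linalg.lstsq(X_r, target, rcond=None)[0]
    beta_u = np.linalg.lstsq(X_u, target, rcond=None)[0]
    rss_r = np.sum((target - X_r @ beta_r) ** 2)
    rss_u = np.sum((target - X_u @ beta_u) ** 2)
    df_num = max_lag                       # number of added lag coefficients of x
    df_den = len(target) - X_u.shape[1]    # residual degrees of freedom
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

# Usage on synthetic data in which x drives y at lag 1 (illustrative only).
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
f_xy = granger_f_stat(x, y)  # expected to be large: x helps predict y
f_yx = granger_f_stat(y, x)  # expected to be small: y does not help predict x
```

Because this test is bivariate, a common driver of `x` and `y` that is left out of the model can produce a spurious `x → y` link. Conditioning on additional observed variables, as multivariate (graphical) Granger and constraint-based methods do, mitigates this only for drivers that are actually observed.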
In this dissertation, we present three case studies related to causal inference in climate science, namely (1) the causal relationships between Arctic temperature and mid-latitude circulation, (2) the relationships between the Madden Julian Oscillation (MJO) and the North Atlantic Oscillation (NAO), and (3) the causal relationships between atmospheric disturbances of different spatial scales (e.g., planetary vs. synoptic). To infer cause-effect relationships, we use methods based on probabilistic graphical models, specifically constraint-based structure learning and graphical Granger methods. For each case study, we analyze and document the scientific thought process of setting up the problem, the challenges faced, and how we dealt with them. The challenges discussed include, but are not limited to, method selection, variable representation, and data preparation. We also present a successful high-dimensional study of causal discovery in spectral space. The main objectives of this research are to make causal inference methods more accessible to researchers and climate scientists who are new to spatiotemporal causality, and to promote modern causal inference methods within the climate science community. The case studies, which cover a wide range of questions and challenges, are meant to serve as a useful starting point for researchers interested in tackling more general causal inference problems in climate.
Subject
climate
graphical causal models
teleconnections
Granger causality
causality
Pearl causality