A new post-processing paradigm? Improving high-impact weather forecasts with machine learning
Date
2018
Authors
Herman, Gregory Reid, author
Schumacher, Russ S., advisor
Barnes, Elizabeth A., committee member
van den Heever, Susan C., committee member
Cooley, Daniel S., committee member
Hamill, Thomas M., committee member
Abstract
High-impact weather comes in many different shapes, sizes, environments, and storm types, but all pose threats to human life, property, and the economy. Because of the significant societal hazards inflicted by these events, having skillful forecasts of the risks with sufficient lead time to take appropriate precautions is critical. In order to occur, these extreme events require a special conglomeration of unusual meteorological conditions. Consequently, effective forecasting of such events often requires different perspectives and tools than routine forecasts. A number of other factors make advance forecasts of rare, high-impact weather events particularly challenging, including the lack of sufficient resolution to adequately simulate the phenomena dynamically in a forecast model; model biases in representing storms, which often become increasingly pronounced in extreme scenarios; and even difficulty in defining and verifying the high-impact event. This dissertation systematically addresses these recurring challenges for several types of high-impact weather: flash flooding and extreme rainfall, tornadoes, severe hail, and strong convective winds. For each listed phenomenon, research is conducted to more concretely define the current state of the science in analyzing, verifying, and forecasting the phenomenon. From there, in order to address the aforementioned persistent limitations with forecasting extreme weather events, machine learning-based post-processing models are developed to generate skillful, calibrated probabilistic forecasts for high-impact weather risk across the United States. Flash flooding is a notoriously challenging forecast problem, but the challenge is rooted even more fundamentally in difficulties in assessing and verifying flash flooding from observations, due to the complex combination of hydrometeorological factors affecting flash flood occurrence and intensity.
The first study in this dissertation investigates the multi-faceted flash flood analysis problem using a simplified framework that considers only quantitative precipitation estimates (QPEs) to assess flash flood risk. Many different QPE-to-flash flood potential frameworks and QPE sources are considered over a multi-year evaluation period, and QPE exceedances are compared against flash flood observations and warnings. No conclusive "best" flash flood analysis framework emerges, though specific strengths and weaknesses of different approaches and QPE sources are identified, in addition to regional differences in optimal correspondence with observations. The next two-part study accompanies the flash flood analysis investigation by approaching forecasting challenges associated with extreme precipitation. In particular, more than a decade of forecasts from a convection-parameterized global ensemble, the National Oceanic and Atmospheric Administration's Second Generation Global Ensemble Forecast System Reforecast (GEFS/R) model, are used to develop machine learning (ML) models for probabilistic prediction of extreme rainfall across the conterminous United States (CONUS) at Days 2 and 3. Both random forest (RF) and logistic regression (LR) models are developed, with separate models trained for each lead time and for eight different CONUS regions. Models use the spatiotemporal evolution of a host of different atmospheric fields as predictors, in addition to select geographic and climatological predictors. The models are evaluated over four years of withheld forecasts. The models, and particularly the RFs, are found to compare very favorably with both raw GEFS/R ensemble forecasts and those from a superior global ensemble produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), in terms of both forecast skill and reliability. The trained models are also inspected to discern what statistical findings are identified through ML.
Many of the findings quantify anecdotal knowledge that is already recognized regarding the forecast problem, such as the relative skill of simulated precipitation in areas where extreme precipitation events are associated with large-scale processes well resolved by the GEFS/R compared with areas where extreme precipitation predominantly occurs in association with convection in the warm season. But more subtle spatiotemporal biases are also diagnosed, including a northern displacement bias in the placement of convective systems and a southern displacement bias in placing landfalling atmospheric rivers. The final extended study shifts the focus from extreme rainfall to severe weather: tornadoes, large hail, and severe convective winds. Although both are high-impact, the two classes of weather hazards have both commonalities and contrasts. While rainfall is directly forecast by dynamical weather models, most severe weather occurs on spatial scales too small to be directly simulated by the same models. Consequently, unlike with extreme precipitation, when developing post-processed severe weather forecasts, there is no obvious benchmark for objectively determining whether and how much improvement the post-processing is yielding. A natural, albeit much more stringent, alternative benchmark is the set of operational forecasts produced by human forecasters. Operational severe weather forecasts are produced by the Storm Prediction Center (SPC), but there is limited published verification of their outlooks quantifying their probabilistic skill. In the first part of this study, an extended record of SPC severe weather outlooks was evaluated to quantitatively assess the state of operational severe weather forecasting, including strengths and weaknesses. SPC convective outlooks were found to decrease in skill with increased forecast lead time, and were most skillful for severe winds, with the worst performance for tornado outlooks.
Many seasonal and regional variations were also observed, with performance generally best in the North and East and worst in the South and especially West. The second part of the study follows similar methodology to the extreme precipitation models, developing RF-based probabilistic forecast models forced from the GEFS/R for Days 1–3 across CONUS, analogous to the format in which SPC produces its convective outlooks. RF properties are inspected to investigate the statistical relationships identified between GEFS/R fields and severe weather occurrence. As with the extreme precipitation models, RF severe weather forecasts are generated and evaluated from several years of withheld validation cases. These forecasts are compared against SPC outlooks and also blended with them to produce a combined forecast. Overall, by statistically quantifying relationships between the synoptic-scale environment and severe weather in a manner consistent with the community's physical understanding of the forecast problems, the RF models are able to demonstrate skill over SPC outlooks at Days 2 and 3, and can be blended with SPC outlooks to enhance skill at Day 1. In summary, multiple high-impact weather phenomena (extreme precipitation and severe weather) are investigated from verification, analysis, and forecasting standpoints. On verification and analysis, foundations have been laid both to improve existing operational products and to better frame and contextualize future studies. The ML post-processing models developed were highly successful in advancing forecast skill and reliability for these hazardous weather phenomena despite being developed from predictors of a coarse, dated dynamical model in the GEFS/R. The findings also suggest adaptability across a wide array of forecast problems, types of predictor inputs, and lead times, raising the possibility of broader applicability of these methods in operational numerical weather prediction.
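
As a rough illustration of how two probabilistic forecasts might be blended and scored, the sketch below averages a model forecast with a human forecast and compares them using the Brier skill score. This is not the dissertation's method: the abstract does not name its skill metric or blending scheme, so the Brier score, the equal weighting, the function names, and the toy data are all assumptions made for illustration only.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to a constant forecast of the sample event frequency.

    Positive values mean the forecast beats that constant reference.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference


def blend(probs_a, probs_b, weight):
    """Linear blend: weight * a + (1 - weight) * b (hypothetical scheme)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(probs_a, probs_b)]


# Hand-made toy data, NOT results from the dissertation.
outcomes = [0, 0, 1, 0, 1, 0, 0, 1]                          # 1 = event observed
ml_probs = [0.1, 0.2, 0.7, 0.1, 0.4, 0.3, 0.1, 0.8]          # stand-in for an ML forecast
human_probs = [0.05, 0.3, 0.5, 0.2, 0.7, 0.1, 0.2, 0.6]      # stand-in for a human outlook

blended_probs = blend(ml_probs, human_probs, weight=0.5)      # equal weight is arbitrary

bss_ml = brier_skill_score(ml_probs, outcomes)
bss_human = brier_skill_score(human_probs, outcomes)
bss_blend = brier_skill_score(blended_probs, outcomes)

print(f"ML BSS:      {bss_ml:.3f}")
print(f"Human BSS:   {bss_human:.3f}")
print(f"Blended BSS: {bss_blend:.3f}")
```

In this toy case the blend scores higher than either input because the two forecasts err on different cases. With real data, the weight would normally be chosen on a training period rather than fixed in advance.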