Title: Dataset associated with "Forecasting Excessive Rainfall with Random Forests and a Deterministic Convection-Allowing Model"

Abstract: Approximately seven years of daily initializations from the convection-allowing National Severe Storms Laboratory Weather Research and Forecasting model are used as inputs to train random forest (RF) machine learning models to probabilistically predict instances of excessive rainfall. Unlike other hazards, excessive rainfall does not have an accepted definition, so multiple definitions of excessive rainfall and flash flooding – including flash flood reports and 24-hr average recurrence intervals (ARIs) – are used to explore RF configuration forecast sensitivities. RF forecasts are analogous to operational Weather Prediction Center (WPC) day-1 Excessive Rainfall Outlooks (EROs) and their resolution, reliability, and skill are strongly influenced by rainfall definitions and how inputs are assembled for training. Models trained with 1-y ARI exceedances defined by the Stage-IV (ST4) precipitation analysis perform poorly in the northern Great Plains and southwest U.S., in part due to a high bias in the number of training events in these regions. Increasing the ARI threshold to 2 years or removing ST4 data from training, optimizing forecast skill geographically, and spatially averaging meteorological inputs for training generally results in improved CONUS-wide RF forecast skill. Both EROs and RF forecasts have seasonal skill – poor forecasts in the late fall and winter and skillful forecasts in the summer and early fall. However, the EROs are consistently and significantly better than their RF counterparts, regardless of RF configuration, particularly in the summer months. The results suggest careful consideration should be made when developing ML-based probabilistic precipitation forecasts with convection-allowing model inputs, and further development is necessary to consider these forecast products for operational implementation.
Contact: Aaron Hill, aaron.hill@colostate.edu

License information: The material is open access and distributed under the terms and conditions of the Creative Commons Public Domain "No rights reserved" (https://creativecommons.org/share-your-work/public-domain/cc0/).

Recommended data citation: Hill, A. Dataset associated with "Forecasting Excessive Rainfall with Random Forests and a Deterministic Convection-Allowing Model." Colorado State University. Libraries. http://dx.doi.org/10.25675/10217/233672

Associated article: Hill, A. J. and R. S. Schumacher, 2021: Forecasting excessive rainfall with random forests and a deterministic convection-allowing model. Weather and Forecasting, 36, 1693-1711, https://doi.org/10.1175/WAF-D-21-0026.1

Format of data files: netCDF

Temporal coverage: 2017-01-01 -- 2018-12-31

File information: This repository contains ten netCDF files of aggregated forecasts from the machine learning models detailed in Hill and Schumacher (2021); see the associated article. The netCDF4 files contain near-daily forecasts across the continental United States, defined on a specific Weather Prediction Center grid. Forecasts are indexed by date and (lat,lon) coordinates. The files were created using the xarray package in Python.
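
Example of reading a forecast: Because the files were written with xarray and are indexed by date and (lat,lon), a single forecast value can be looked up roughly as sketched below. This is a minimal illustration, not a description of the actual file layout: the dimension names `date`, `lat`, and `lon`, the variable name `prob`, and the assumption of one-dimensional coordinates are all hypothetical, so inspect a file (for example with `print(ds)`) and adjust the names before use. The sketch runs on a small synthetic dataset so that it does not depend on any file from the repository.

```python
import numpy as np
import pandas as pd
import xarray as xr


def forecast_at(ds, var, date, lat, lon):
    """Return the value of `var` on `date` at the grid point nearest (lat, lon).

    Assumes (hypothetically) 1-D coordinates named "date", "lat", and "lon".
    """
    # Exact match on the date, then nearest-neighbour match in space.
    day = ds[var].sel(date=pd.Timestamp(date))
    point = day.sel(lat=lat, lon=lon, method="nearest")
    return float(point.values)


# Synthetic stand-in for one forecast file (all names and values are made up).
dates = pd.date_range("2017-01-01", periods=3, freq="D")
lats = np.array([30.0, 35.0, 40.0])
lons = np.array([-105.0, -100.0, -95.0, -90.0])
probs = np.arange(3 * 3 * 4, dtype=float).reshape(3, 3, 4) / 100.0

demo = xr.Dataset(
    {"prob": (("date", "lat", "lon"), probs)},
    coords={"date": dates, "lat": lats, "lon": lons},
)

# Nearest grid point to (36.2, -99.0) is (35.0, -100.0).
value = forecast_at(demo, "prob", "2017-01-02", 36.2, -99.0)
print(value)
```

For a real file, the synthetic dataset would be replaced by something like `ds = xr.open_dataset("path/to/file.nc")`, where the path is a placeholder. Because forecasts are described as near-daily, some dates may be absent, and an exact-match date selection raises a `KeyError` for a missing date. If the grid stores latitude and longitude as two-dimensional arrays, the nearest-point selection shown here would not apply directly.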