TITLE: Dataset associated with "Estimation of the State-Value Function for Optimal Reservoir Operations using Continuous Action Deep Reinforcement Learning"

ABSTRACT: The state-value function of a reservoir system provides information about the long-term rewards that can be accrued from any state which the system can occupy. This function can be used to determine optimal decisions and is also a key piece of information needed when reservoir operators wish to incorporate real-time forecast information. Dynamic programming is the most popular method for calculating the state-value function but has well-known limitations. The “curse of dimensionality,” which can lead to computational intractability, arises from the discrete nature of the formulation, and the backward recursive solution process precludes consideration of delayed rewards. Continuous action deep reinforcement learning (CADRL) is a recent development for estimating the state-value function when delayed rewards are present and avoids the difficulties associated with the use of discrete methods. Since application of this technique to reservoir operation problems is not without its own challenges, presented herein is a computational implementation with the refinements needed to provide a stable and reliable learning process. CADRL is applied to the development of optimal operational strategies for Lake Mendocino in the Russian River basin of Northern California using two single-objective reward functions, along with a multi-objective reward function for verification purposes. Performance of the optimal policy functions developed from the learning process is evaluated through simulation, with results showing that the system is able to learn far-sighted strategies that outperform idealized policies with foresight.

CONTACT: Matthew Peacock, mpeacock86@gmail.com

REFERENCES: 
The input datasets were obtained from the U.S. Army Corps of Engineers Hydrologic Engineering Center's (USACE-HEC) HEC-ResSim model of the Upper Russian River (Version 3.1), which is public material. https://www.hec.usace.army.mil/software/hec-ressim/

The inspiration for the basic structure of the implementation of the DDPG algorithm came from the code examples found at:
Deep Deterministic Policy Gradients in Tensorflow
by Patrick Emami https://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html

LICENSE: The implementation of the DDPG algorithm specific to the reservoir operation in the enclosed code is licensed under the GNU General Public License version 3 as published by the Free Software Foundation.  A copy of the GNU General Public License is contained in COPYING.txt.

FORMAT: Data files are in .csv format.  Code is in .py format.

The input data files animals.csv and franimals.csv are lists of animal names used in the automated naming of output files, which makes the outputs easier to read and distinguish.  The LakeMendoEvapData.csv and InputData.csv files were both obtained from the HEC-ResSim model of the Upper Russian River.
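
The naming idea can be illustrated with a minimal sketch.  It is not the code used in this data set: the function names read_names and make_run_label are hypothetical, and the sketch assumes that each animal-name file holds one name per row in its first column, with no header row.

```python
import csv
import random


def read_names(lines):
    """Return the non-empty first-column entries from CSV lines.

    `lines` can be an open file or any iterable of strings.
    Assumption: one name per row, no header row.
    """
    return [row[0].strip() for row in csv.reader(lines) if row and row[0].strip()]


def make_run_label(names, rng=random):
    """Pick one name at random and make it safe for use in a file name."""
    return rng.choice(names).replace(" ", "_")
```

A possible use, again under the same assumptions, would be to open SourceData/animals.csv with open(path, newline="", encoding="utf-8-sig"), pass the file object to read_names, and append the result of make_run_label to the stem of an output file name.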

The time period spanned by the input data begins on 1950-01-01 and ends on 2010-12-31.

This data set contains 14 files including this README.txt file.  The files other than the README file are organized into two folders: SourceCode and SourceData.  The SourceData folder contains four .csv files:
1. animals.csv - names of animals (for use in output file names)
2. franimals.csv - names of animals in French (there were a lot of output files)
3. InputData.csv - time series data for inputs to the network at 10 nodes, and the withdrawal from the network at 5 nodes.
4. LakeMendoEvapData.csv - evaporation table for Lake Mendocino

The SourceCode folder contains 10 files:
1. dataprocessors.py - functions and classes for processing output
2. ddpg.py - contains the definition of the DDPG class, an object that is used for managing the learning process and owns the actor, critic and environment objects
3. ddpg_agent.py - defines the ActorNetwork and CriticNetwork classes
4. ddpg_env.py - defines the DDPGEnv class
5. main.py - defines a main function which can be executed from the command line prompt to perform the reinforcement learning process
6. main_sim.py - defines a main function that will run the simulation process which uses the state-value functions obtained through the DDPG reinforcement learning process along with the ensemble forecasts to operate the reservoir
7. main_ver.py - a simple function that runs a verification simulation to attempt to verify the reliability values from the state-value functions
8. params.txt - a text file that can be read into Python to import most of the frequently modified parameters; it also serves as a record of a model run
9. perfectforecasts.py - a script to take the values from the input data set and put them into the same form as the ensemble forecasts to create a set of 'perfect' forecasts for comparison
10. replay_buffer.py - defines the ReplayBuffer, ReplayBufferOneStep, ReplayBufferNStep, and ReplayBufferNStepBiasCorrection classes.  These were built off of the ReplayBuffer class by Patrick Emami (see above) and were extended to allow for n-step rewards as well as bias correction
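
For readers unfamiliar with n-step rewards, the textbook n-step target is the discounted sum of the next n rewards plus a discounted estimate of the value of the state reached after those n steps.  The sketch below shows only that generic calculation.  The function name n_step_return and its parameters are hypothetical, it is not taken from replay_buffer.py, and it does not show the bias correction.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Generic n-step target: discounted rewards plus a discounted bootstrap.

    rewards         -- the n rewards observed after the starting state, in order
    gamma           -- discount factor between 0 and 1
    bootstrap_value -- estimated value of the state reached after n steps
    """
    total = 0.0
    discount = 1.0
    for reward in rewards:
        total += discount * reward
        discount *= gamma
    # After the loop, discount equals gamma ** len(rewards).
    return total + discount * bootstrap_value
```

With a single reward this reduces to the usual one-step target, reward + gamma * bootstrap_value.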

The column headings of the input data file correspond to the names assigned to each input in the HEC-ResSim model.  This same name is assigned to the 'name' attribute for the corresponding node in the environment object.
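
As a rough sketch of this naming convention, the column headings of a CSV file can be read and used as the names of simple node objects.  The Node class and the nodes_from_csv function below are hypothetical stand-ins, not the classes defined in ddpg_env.py.  The sketch also assumes that every column that is kept holds numeric values, and that any non-numeric column (such as a date column, if one is present) is listed in skip_columns.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node in the environment object."""
    name: str
    series: list = field(default_factory=list)


def nodes_from_csv(lines, skip_columns=()):
    """Build one Node per column heading and fill it with that column's values.

    `lines` can be an open file or any iterable of strings.
    Assumption: all columns not listed in skip_columns are numeric.
    """
    reader = csv.DictReader(lines)
    headings = reader.fieldnames or []
    nodes = {h: Node(name=h) for h in headings if h not in skip_columns}
    for row in reader:
        for name, node in nodes.items():
            node.series.append(float(row[name]))
    return nodes
```

When reading SourceData/InputData.csv this way, opening the file with encoding="utf-8-sig" guards against a byte-order mark being attached to the first column heading.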

