Integration of partially observable Markov decision processes and reinforcement learning for simulated robot navigation

Abstract

This dissertation presents a two-level architecture for goal-directed robot control. The low-level actions are learned online as the robot performs its tasks, reducing the need for the system designer to program for every possible contingency. The actions adapt to failures in sensors and effectors, allowing the robot to perform its assigned tasks despite hardware failure. Reactivity, deliberation, and learning are integral parts of the architecture. The architecture uses a partially observable Markov decision process (POMDP) model for planning and reinforcement learning (RL) for low-level actions. In addition to the robot architecture, this dissertation presents and evaluates a new parallel POMDP solution algorithm and a new algorithm for using decision trees to perform function approximation in RL. New low-level actions may be instantiated with no knowledge of the state transition they are supposed to accomplish; the patterns of reward and punishment cause each of them to learn to perform its assigned state transition. In the event of sensor or effector failure, the low-level actions adapt so as to maximize reward even with reduced sensor information or effector availability. Experiments are conducted in a simulated maze-like environment to compare different versions of the architecture. The first experiment uses hand-coded actions; the remaining experiments compare the performance of the system using hand-coded actions to its performance using learned actions. A final experiment demonstrates that the system can learn a new action that was not pre-specified by the system designer. The experiments demonstrate that the combination of POMDP planning and reinforcement learning yields a highly reactive system that can also achieve long-term goals, adapt to failures, and learn new low-level actions.
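The POMDP planning layer described above maintains a belief state, a probability distribution over the environment's hidden states, and updates it after each action and observation. As a minimal illustration of that belief tracking step (a generic sketch with a hypothetical two-state toy model, not the dissertation's actual model or solver):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One step of POMDP belief tracking.
    b: current belief over states; T[a][s][s2]: transition probabilities;
    O[a][s2][o]: observation probabilities. Returns the normalized posterior
    belief after taking action a and observing o."""
    b_next = O[a][:, o] * (b @ T[a])
    return b_next / b_next.sum()

# Hypothetical 2-state, 1-action, 2-observation toy model for illustration.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])          # T[0]: transition matrix for the one action
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])          # O[0]: observation likelihoods per next state
b = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, observing the observation most likely in state 0 shifts the posterior toward state 0; the planner then chooses actions as a function of this belief rather than of any single assumed state.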
Demonstrating the robot control architecture required improving or modifying existing approaches to reinforcement learning and POMDP planning. The approach to learning low-level actions differs from previous approaches, and the experimental results indicate that it performs well in the simulated maze-like environment.
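The idea that reward and punishment alone can shape a low-level action into performing an assigned state transition can be sketched with standard Q-learning. This is a generic tabular illustration on a made-up one-dimensional corridor, not the dissertation's algorithm (which uses decision trees for function approximation rather than a table):

```python
import random

def train_low_level_action(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy corridor: the action must learn to move the
    agent from state 0 to the goal state n_states-1.
    Actions: 0 = step left, 1 = step right. The reward signal alone, positive at
    the goal and slightly negative elsewhere, shapes the learned behavior."""
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < eps else int(Q[s][1] > Q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else -0.01
            # standard Q-learning update toward the one-step bootstrapped target
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy prefers "step right" in every non-goal state, i.e., the action has learned its assigned transition purely from the pattern of reward and punishment, with no explicit specification of the target state.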

Subject

computer science
