eScholarship
Open Access Publications from the University of California

A Connectionist Architecture for Sequential Decision Learning

Abstract

A connectionist architecture and learning algorithm for sequential decision learning are presented. The architecture provides representations for probabilities and utilities. The learning algorithm provides a mechanism to learn from long-term rewards/utilities while observing only information available locally in time. The mechanism is based on gradient ascent on the current estimate of the long-term reward in the weight space defined by a "policy" network. The learning principle can be seen as a generalization of previous methods proposed to implement "policy iteration" mechanisms with connectionist networks. The algorithm is simulated for an "agent" moving in an environment described as a simple one-dimensional random walk. Results show the agent discovers optimal moving strategies in simple cases and learns how to avoid short-term suboptimal rewards in order to maximize long-term rewards in more complex cases.
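The abstract describes gradient ascent on an estimate of long-term reward in the weight space of a policy network, simulated on a one-dimensional random walk. The paper's exact architecture and update rule are not given here, so the following is a minimal illustrative sketch of the same idea using a REINFORCE-style policy-gradient update on a tabular logistic policy; all hyperparameters (`ALPHA`, `GAMMA`, `EPISODES`, the walk length `N`) are assumed values, not taken from the paper:

```python
import math
import random

N = 6          # positions 0..N; terminal reward only at position N (assumed setup)
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.95   # discount factor (assumed)
EPISODES = 2000

# One logit per position; P(move right | s) = sigmoid(theta[s]).
theta = [0.0] * (N + 1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_episode():
    """Walk from the middle until hitting either end; return trajectory and reward."""
    s = N // 2
    traj = []  # (state, action) pairs, action is +1 (right) or -1 (left)
    while 0 < s < N:
        a = 1 if random.random() < sigmoid(theta[s]) else -1
        traj.append((s, a))
        s += a
    return traj, (1.0 if s == N else 0.0)

random.seed(0)
for _ in range(EPISODES):
    traj, reward = run_episode()
    # Gradient ascent on expected return: each weight moves along
    # d/dtheta log pi(a|s), scaled by the discounted terminal reward.
    for t, (s, a) in enumerate(reversed(traj)):
        p_right = sigmoid(theta[s])
        grad = (1.0 - p_right) if a == 1 else -p_right
        theta[s] += ALPHA * (GAMMA ** t) * reward * grad

# After training, interior states should prefer the rewarding direction (right).
print([round(sigmoid(t), 2) for t in theta[1:N]])
```

Because the only nonzero reward is at the right boundary, every update pushes the visited states' logits toward "move right", so the learned probabilities drift above 0.5 for all interior positions, mirroring the abstract's claim that the agent discovers the optimal moving strategy in the simple case.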
