eScholarship
Open Access Publications from the University of California

A Connectionist Architecture for Sequential Decision Learning

Abstract

A connectionist architecture and learning algorithm for sequential decision learning are presented. The architecture provides representations for probabilities and utilities. The learning algorithm provides a mechanism to learn from long-term rewards/utilities while observing only information available locally in time. The mechanism is based on gradient ascent on the current estimate of the long-term reward in the weight space defined by a "policy" network. The learning principle can be seen as a generalization of previous methods proposed to implement "policy iteration" mechanisms with connectionist networks. The algorithm is simulated for an "agent" moving in an environment described as a simple one-dimensional random walk. Results show the agent discovers optimal moving strategies in simple cases and learns how to avoid short-term suboptimal rewards in order to maximize long-term rewards in more complex cases.
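The abstract describes gradient ascent on an estimate of long-term reward in the weight space of a policy network, simulated on a one-dimensional random walk. The paper's exact architecture and update rule are not given here, so the following is a minimal illustrative sketch of the same idea using a REINFORCE-style policy-gradient update on a tabular logistic policy; all hyperparameters (`ALPHA`, `GAMMA`, `EPISODES`, the walk length `N`) are assumed values, not taken from the paper:

```python
import math
import random

N = 6          # positions 0..N; terminal reward only at position N (assumed setup)
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.95   # discount factor (assumed)
EPISODES = 2000

# One logit per position; P(move right | s) = sigmoid(theta[s]).
theta = [0.0] * (N + 1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def run_episode():
    """Walk from the middle until hitting either end; return trajectory and reward."""
    s = N // 2
    traj = []  # (state, action) pairs, action is +1 (right) or -1 (left)
    while 0 < s < N:
        a = 1 if random.random() < sigmoid(theta[s]) else -1
        traj.append((s, a))
        s += a
    return traj, (1.0 if s == N else 0.0)

random.seed(0)
for _ in range(EPISODES):
    traj, reward = run_episode()
    # Gradient ascent on expected return: each weight moves along
    # d/dtheta log pi(a|s), scaled by the discounted terminal reward.
    for t, (s, a) in enumerate(reversed(traj)):
        p_right = sigmoid(theta[s])
        grad = (1.0 - p_right) if a == 1 else -p_right
        theta[s] += ALPHA * (GAMMA ** t) * reward * grad

# After training, interior states should prefer the rewarding direction (right).
print([round(sigmoid(t), 2) for t in theta[1:N]])
```

Because the only nonzero reward is at the right boundary, every update pushes the visited states' logits toward "move right", so the learned probabilities drift above 0.5 for all interior positions, mirroring the abstract's claim that the agent discovers the optimal moving strategy in the simple case.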
