eScholarship
Open Access Publications from the University of California

UC San Diego Electronic Theses and Dissertations

Model-based reinforcement learning for cooperative multi-agent planning: exploiting hierarchies, bias, and temporal sampling

Abstract

Autonomous unmanned vehicles (UxVs) can be useful in many scenarios, including disaster relief, production and manufacturing, and Naval missions such as surveillance, mapping of unknown regions, and pursuit of hostile vehicles. In these scenarios, one of the most difficult challenges is determining which actions or tasks the vehicles should take in order to satisfy the objectives most efficiently. This challenge becomes harder with multiple vehicles, because the action and state spaces scale exponentially with the number of agents. Many planning algorithms therefore suffer from the curse of dimensionality: as more agents are included, sampling for suitable actions in the joint action space becomes infeasible within a reasonable amount of time. To enable autonomy, methods that apply across a variety of scenarios are invaluable because they reduce human involvement and time.

Recent advances in computing have made it practical to run algorithms that demand more computational power but operate in broader frameworks. We offer three main approaches to multi-agent planning, all inspired by model-based reinforcement learning.

First, we address the curse of dimensionality and investigate how to spatially reduce the state space of massive environments where agents are deployed. We do this in a hierarchical fashion by searching subspaces of the environment, called sub-environments, and creating plans to take optimal actions in those sub-environments.
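The hierarchical idea of planning inside sub-environments rather than the full state space can be sketched minimally as follows. The tile partitioning, the scoring of tiles by summed reward, and the greedy local plan are all illustrative assumptions, not the dissertation's actual algorithm:

```python
import numpy as np

def partition(env, size):
    """Split a 2D reward map into square sub-environments (tiles)."""
    h, w = env.shape
    tiles = []
    for i in range(0, h, size):
        for j in range(0, w, size):
            tiles.append(((i, j), env[i:i + size, j:j + size]))
    return tiles

def plan_hierarchically(env, size):
    """Pick the highest-value sub-environment, then plan inside it only.

    Planning over a tile of side `size` instead of the whole map shrinks
    the search from h*w cells to size*size cells per planning call.
    """
    tiles = partition(env, size)
    origin, best = max(tiles, key=lambda t: t[1].sum())
    # Local plan: visit cells of the chosen tile in decreasing reward order.
    order = np.dstack(np.unravel_index(np.argsort(-best, axis=None), best.shape))[0]
    return [(origin[0] + r, origin[1] + c) for r, c in order]

env = np.zeros((6, 6))
env[4, 4] = 5.0                       # reward concentrated in one tile
path = plan_hierarchically(env, size=3)   # plan covers only the best 3x3 tile
```

The point of the sketch is the complexity argument: each planning call touches only one sub-environment, so the cost per call is independent of the full environment's size.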

Next, we utilize game-theoretic techniques paired with simulated annealing as an approach to agent cooperation when planning over a finite time horizon. One problem with this approach is that agents are capable of breaking promises to other agents right before execution. To address this, we propose several variations that discourage agents from changing plans in the near future and encourage joint planning in the long term. Lastly, we propose a

tree-search algorithm that is aided by a convolutional neural network. The convolutional neural network takes advantage of spatial features that arise naturally in UxV deployment and offers recommendations for action selection during tree search. In addition, we propose design features for the tree search that target multi-agent deployment applications.
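One common way a network's recommendations can steer a tree search is a PUCT-style selection rule, where the network's policy output acts as a prior over actions. The sketch below assumes that style of integration; the `prior_fn` stand-in, the deterministic toy simulator, and all names are illustrative assumptions rather than the dissertation's actual design:

```python
import math

def select_action(node_visits, node_values, priors, c_puct=1.5):
    """PUCT rule: mean value plus a prior-weighted exploration bonus.

    `priors` stands in for the convolutional network's policy output,
    which scores actions from spatial features of the deployment map.
    """
    total = sum(node_visits.values()) + 1
    def score(a):
        q = node_values[a] / node_visits[a] if node_visits[a] else 0.0
        u = c_puct * priors[a] * math.sqrt(total) / (1 + node_visits[a])
        return q + u
    return max(priors, key=score)

def tree_search(actions, prior_fn, simulate, n_sims=200):
    """Run simulations; the network prior biases which actions get tried."""
    priors = prior_fn(actions)
    visits = {a: 0 for a in actions}
    values = {a: 0.0 for a in actions}
    for _ in range(n_sims):
        a = select_action(visits, values, priors)
        visits[a] += 1
        values[a] += simulate(a)          # rollout return for this action
    return max(visits, key=visits.get)    # most-visited action wins

# Toy usage: uniform prior, action 2 has the best simulated return.
actions = [0, 1, 2, 3]
uniform = lambda acts: {a: 1 / len(acts) for a in acts}
best = tree_search(actions, uniform, lambda a: 1.0 if a == 2 else 0.0)
```

With a trained network in place of the uniform prior, high-prior actions receive larger exploration bonuses early on, which is how the spatial recommendations shape the search before many rollout statistics have accumulated.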
