From sim-to-real: learning and deploying autonomous vehicle controllers that improve transportation metrics

Abstract

The recent wide availability of semi-autonomous vehicles with distance- and lane-keeping capabilities has created an exciting opportunity to improve highway throughput and energy efficiency by deploying modified control strategies. However, even at current penetration rates, the optimal mechanism for designing these decentralized, cooperative strategies is an open problem. In this work, we use Multi-Agent Reinforcement Learning (MARL) to investigate, design, and deploy cooperative autonomous vehicles (CAVs) that achieve these goals, and we demonstrate a field deployment of an RL-based traffic-smoothing controller. We focus on multi-agent reinforcement learning as a mechanism for handling the complexity and non-linearity of large-scale traffic.

We begin by constructing a standardized suite of benchmark tasks for evaluating the efficacy of learning algorithms in designing controllers for CAVs; we evaluate these algorithms in the centralized setting, where all CAVs are actuated by a single controller. We then extend one of these benchmarks, regulation of the inflow to a bottleneck via decentralized CAVs, to the multi-agent setting. We demonstrate that, at both low and high penetration rates, CAVs are capable of improving the throughput of a scaled model of the San Francisco-Oakland Bay Bridge, and we investigate the challenges of scaling our methods to open-network settings where vehicles can enter and exit the system.

In preparation for a road test intended to demonstrate stop-and-go wave smoothing on large-scale networks, we next study energy optimization of a full-scale model of a section of the I-210 in Los Angeles. Using Proximal Policy Optimization (PPO) with an augmented value function, we demonstrate that we are able to sharply improve the miles-per-gallon of the system and that the resultant controller is robust to likely variations of the system, such as system speed and CAV penetration rate. However, we observe that the resultant waves are unrealistic and that additional calibration using higher-resolution data is needed.

With the goal of designing a better-calibrated simulator, we pursue two approaches: the first designs new driver models using available datasets from Waymo, while the second uses data collected from the field deployment site. In the first approach, we design a new simulator that 1) efficiently represents the partially observable view-cone of human drivers and 2) serves as a challenging MARL benchmark; we investigate whether learning safe driving policies in this simulator yields human-like behavior and observe promising signs of human similarity in agents trained in it. In the more direct approach, we collect data from the deployment site and use it to design a new, simplified simulator capable of using the collected data while maintaining a high simulation speed. We design energy-improving CAVs in this simulator and demonstrate that these CAVs can be successfully and safely used in a field deployment test.
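As a concrete illustration of the "augmented value function" idea mentioned above, below is a minimal PyTorch sketch of the general technique; all names, layer sizes, and the specific choice of global features are our own assumptions, not the dissertation's actual code. The policy is decentralized and sees only a local observation, while the critic is additionally fed global state (e.g., mean network speed and CAV penetration rate) that is available in simulation at training time but not at deployment.

    import torch
    import torch.nn as nn

    class AugmentedValueActorCritic(nn.Module):
        # Hypothetical sketch of a PPO-style actor-critic with an
        # augmented value function. The actor is decentralized: each
        # CAV sees only its local observation. The critic is
        # "augmented" with global features that exist only in
        # simulation and are used only during training.
        def __init__(self, obs_dim, aug_dim, act_dim, hidden=64):
            super().__init__()
            self.actor = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded accel in [-1, 1]
            )
            self.critic = nn.Sequential(
                nn.Linear(obs_dim + aug_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def act(self, obs):
            # Deployment-time path: local observation only.
            return self.actor(obs)

        def value(self, obs, aug):
            # Training-time path: concatenate global features onto
            # the local observation before estimating the value.
            return self.critic(torch.cat([obs, aug], dim=-1))

    # Illustrative usage: 10-dim local observation, 2 global features,
    # scalar acceleration action.
    model = AugmentedValueActorCritic(obs_dim=10, aug_dim=2, act_dim=1)
    accel = model.act(torch.zeros(10))
    v = model.value(torch.zeros(10), torch.zeros(2))

Because the augmentation enters only through the critic, the learned policy remains deployable on real vehicles that observe nothing beyond their local sensors.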
