Trustable Deep Reinforcement Learning with Efficient Data Utilization
- Author(s): Mahmoodzadeh Poornaki, Zahra
- Advisor(s): Mosleh, Ali
- et al.
We live in the era of big data, in which advances in sensor and monitoring technologies, data storage and management, and computer processing power enable us to acquire, store, and process over 2.5 quintillion bytes of data daily. This volume of data creates a need for trustable, high-performance data-driven models that extract knowledge from data. This dissertation focuses on learning to solve highly risk-averse and complex sequential decision-making problems from retrospective data sets using deep Reinforcement Learning (RL).
Deep RL has achieved remarkable breakthroughs in many applications. It has reached superhuman performance in video and Atari games, defeated the world champion in the game of Go, gained competent autonomy in simulated self-driving cars, and successfully learned to perform some robotic tasks. Despite these advances, the application of deep RL to real-world problems such as clinical treatment policy or industrial asset maintenance management remains limited. Studies are underway to investigate deep RL use in realistic problems; however, none has been deployed in real-world settings. Several limitations hinder the application of deep RL to real-world problems, chief among them trustability and an excessive thirst for data. This research aims to smooth the way for applying deep RL to real-world problems by addressing these two limitations.
We first provide a concrete definition of trust in RL algorithms, and then propose a sample-efficient deep RL agent that computes a trustable solution to real-world sequential decision-making problems. The agent tackles the trust problem from two aspects: it imposes risk barriers on the RL agent's policy improvement process, and it provides off-policy performance estimation with a confidence bound before the agent interacts with the actual system or environment. We address RL's significant demand for data by implementing the most advanced efficient data utilization techniques and by deploying new techniques that improve the sample efficiency of Trustable Deep RL.
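To make the idea of off-policy performance estimation with a confidence bound concrete, the following is a minimal sketch rather than the dissertation's method, since the abstract does not name the estimator or the bound. It assumes ordinary (trajectory-wise) importance sampling and a percentile bootstrap lower bound, both standard techniques substituted here for illustration. All names (`importance_sampling_returns`, `bootstrap_lower_bound`, `safe_to_deploy`) and the deployment gate itself are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One logged step: (state, action, reward), collected under the behavior policy.
Step = Tuple[int, int, float]
Trajectory = List[Step]
PolicyProb = Callable[[int, int], float]  # returns pi(action | state)


def importance_sampling_returns(
    trajectories: Sequence[Trajectory],
    target_prob: PolicyProb,
    behavior_prob: PolicyProb,
    gamma: float = 1.0,
) -> List[float]:
    """Importance-weighted discounted return of each logged trajectory.

    Assumption: ordinary importance sampling, i.e. the whole-trajectory
    probability ratio multiplies the whole-trajectory return.
    """
    weighted = []
    for trajectory in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in trajectory:
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += discount * reward
            discount *= gamma
        weighted.append(weight * ret)
    return weighted


def bootstrap_lower_bound(
    values: Sequence[float],
    confidence: float = 0.95,
    n_resamples: int = 2000,
    seed: int = 0,
) -> float:
    """Percentile-bootstrap lower confidence bound on the mean of `values`."""
    if not values:
        raise ValueError("need at least one trajectory estimate")
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    return means[int((1.0 - confidence) * n_resamples)]


def safe_to_deploy(
    values: Sequence[float], baseline_value: float, confidence: float = 0.95
) -> bool:
    """Hypothetical gate: accept the new policy only if the lower bound on its
    estimated value is at least the baseline policy's value."""
    return bootstrap_lower_bound(values, confidence) >= baseline_value
```

Under these assumptions, a candidate policy would be evaluated purely from retrospective trajectories, and it would interact with the real system only if `safe_to_deploy` returns `True`. Ordinary importance sampling is unbiased but can have high variance on long trajectories, so a practical system would likely need a lower-variance estimator.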
The proposed methodology is tested and evaluated on a novel pipeline corrosion maintenance test bench that mimics the restrictions of the real system. The results show that the Trustable Deep RL algorithm efficiently digests a retrospective data set from the pipeline environment and learns a superior and trustable interaction policy.