Simulation plays a crucial role in modern academic research, particularly in the field of artificial intelligence (AI). A simulation environment can mimic real-world scenarios, allowing an AI agent to learn, adapt, and make decisions in a controlled and safe setting. This thesis tackles two important problems in building the next generation of artificial general intelligence (AGI): how to efficiently train an AI agent with values, and how to overcome the simulation-to-reality gap so that training results carry over to real-world applications. Current studies of AI mainly consider learning the potential, or energy, function (U), which captures the influence of the external environment. The U function helps the agent apprehend physical laws, natural potentials, and social norms. However, also taking value (V) learning into account, which models an agent's inner reasoning, enables the agent to derive its own goals, intents, and social values.
Our research shows that U and V learning are equally important on the pathway to AGI. The learning of U is usually data-driven: it enables the agent to imitate and complete tasks through statistical learning. By incorporating the value function, the agent can spontaneously formulate a task plan, and its behavior becomes more consistent with human cognition and values.
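To make this contrast concrete, the sketch below shows one illustrative way an agent could select actions by trading off a learned potential U (favoring low-energy, physically plausible states) against a learned value V (favoring states aligned with its goals). The toy U and V functions, the deterministic transition, and the additive scoring rule are assumptions made purely for exposition, not the formulation developed in this thesis.

```python
import numpy as np

# Illustrative only: U and V stand in for a learned potential (energy) function
# and a learned value function; their forms below are toy assumptions.

def U(state: np.ndarray) -> float:
    """Potential/energy of a state (lower = more physically plausible)."""
    return float(np.sum(state ** 2))           # toy quadratic potential

def V(state: np.ndarray) -> float:
    """Value of a state (higher = better aligned with the agent's goals)."""
    return float(-np.abs(state[0] - 1.0))      # toy preference for state[0] close to 1

def select_action(state: np.ndarray, actions: list) -> np.ndarray:
    """Pick the action whose successor state trades off low U against high V."""
    def score(a: np.ndarray) -> float:
        next_state = state + a                 # toy deterministic transition
        return -U(next_state) + V(next_state)
    return max(actions, key=score)

state = np.array([0.0, 0.0])
actions = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
print(select_action(state, actions))           # prefers the action moving state[0] toward 1
```

Under these assumptions, an agent guided by U alone would treat all three actions as equally plausible; adding V is what lets it commit to the one that serves its goal.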
This thesis consists of three parts: (1) Potential (U) function learning, which explores how an agent acquires knowledge and skills that are useful and practical for a particular purpose. (2) Value (V) learning for cases where learning the potential (U) function cannot satisfy all the learning goals, which investigates situations where potential-based learning alone is limited or ineffective. (3) Combining U and V learning, which focuses on integrating data-driven potential learning with value-driven learning.
We primarily focus on assessing the effectiveness of U learning within a simulated environment. Our investigation begins with agents operating in a controlled simulated setting where the action space is intentionally kept small. Through rigorous testing and iterative refinement, we gradually expand the analysis to agents with increasingly complex, continuous action spaces. After achieving compelling results in simulation, we proceed to the crucial next step: transferring the knowledge gained by well-trained agents in simulation to real-world scenarios. This process entails adapting the agents' learned policies, strategies, and decision-making capabilities to the intricacies and uncertainties of real environments.
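As an illustration of the first stage of this pipeline, the sketch below trains a tabular Q-learning agent in a toy simulated environment with a deliberately small, discrete action space. The environment, reward, and hyperparameters are assumptions chosen for brevity rather than the actual setup used in this thesis; later stages would replace the discrete action set with continuous controls before any transfer to the real world is attempted.

```python
import random

class ToySimEnv:
    """1-D grid world: the agent starts at cell 0 and tries to reach cell 5."""
    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))  # 1 = right, 0 = left
        done = self.pos >= 5
        reward = 1.0 if done else -0.01        # small step cost, bonus at the goal
        return self.pos, reward, done

env = ToySimEnv()
q = {}                                         # tabular Q-values: (state, action) -> value
actions = [0, 1]
alpha, gamma, epsilon = 0.5, 0.95, 0.1         # toy hyperparameters

for episode in range(300):
    state = env.reset()
    for _ in range(100):                       # cap episode length
        if random.random() < epsilon:          # epsilon-greedy exploration
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q.get((state, a), 0.0))
        next_state, reward, done = env.step(action)
        target = reward if done else reward + gamma * max(
            q.get((next_state, a), 0.0) for a in actions)
        q[(state, action)] = q.get((state, action), 0.0) + alpha * (
            target - q.get((state, action), 0.0))
        state = next_state
        if done:
            break

# After training, the greedy policy should move right from every cell.
print({s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in range(5)})
```

Keeping the action space this small makes the early stage of training easy to verify; the same loop structure carries over once the discrete choices are replaced by a policy emitting continuous controls.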