From driverless vehicles to Mars rovers, autonomous navigation and task execution are undoubtedly the future of robotics. In recent years, research in Deep Reinforcement Learning (DRL) has grown across many areas of navigation, including unmanned aerial vehicles (UAVs). Most of these studies, however, are far from realistic and assume perfect state observations. In many real-life scenarios, such as search and rescue in complex, cluttered environments, GPS denial becomes a serious problem. Therefore, this research focuses on vision-based navigation and obstacle avoidance in realistic environments.
More specifically, this thesis aims to address the following research tasks: 1) To investigate the vision-based navigation of a UAV in GPS-denied synthetic environments. This work will utilize a Variational Autoencoder (VAE) to improve sample efficiency, and develop a Proximal Policy Optimization (PPO) agent that can trace rivers in photo-realistic simulations. 2) To conduct reward shaping for vision-based problems. Designing the correct reward function is essential for obtaining the desired agent behavior, but it is a challenging task in vision-based learning. 3) To validate the PPO agent's performance and compare it with an agent trained with imitation learning (IL). The evaluation metrics include the average distance traveled per episode, the distance from the river center, and the standard deviation of actions taken.
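As a rough illustration of how these three metrics might be computed from logged rollouts, consider the following sketch. The logging format, function name, and field names are hypothetical assumptions for this example, not the thesis's actual evaluation code:

```python
import numpy as np

def evaluate_episodes(episodes):
    """Compute the three evaluation metrics from logged episodes.

    `episodes` is a list of dicts, each with (assumed format):
      - "positions":  (T, 2) array of UAV xy positions along the flight
      - "centerline": (M, 2) array of points sampled along the river center
      - "actions":    (T, A) array of actions taken by the policy
    """
    dists, center_errs, action_stds = [], [], []
    for ep in episodes:
        pos = np.asarray(ep["positions"], dtype=float)
        # Distance traveled: sum of step-to-step displacements.
        dists.append(np.linalg.norm(np.diff(pos, axis=0), axis=1).sum())
        # Distance from river center: for each position, distance to the
        # nearest sampled centerline point, averaged over the episode.
        center = np.asarray(ep["centerline"], dtype=float)
        d = np.linalg.norm(pos[:, None, :] - center[None, :, :], axis=-1)
        center_errs.append(d.min(axis=1).mean())
        # Action smoothness proxy: standard deviation of actions taken.
        action_stds.append(np.asarray(ep["actions"], dtype=float).std())
    return {
        "avg_distance_per_episode": float(np.mean(dists)),
        "avg_distance_from_center": float(np.mean(center_errs)),
        "avg_action_std": float(np.mean(action_stds)),
    }
```

A lower distance-from-center and action standard deviation, together with a higher distance traveled, would indicate smoother and more successful river tracing under this scheme.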