Robotic grasping in complex environments is one of the fundamental challenges for home-assistant robots. Grasping in complex environments has been studied extensively in industrial bin-picking scenarios, where reliably grasping objects from unorganized heaps is difficult because of sensor noise, obstructions, and occlusions. However, bin picking is still easier than grasping common household objects from structured clutter in a home environment, where the robot must not knock over neighboring objects during the grasping motion. Recently, there have been several attempts to tackle the grasping-in-structured-clutter problem. In our experiments, we found that these methods were either hard to adapt to our simulated environment without extra tuning or generated too few stable grasps to grasp the objects successfully. An overview and detailed analysis of these existing grasping approaches appear in the first half of this thesis.
In the second half of this thesis, we investigate the idea of using a physical simulator as an intermediate step to generate a grasp trajectory proposal. At a high level, we propose a two-step approach to the grasping-in-structured-clutter problem. First, we collect RGB-D observations and reconstruct the environment in a physical simulator via 9-degree-of-freedom (DoF) category-level object pose estimation, CAD model matching, and physical support refinement. Then, we perform antipodal grasp sampling, collision-free motion planning, and grasp execution in the simulator, and directly transfer the robot arm’s motion trajectory to the original environment.
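As a rough illustration of the antipodal condition used in grasp sampling, the sketch below tests whether two contact points form an antipodal pair under a Coulomb friction model: the line joining the contacts must lie inside the friction cone at both contacts. This is a minimal sketch and not the sampler used in this thesis; the function name `is_antipodal`, the outward-normal convention, and the default friction coefficient of 0.5 are all assumptions made for the example.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(x / norm for x in v)


def _angle_between(a, b):
    """Angle in radians between two unit vectors (dot product clamped)."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def is_antipodal(p1, n1, p2, n2, friction_coeff=0.5):
    """Check the antipodal condition for two contacts.

    p1, p2: contact points; n1, n2: OUTWARD surface normals (assumed
    convention). friction_coeff is a hypothetical default.
    """
    # Direction from contact 1 to contact 2 (the grasp axis).
    axis = _unit(tuple(b - a for a, b in zip(p1, p2)))
    neg_axis = tuple(-x for x in axis)
    # Inward normals point into the object at each contact.
    inward1 = tuple(-x for x in _unit(n1))
    inward2 = tuple(-x for x in _unit(n2))
    # Half-angle of the friction cone.
    half_angle = math.atan(friction_coeff)
    # The grasp axis must lie inside both friction cones.
    return (_angle_between(axis, inward1) <= half_angle
            and _angle_between(neg_axis, inward2) <= half_angle)
```

For example, two contacts on opposite faces of a box satisfy the condition, while a contact on a side face paired with one on the top face does not, because the grasp axis is 45 degrees away from each inward normal.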
To generate a 9-DoF category-level object pose estimate, we extend a state-of-the-art 6-DoF instance-level object pose estimation network. In our experiments, we found that the 9-DoF pose estimation network reaches performance comparable to the state of the art on a category-level object pose estimation dataset. Relying only on a top-down view of the environment, we reconstructed the environment using the proposed two-step approach and evaluated the grasp transfer success. The results show room for further improvement in the model-matching process. Future directions are discussed at the end of this thesis.
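To make the 9-DoF pose concrete, the sketch below applies such a pose to a point on a canonical CAD model. It assumes the parameterization common in category-level pose estimation, namely 3 DoF of rotation, 3 of translation, and 3 of per-axis scale; the abstract does not spell this out, so treat the decomposition, the scale-then-rotate-then-translate order, and the helper names `rot_z` and `apply_pose` as assumptions made for illustration.

```python
import math


def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, scale, point):
    """Map a canonical model point into the scene: R @ (s * p) + t.

    rotation: 3x3 matrix (3 DoF), translation: 3-vector (3 DoF),
    scale: per-axis scale factors (3 DoF). Assumed order of operations.
    """
    # Per-axis scaling in the canonical object frame.
    scaled = [s * p for s, p in zip(scale, point)]
    # Rotate the scaled point.
    rotated = [sum(r * x for r, x in zip(row, scaled)) for row in rotation]
    # Translate into the scene frame.
    return [r + t for r, t in zip(rotated, translation)]
```

With a 90-degree rotation about z, a translation of (1, 0, 0), and a scale of (2, 1, 1), the canonical point (1, 0, 0) is first stretched to (2, 0, 0), rotated to (0, 2, 0), and then shifted to approximately (1, 2, 0).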
We hope this work on scene reconstruction for simulated grasp search and trajectory transfer will help future research on robotic manipulation in complex environments.