Digital humans play a leading role in nearly every aspect of the virtual world. In recent years, multi-view reconstruction has emerged as a cutting-edge technique for generating human models, showing the potential to recover intricate surfaces for a wide range of applications that demand more than minimalist avatars. Despite many promising results, existing systems remain cumbersome, imposing heavy requirements on both devices and the capture environment: capture often has to be performed in a large-scale studio with dozens of HD camera rigs in order to preserve surface details and mitigate artifacts introduced by occlusion.
Our research aims to simplify the multi-view system and bring it to a wider range of application scenarios. This goal requires significantly upgrading state-of-the-art algorithms to address the gap-region surface completion and large-displacement registration challenges introduced by reduced input coverage and limited computing capability. We tackle these challenges by leveraging learning-based priors on 3D human geometry as well as a more robust deformation framework.
First, we explore learning-based 3D reasoning methods. Current end-to-end frameworks are capable of generating arbitrary 3D shapes even from monocular inputs. We propose a novel back-propagation method for differentiable rasterizers that significantly improves accuracy and reduces complexity.
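To make the role of the rasterizer's backward pass concrete, the sketch below shows the core fact such methods exploit: a rasterized pixel color is a barycentric interpolation of vertex attributes, so its gradient with respect to each vertex attribute is just the barycentric weight. This is a minimal illustration of the principle, not the proposed method; the triangle, colors, and pixel location are invented for the example.

```python
import numpy as np

def barycentric(p, a, b, c):
    # Solve p = a + u*(b - a) + v*(c - a); the weights are (1-u-v, u, v).
    m = np.column_stack((b - a, c - a))
    u, v = np.linalg.solve(m, p - a)
    return np.array([1.0 - u - v, u, v])

# A 2D screen-space triangle with per-vertex RGB colors (illustrative values).
tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
pixel = np.array([1.0, 1.0])

w = barycentric(pixel, *tri)   # barycentric weights of the pixel
shade = w @ colors             # interpolated pixel color
# Shading is linear in the vertex colors, so the gradient of the pixel
# color w.r.t. vertex k's color is exactly the weight w[k] -- the quantity
# a differentiable rasterizer must route gradients through.
grad_wrt_vertex_colors = w
```

A practical back-propagation scheme must additionally handle gradients with respect to vertex *positions* across triangle boundaries, which is where differentiable rasterizers differ most.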
With reliable surface inference as a foundation, we next investigate completing the gap-region surface. We achieve this by transferring time-consuming 3D procedures into simple 2D image-domain processing, which remains compatible with traditional multi-view depth fusion methods.
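One way to picture "completing a gap region in the 2D image domain" is hole filling on a depth map: invalid pixels are filled by diffusing values from valid neighbors, after which the depth map can re-enter a standard fusion pipeline. The sketch below is a deliberately simple diffusion-based filler, assumed only for illustration; it is not the method described above.

```python
import numpy as np

def fill_depth_holes(depth, mask, iters=200):
    """Fill invalid depth pixels (mask == False) by iteratively
    averaging their 4-neighbours, keeping observed pixels fixed."""
    d = depth.copy()
    for _ in range(iters):
        padded = np.pad(d, 1, mode='edge')            # replicate borders
        nbr = (padded[:-2, 1:-1] + padded[2:, 1:-1] +  # up + down
               padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0  # left + right
        d = np.where(mask, depth, nbr)                 # only holes change
    return d

# Toy depth map: a flat surface at depth 2.0 with a one-pixel hole.
depth = np.full((5, 5), 2.0)
mask = np.ones((5, 5), dtype=bool)
depth[2, 2] = 0.0
mask[2, 2] = False
filled = fill_depth_holes(depth, mask)   # hole converges to 2.0
```

Because the completion happens entirely on the 2D depth image, its output can be fused alongside the observed depth maps without altering the 3D fusion stage.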
After that, given a complete key model, we fuse surface information from other frames in the sequence to further enhance detail. Building on the state-of-the-art method, we improve the robustness of embedded deformation in large-displacement cases, where models exhibit significant pose and appearance differences. We achieve this with a novel adaptive regularization and an efficient two-step optimization.
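For context, embedded deformation drives the surface with a sparse graph of nodes, each carrying an affine transform, and regularizes neighboring nodes to agree on each other's motion. The sketch below computes the standard smoothness residuals and then an *illustrative* adaptive weighting that down-weights edges with large residuals so that large articulated motion is not over-penalized; the specific weighting function and `sigma` value are assumptions for the example, not the proposed regularizer.

```python
import numpy as np

def ed_reg_residuals(nodes, A, t, edges):
    """Embedded-deformation smoothness residuals: node i's affine
    transform (A_i, t_i) should predict neighbour j's deformed position."""
    res = []
    for i, j in edges:
        pred = A[i] @ (nodes[j] - nodes[i]) + nodes[i] + t[i]
        res.append(pred - (nodes[j] + t[j]))
    return np.array(res)

def adaptive_weights(res, sigma=0.05):
    # Hypothetical robust weighting: edges whose residual norm greatly
    # exceeds sigma are strongly down-weighted.
    norms = np.linalg.norm(res, axis=1)
    return sigma**2 / (sigma**2 + norms**2)

# Two nodes; node 1 translates by 0.2 along y while node 0 stays put.
nodes = np.array([[0., 0., 0.], [1., 0., 0.]])
A = np.stack([np.eye(3)] * 2)
t = np.array([[0., 0., 0.], [0., 0.2, 0.]])
edges = [(0, 1), (1, 0)]
r = ed_reg_residuals(nodes, A, t, edges)   # residual norm 0.2 per edge
w = adaptive_weights(r)                    # both edges down-weighted
```

In a two-step optimization one might first solve with such relaxed weights to absorb the large displacement, then re-tighten the regularization to recover fine detail.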
Finally, we design a new framework specifically for human head reconstruction. Instead of sculpting the complicated geometry of faces, we focus on generating HD textures and fusing input sequences with a geometry-aware texture stitching framework. The results preserve high-resolution details and can be produced in real time.
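As a rough intuition for geometry-aware stitching, per-view contributions to a texel can be weighted by how frontally each camera sees the surface, so grazing views contribute little. The sketch below blends one texel this way; the squared-cosine weighting and the sample values are illustrative assumptions, not the framework's actual weighting.

```python
import numpy as np

def stitch_texel(colors, normal_dot_view, eps=1e-6):
    """Blend per-view colours for one texel, weighting each view by
    the (clamped, squared) dot of surface normal and view direction."""
    w = np.clip(normal_dot_view, 0.0, None) ** 2   # favour frontal views
    w = w / (w.sum() + eps)
    return w @ colors

colors = np.array([[1.0, 0.0, 0.0],    # view A observes red
                   [0.0, 0.0, 1.0]])   # view B observes blue
dots = np.array([1.0, 0.0])            # A is frontal, B is grazing
texel = stitch_texel(colors, dots)     # dominated by view A's colour
```

Running such a blend per texel over a UV map is cheap enough to be compatible with the real-time goal stated above.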