3D Geometric Deep Learning and its Engineering Applications

Abstract

The physical world is spatially three dimensional; hence, understanding the physical and semantic properties of the three dimensional world is crucial to designing engineering systems that operate in and interact with their three dimensional surroundings. Recent advances in Deep Learning have shown great promise in offering a general methodology for creating algorithms that learn to map sensory inputs to desired outcomes by training on a repository of data. Such advances have been particularly salient in fields such as computer vision, where algorithms have achieved near-human or super-human performance on a range of traditionally difficult problems such as image classification, object detection, and segmentation. However, three dimensional problems have very often been reduced to two dimensional ones for simplicity. In traditional computer vision, inherently three dimensional objects are instead represented by images of their projections onto a two dimensional plane. In computational physics, three dimensional simulations are often simplified into two dimensional counterparts because the computational cost of 3D simulations is oftentimes intractable for many applications. Such simplifications leave out essential information about the actual spatial relationships between objects and can result in inaccurate or erroneous predictions.

The focus of this dissertation is to address the challenges in designing deep learning algorithms and architectures that interact with three dimensional data. Conventional data representations and neural architectures in computer vision do not directly extend to three dimensional counterparts, for several reasons. First, data representations for 3D objects are varied and diverse. For instance, in computer graphics and computational physics, geometries are usually represented as simplicial complexes (point clouds, wire frames, triangular meshes, volumetric tetrahedral meshes, etc.) for efficiency. In robotics, however, the data representation is mainly determined by the form of the raw sensory inputs to the system, such as point clouds from Lidar scans or Kinect sensors, RGB-D images from depth-enabled cameras, and multiview images from binocular or multi-camera setups. Panoramic or fisheye images are also becoming increasingly prevalent on UAVs (Unmanned Aerial Vehicles) and self-driving cars. Second, three dimensional data tend to be orders of magnitude larger than their two dimensional counterparts. Using the Cartesian grid representation as an example, the storage complexity for a 2D image is O(n^2), whereas for a 3D volume it is O(n^3). Therefore, more efficient encoding and representation methods need to be brought forward to address such challenges. Last but not least, three dimensional data are much more costly to acquire (owing to higher collection costs and the relative inaccessibility of the necessary sensors), label (orienting and navigating 3D data on 2D screens requires engineering effort), store, and process. In the sections that follow, I will discuss novel methodologies that address these challenges, and present various useful applications in computer vision and computational physics that benefit from such methodologies.
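To make the scaling argument concrete, the short sketch below (a hypothetical illustration using NumPy; the resolution and variable names are assumptions, not part of the dissertation) compares the memory footprint of a 2D image grid and a 3D volumetric grid at the same per-axis resolution n:

    # Hypothetical illustration: the storage gap between 2D and 3D
    # Cartesian grids at the same per-axis resolution n.
    import numpy as np

    n = 512
    image = np.zeros((n, n), dtype=np.float32)      # 2D grid: O(n^2) storage
    volume = np.zeros((n, n, n), dtype=np.float32)  # 3D grid: O(n^3) storage

    print(f"2D image:  {image.nbytes / 1e6:.1f} MB")   # ~1.0 MB
    print(f"3D volume: {volume.nbytes / 1e9:.2f} GB")  # ~0.54 GB, n times larger

At n = 512 the 3D grid already occupies roughly half a gigabyte for a single scalar channel, which is why dense 3D workloads demand more efficient encodings.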

In Chapter One, I will present an in-depth overview of the challenges that arise from 3D data, together with a detailed discussion of the disciplines, ranging from computational physics and climate science to computer vision, that stand to benefit from progress in 3D Geometric Deep Learning. In Chapter Two, I will present a novel and general methodology for differentiably rasterizing unstructured geometric representations in the form of simplicial complexes. This can serve as a geometry layer within deep neural networks, allowing a natural extension of Convolutional Neural Network (CNN) based architectures to a vast collection of 3D representations. In Chapter Three, I will introduce a methodology for natively performing convolutions on the spherical manifold, which is the underlying geometric domain for signals in a range of disciplines, from climate science to panoramic vision. Our methodology is efficient to compute and naturally avoids the distortion-related problems that arise from directly applying CNNs to equirectangular projections of spherical signals. Finally, in Chapter Four, I will present a novel continuous implicit 3D representation for large scenes and large physical systems that allows us to leverage localized, learned geometric priors for 3D reconstruction tasks. Moreover, a simple extension of this representation allows us to inject Partial Differential Equation (PDE) constraints into physical systems, facilitating a physics-informed and physics-abiding deep learning architecture.
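As a rough illustration of the physics-informed idea sketched for Chapter Four, the code below (hypothetical PyTorch; the network architecture, the choice of Laplace's equation, and all names are assumptions for illustration, not the dissertation's implementation) trains a coordinate-based implicit field whose loss penalizes a PDE residual computed with automatic differentiation:

    # Hypothetical sketch: a coordinate-based implicit field u(x, y) trained
    # with a PDE residual penalty (here Laplace's equation u_xx + u_yy = 0),
    # in the spirit of physics-informed deep learning.
    import torch

    net = torch.nn.Sequential(
        torch.nn.Linear(2, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 64), torch.nn.Tanh(),
        torch.nn.Linear(64, 1),
    )

    def pde_residual(xy):
        xy = xy.requires_grad_(True)
        u = net(xy)
        # First derivatives of u with respect to the input coordinates.
        grad_u = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
        u_x, u_y = grad_u[:, 0], grad_u[:, 1]
        # Second derivatives via a second pass of autograd.
        u_xx = torch.autograd.grad(u_x.sum(), xy, create_graph=True)[0][:, 0]
        u_yy = torch.autograd.grad(u_y.sum(), xy, create_graph=True)[0][:, 1]
        return u_xx + u_yy  # vanishes wherever the PDE is satisfied

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(1000):
        xy = torch.rand(256, 2)                # random collocation points
        loss = pde_residual(xy).pow(2).mean()  # physics penalty term
        # (In practice this is combined with a data-fitting term on
        #  observed samples of the field.)
        opt.zero_grad()
        loss.backward()
        opt.step()

In this style, the PDE enters as a soft constraint on the learned field, so the same implicit representation can serve both reconstruction and physically constrained prediction.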
