Structure-Informed Neural Network Architecture in Regression Applications
 Zhang, Jiefu
 Advisor(s): Lin, Lin
Abstract
This dissertation concerns the importance of structure-informed neural network architectures in several regression applications. The term structure-informed neural network architecture refers to an architecture with built-in properties motivated by the structure of the problem. For example, if the target function that the neural network approximates is permutation symmetric, then instead of using a basic feedforward neural network to explore the entire function space, we can build a permutation symmetric architecture that explores only the much smaller space of permutation invariant functions. In supervised learning, there are multiple ways to achieve better numerical results when training a neural network for a regression problem, including but not limited to improved optimization techniques, increased training sample size, and superior neural network architecture. In this dissertation we investigate the importance of a superior neural network architecture that incorporates the information hidden in the application problem, and then showcase a quantum chemistry application in which an architecture incorporating several kinds of symmetry achieves strong numerical results. Finally, we discuss how one kind of structure, permutation symmetry, can be built into the neural network.
After introducing the fundamentals in chapter 1, in chapter 2 we investigate the importance of the structure-informed neural network architecture on a toy problem: using a neural network to approximate the mapping $\mathbf{x} \mapsto \sum_{i=1}^n x_i^2$. We observe that the role of the structure-informed architecture in this setting is irreplaceable. For instance, a neural network architecture that ignores the problem structure needs significantly more training samples to match the performance of one that takes it into account. Training tricks and heuristics, such as switching optimizers or applying regularization, cannot easily close the performance gap either. In chapter 3 we look at one particular real-world quantum chemistry application and observe how the superior neural network architecture leads to good numerical results. The problem is to use a neural network to predict the electron density of an electronic system, with the atomic configuration (e.g. the positions of some water molecules) as input. Several symmetries are hidden in the problem. For example, interchanging two identical atoms should not affect the electron density of the system. By building translation symmetry, rotation symmetry, and permutation symmetry into the neural network architecture, we are able to predict the electron density accurately for multiple 1D and 3D systems. In chapter 4 we focus on one specific type of structure that can be built into the neural network architecture: permutation symmetry. We summarize the existing approaches to incorporating permutation invariance and equivariance into the neural network architecture, and offer proofs of the validity of the approximation ansatz tailored for permutation symmetry.
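As a rough illustration of what permutation symmetry built into the architecture can mean for the toy mapping $\mathbf{x} \mapsto \sum_{i=1}^n x_i^2$, the sketch below uses a sum-pooling form $f(\mathbf{x}) = \rho\left(\sum_i \phi(x_i)\right)$. This form, and the names `phi`, `rho`, and `invariant_model`, are assumptions chosen for illustration and are not taken from the dissertation; in a trained model `phi` and `rho` would be learned networks rather than the fixed functions used here.

```python
import numpy as np

def phi(t):
    # Per-coordinate feature map. Hypothetical fixed choice for illustration;
    # a trained model would learn this map.
    return t ** 2

def rho(s):
    # Readout applied to the pooled feature. Identity here, also a
    # hypothetical choice for illustration.
    return s

def invariant_model(x):
    # Summing over coordinates makes the output independent of their order,
    # so the model is permutation invariant by construction.
    x = np.asarray(x, dtype=float)
    return rho(np.sum(phi(x)))

x = np.array([1.0, -2.0, 3.0])
perm = np.array([2, 0, 1])           # an arbitrary reordering of the entries

y = invariant_model(x)               # value on the original ordering
y_perm = invariant_model(x[perm])    # value on the permuted input
print(y, y_perm)
```

With these particular choices the model reproduces the toy target exactly, and the two printed values agree, which is the invariance property the architecture is meant to guarantee regardless of how `phi` and `rho` are parameterized.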