Approaches to Interpret Deep Neural Networks
UC Merced Electronic Theses and Dissertations
eScholarship, Open Access Publications from the University of California


Creative Commons Attribution 4.0 (CC BY 4.0) license
Abstract

Practical deployment of deep neural networks has become widespread in the last decade due to their ability to provide simple, intelligent, and automated processing of tasks that were previously hard for other machine learning models. There is enormous financial and societal interest in deep neural networks as a viable solution for many practical problems, such as computer vision, language processing, and financial fraud detection. At the same time, concerns have arisen regarding the safety and ethical use of these models. One of the main concerns is interpretability, i.e., explaining how a model makes its decision for a given input. Interpretability is one of the most important problems to address for building trust and accountability, as the adoption of deep neural networks has increased significantly in sensitive areas like medicine, security, and finance.

This dissertation proposes two novel approaches to interpreting deep neural networks. The first approach focuses on understanding what information is retained by the neurons of a deep net. We propose a method to characterize the region of input space that excites a given neuron to a certain level. Inspection of these regions by a human can reveal regularities that help to understand the neuron. In the second approach, we provide a systematic way to determine which group of neurons in a deep net is responsible for a particular class. This allows us to study the relation between deep net features (neuron activations) and output classes, and how different classes are distributed in the latent space. We also show that, out of the thousands of neurons in a deep net, only a small subset is associated with any specific class. Finally, we demonstrate that the latter approach can also be used to interpret large datasets. This is achieved by applying the second approach directly to the input features, which allows us to understand which input features are related to a specific class and which sets of features differentiate between groups of classes, or even sub-groups within a given class.
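The abstract describes the two approaches only at a high level. As a rough illustration of the kind of analysis involved (not the dissertation's actual methods), the sketch below uses a toy PyTorch network and synthetic data, both of which are assumptions introduced here, to (1) collect the inputs that excite a chosen neuron above a given activation level and (2) rank hidden neurons by a simple class-association score, keeping only a small subset.

```python
# Minimal illustrative sketch, assuming a toy network and synthetic data;
# the dissertation's actual approaches are more elaborate than this.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network and synthetic data stand in for a trained deep net and dataset.
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
X = torch.randn(1000, 20)              # hypothetical inputs
y = torch.randint(0, 10, (1000,))      # hypothetical class labels

# Capture activations of the hidden ReLU layer with a forward hook.
activations = {}
def save_hidden(module, inputs, output):
    activations["hidden"] = output.detach()
net[1].register_forward_hook(save_hidden)

with torch.no_grad():
    net(X)
H = activations["hidden"]              # shape: (num_inputs, num_neurons)

# (1) Inputs that excite a given neuron to at least a chosen level: a crude
#     proxy for characterizing the input region associated with that neuron.
neuron, level = 5, 1.0
exciting_inputs = X[H[:, neuron] >= level]
print(f"{len(exciting_inputs)} inputs excite neuron {neuron} above {level}")

# (2) Score each neuron's association with one class by the gap between its
#     mean activation on that class and on all other classes; keep the top few.
target_class = 3
in_class = H[y == target_class].mean(dim=0)
out_class = H[y != target_class].mean(dim=0)
scores = in_class - out_class
top_neurons = torch.topk(scores, k=5).indices
print(f"Neurons most associated with class {target_class}: {top_neurons.tolist()}")
```

In the same spirit as the dataset-interpretation use mentioned above, the class-association scoring in step (2) could be applied directly to the input features X instead of the hidden activations H, highlighting which features relate to a given class.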
