In the past decade, research in machine learning has been principally focused on the development of algorithms and models with high predictive capabilities. Models such as convolutional neural networks (CNNs) achieve state-of-the-art predictive performance for many tasks in computer vision, autonomous driving, and transfer learning. However, interpreting these models remains a challenge, primarily because of the large number of parameters involved.
In this thesis, we investigate two regimes based on (1) compression and (2) stability to build more interpretable machine learning models. Both regimes are demonstrated in a computational neuroscience study. In the first part of the thesis, we introduce a greedy structural compression scheme that prunes filters in a trained CNN. To do this, we define a filter importance index equal to the classification accuracy reduction (CAR) of the network after pruning that filter; an analogous index, RAR, is defined for regression. CAR-compressed networks achieve state-of-the-art classification accuracy compared to those produced by other filter pruning schemes. Furthermore, we demonstrate the interpretability of CAR-compressed CNNs by showing that our algorithm prunes filters with visually redundant functionalities, such as color filters.
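The CAR index and the greedy pruning loop described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis: the "network" is a toy linear classifier over per-filter features rather than a CNN, the filter with the smallest CAR is the one pruned at each step, and no retraining is performed. The names `accuracy`, `car_index`, and `greedy_car_prune` are hypothetical.

```python
import numpy as np


def accuracy(weights, features, labels, active):
    """Accuracy of a toy linear classifier restricted to the active filters."""
    idx = sorted(active)
    scores = features[:, idx] @ weights[idx]
    return float((scores.argmax(axis=1) == labels).mean())


def car_index(weights, features, labels, active, f):
    """CAR of filter f: accuracy with all active filters minus accuracy without f."""
    base = accuracy(weights, features, labels, active)
    return base - accuracy(weights, features, labels, active - {f})


def greedy_car_prune(weights, features, labels, n_prune):
    """Assumed greedy rule: repeatedly prune the filter with the smallest CAR."""
    n_filters = features.shape[1]
    if not 0 <= n_prune < n_filters:
        raise ValueError("n_prune must leave at least one filter")
    active = set(range(n_filters))
    pruned = []
    for _ in range(n_prune):
        # Recompute CAR for every remaining filter against the current network.
        scores = {f: car_index(weights, features, labels, active, f)
                  for f in sorted(active)}
        victim = min(scores, key=scores.get)  # ties go to the lowest index
        active.remove(victim)
        pruned.append(victim)
    return pruned, active


# Toy data (illustrative only): filters 2 and 3 have zero weights, so they are
# redundant; the labels are defined by the full model, giving base accuracy 1.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
weights = np.zeros((4, 3))
weights[0] = [1.0, -1.0, 0.0]
weights[1] = [0.0, 1.0, -1.0]
labels = (features @ weights).argmax(axis=1)

pruned, active = greedy_car_prune(weights, features, labels, n_prune=2)
print("pruned filters:", pruned, "remaining filters:", sorted(active))
```

In this toy setting the two zero-weight filters have a CAR of exactly zero and are pruned first, while the two informative filters are kept. Recomputing the index after every removal is what makes the scheme greedy rather than a one-shot ranking.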
In the second part of this thesis, we introduce DeepTune, a stability-driven visualization and interpretation framework for CNN-based models. DeepTune is used to characterize biological neurons in the V4 area of the primate visual cortex. V4 is a large retinotopically-organized area of the visual cortex located between the primary visual cortex and high-level areas in the inferior temporal lobe. V4 neurons have highly nonlinear response properties and it is notoriously difficult to construct quantitative models that accurately describe how visual information is represented in V4. To better understand the filtering properties of these neurons, we study recordings from 71 well isolated cells stimulated with 4000-12000 static grayscale natural images collected by the Gallant Lab at UC Berkeley. Our CNN-based models of V4 neurons achieve state-of-the-art accuracy in predicting neural spike rates in a hold-out validation set (average predictive correlation of 0.53 for 71 neurons). Then, we employ our DeepTune stability-driven interpretation framework and discover that the V4 neurons are tuned to a remarkable diversity of textures (40% of the neurons), contour shapes (30% of the neurons), and complex patterns (30% of the neurons). Most importantly, these smooth DeepTune images provide testable naturalistic stimuli for future experiments on V4 neurons.
In the final part of this thesis, we study the application of CAR- and RAR-compressed CNNs to modeling V4 neurons. Both CAR and RAR compression give rise to a new set of simpler models for V4 neurons with accuracy similar to that of existing state-of-the-art models. For each of the accurate CAR- and RAR-compressed models of V4 neurons (up to a 90% compression rate), the smooth DeepTune images are stable and exhibit patterns similar to the uncompressed model’s consensus DeepTune image. Our results suggest, to some extent, that these CNNs resemble the structure of the primate brain.