Efficiently Designing Efficient Deep Neural Networks
A number of competing concerns slow adoption of deep learning for computer vision on “edge” devices. Edge devices provide only limited resources for on-device algorithms to employ, constraining power, memory, and storage usage. Examples include mobile phones, autonomous vehicles, and virtual reality headsets, which demand both high accuracy and low latency, two objectives competing for resources.
To tackle this Sisyphean task, modern methods expend gargantuan amounts of computation to design solutions, exceeding thousands of GPU hours or years of GPU compute to design a single neural network. Moreover, these works maximize just one performance metric – accuracy – under a single set of resource constraints. What if the set of resource constraints changes? What if additional performance metrics, such as explainability or generalization, rise to the forefront? Modern methods for designing efficient neural networks are handicapped by excessive computation requirements in pursuit of goals that are too narrowly sighted.
This thesis tackles the bottlenecks of modern methods directly, achieving state-of-the-art performance by efficiently designing efficient deep neural networks. These improvements don’t only reduce computation or only improve accuracy; instead, our methods improve performance and reduce computational requirements, despite increasing search space size by orders of magnitude. We also demonstrate missed opportunities with performance metrics beyond accuracy, redesigning the task so that accuracy, explainability, and generalization improve jointly, an impossibility by conventional wisdom, which suggests explainability and accuracy participate in a zero-sum game.
This thesis culminates in a set of models that set new flexibility and performance standards for production-ready models: those that are state-of-the-art accurate, explainable, generalizable, and configurable for any set of resource constraints in just CPU minutes.