Residual Neural Network on a Many-Core Platform
- Wu, Haotian
- Advisor(s): Baas, Bevan M
Abstract
Deep neural networks are used in many applications such as image classification, image recognition, and natural language processing. For real-time applications, low latency (i.e., the time between when the input arrives and when the output is generated) is crucial. ResNet (residual neural network) is one of the most widely used deep neural networks in recent years. The most important reason for its prevalence is that the residual structure can gain accuracy from considerably increased depth: its layers learn residual functions with reference to the layer inputs instead of learning unreferenced functions. This thesis proposes a many-core implementation of ResNet-34 that is complete except for the softmax layer and offers low latency and high throughput, i.e., more images classified per second. Details of the residual neural network architecture, the algorithms of the kernels, and the mapping methodology are also presented.
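For readers unfamiliar with the residual structure, the following is a minimal sketch of the idea, not the thesis's many-core implementation; `residual_block` and `toy_residual_fn` are illustrative names standing in for the actual convolutional layers.

```python
import numpy as np

def residual_block(x, residual_fn):
    # The stacked layers learn a residual function F(x) rather than an
    # unreferenced mapping H(x); the layer input x is added back through
    # a shortcut connection, so the block computes H(x) = F(x) + x.
    return residual_fn(x) + x

def toy_residual_fn(x):
    # Hypothetical stand-in for the block's layers; real ResNet-34
    # blocks use two 3x3 convolutions with batch normalization.
    return np.maximum(0.0, 0.1 * x)  # ReLU of a scaled input

x = np.arange(4, dtype=float)
y = residual_block(x, toy_residual_fn)  # y = F(x) + x
print(y)
```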
The many-core implementation is compared against several general-purpose processor, GPU, and FPGA implementations of ResNet. The key metrics by which these platforms are compared are throughput per area, throughput per watt, and the energy-delay product (EDP), gathered during the inference of one image. Since different fabrication technologies are used, throughput, area, and energy dissipation for all platforms are scaled to 32 nm. The many-core implementation offers a 10.75×–67.89× improvement in throughput per area over general-purpose processors and a 6.6×–8.6× improvement over GPUs. Meanwhile, the many-core implementation provides a 7.06×–18.71× improvement in throughput per watt over general-purpose processors, 2.67×–3.65× over FPGAs, and 1.32×–4.98× over GPUs. The proposed implementation also has the lowest EDP among all platforms, offering a 2,329×–2,529× improvement over CPUs and 46×–579× over GPUs and FPGAs.
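As a reminder, the energy-delay product for one inference is, by its standard definition, the product of the energy dissipated and the latency; the thesis's exact measurement setup is not reproduced here.

```latex
% EDP for the inference of one image:
% E is the energy dissipated and t is the latency (delay).
\[
  \mathrm{EDP} = E_{\text{inference}} \times t_{\text{inference}}
\]
```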