Computer vision is an interdisciplinary field that seeks to obtain high-level understanding from digital images. It has many applications that impact our daily lives, in areas such as automation, entertainment, and healthcare. However, computer vision is very challenging, partly because the problem is intrinsically difficult and partly because of the complexity and size of the visual data that need to be processed. Deploying computer vision in practical use cases therefore requires both sophisticated algorithms and efficient implementations.
In this thesis, we consider two platforms that are suitable for computer vision processing yet have not been easily accessible to algorithm designers and developers: the Web and FPGA-based accelerators. Through the development of open-source software components, we highlight the challenges associated with vision development on each platform and demonstrate opportunities to mitigate them.
The Web is the world's most ubiquitous computing platform, and it hosts a plethora of visual content. For historical reasons, such as insufficient compute performance and a lack of API support for acquiring and manipulating images, computer vision is not mainstream on the Web. We show that, in light of recent web developments such as vastly improved JavaScript performance and the addition of APIs such as WebRTC, efficient computer vision processing can be realized on web clients. Through novel engineering techniques, we translate a popular open-source computer vision library (OpenCV) from C++ to JavaScript and optimize its performance for the web environment. We demonstrate that hundreds of computer vision functions run in browsers with performance close to that of their original C++ versions. We believe this will result in an immersive and perceptual web with transformational effects in areas such as online shopping, education, and entertainment.
Field-programmable gate arrays (FPGAs) are a promising solution for mitigating the computational cost of vision algorithms through hardware pipelining and parallelism. However, an efficient FPGA implementation of a computer vision algorithm requires hardware design expertise and a considerable number of engineering person-hours. We show that graph-based specifications, such as OpenVX, can significantly improve FPGA design productivity. Because such an abstraction omits implementation details, a vision algorithm designer can focus solely on the algorithm itself and rely on another party with hardware knowledge to implement the design efficiently on a specific platform. During this process, different implementation configurations that satisfy various design constraints, such as performance and power consumption, can be explored. Furthermore, the graph-based model permits system-level optimizations that are not possible with traditional function-level acceleration. Towards this goal, we develop a framework that optimizes and implements vision algorithms described in the OpenVX specification on different FPGA architectures. This framework hides low-level hardware optimization and implementation details from computer vision algorithm designers and enables them to quickly develop and verify FPGA implementations of vision algorithms without sacrificing performance.
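To make the contrast between function-level and graph-level execution concrete, the following toy sketch may help. It is not the OpenVX API and not the framework developed in this thesis; the names `Graph`, `run_function_level`, and `run_fused` are hypothetical, and images are simplified to flat lists of 8-bit pixel values. Assuming every node is a per-pixel operation, knowing the whole graph lets an implementation stream each pixel through all nodes, much like a hardware pipeline, instead of writing a full intermediate image after every function call.

```python
from typing import Callable, List, Tuple

Pixel = int
Image = List[Pixel]
PixelOp = Callable[[Pixel], Pixel]


class Graph:
    """Hypothetical toy graph: a linear chain of per-pixel nodes."""

    def __init__(self) -> None:
        self.nodes: List[PixelOp] = []

    def add_node(self, op: PixelOp) -> "Graph":
        self.nodes.append(op)
        return self


def run_function_level(graph: Graph, image: Image) -> Tuple[Image, int]:
    """Run one node at a time; each node writes a full image buffer."""
    buffers_written = 0
    current = image
    for op in graph.nodes:
        current = [op(p) for p in current]  # a complete image per node
        buffers_written += 1
    return current, buffers_written


def run_fused(graph: Graph, image: Image) -> Tuple[Image, int]:
    """Stream each pixel through every node; only the output is written."""

    def pipeline(p: Pixel) -> Pixel:
        for op in graph.nodes:
            p = op(p)
        return p

    return [pipeline(p) for p in image], 1


def build_example_graph() -> Graph:
    """Brighten, threshold, then invert (all assumed 8-bit per-pixel ops)."""
    return (
        Graph()
        .add_node(lambda p: min(p + 10, 255))
        .add_node(lambda p: 255 if p >= 128 else 0)
        .add_node(lambda p: 255 - p)
    )


if __name__ == "__main__":
    g = build_example_graph()
    img = [0, 100, 250]
    print(run_function_level(g, img))
    print(run_fused(g, img))
```

Both functions compute the same result, but the fused version writes one buffer rather than one per node. This only illustrates the kind of optimization that becomes visible once the full graph is known; real graphs also contain neighborhood operations, which need line buffers rather than single-pixel streaming.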