Video-based object or face recognition services on mobile devices have recently garnered significant attention, given that video cameras are now ubiquitous in all mobile communication devices. In one of the most typical scenarios for such services, each mobile device captures video frames and transmits them over a wireless network to a remote computing cluster (a.k.a. "cloud" computing infrastructure) that performs the heavy-duty video feature extraction and recognition tasks for a large number of mobile devices. A major challenge in such scenarios stems from the highly varying contention levels in the wireless transmission, as well as the variation in task-scheduling congestion in the cloud. To enable each device to adapt its transmission, feature-extraction and search parameters and maximize its object or face recognition rate under such contention and congestion variability, we propose a systematic learning framework based on multi-user multi-armed bandits. We characterize the performance loss of two instantiations of the proposed framework by deriving upper bounds on the achievable short-term and long-term loss in the expected recognition rate per face recognition attempt against an "oracle" solution that assumes a priori knowledge of the system performance under every possible setting. Unlike well-known reinforcement learning techniques, such as Q-learning, which exhibit very slow convergence when operating in highly dynamic environments, the proposed bandit-based systematic learning quickly approaches the optimal transmission and cloud resource-allocation policies based on feedback on the experienced dynamics (contention and congestion levels).
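As a minimal illustration (not the multi-user formulation analyzed here), the Python sketch below applies a UCB1-style index policy over a hypothetical discrete set of transmission and feature-extraction configurations, treating the outcome of each recognition attempt as the bandit reward. The configuration values and the observe_recognition environment are assumptions made purely for the example.

import math
import random

# Hypothetical discrete configurations a device could choose from:
# (video bitrate in kbps, number of features extracted per frame).
CONFIGS = [(256, 20), (256, 40), (512, 20), (512, 40), (1024, 40)]

def ucb1_index(mean_reward, pulls, total_pulls, c=2.0):
    """UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(c * math.log(total_pulls) / pulls)

def run_bandit(observe_recognition, rounds=1000):
    """Select a configuration each round and update its statistics.

    observe_recognition(config) stands in for one recognition attempt
    under the current (unknown) contention/congestion state; it returns
    1.0 on a correct recognition and 0.0 otherwise.
    """
    pulls = [0] * len(CONFIGS)
    means = [0.0] * len(CONFIGS)
    for t in range(1, rounds + 1):
        if t <= len(CONFIGS):
            arm = t - 1  # play each arm once before using the index
        else:
            arm = max(range(len(CONFIGS)),
                      key=lambda a: ucb1_index(means[a], pulls[a], t))
        reward = observe_recognition(CONFIGS[arm])
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]  # incremental mean
    return means, pulls

if __name__ == "__main__":
    # Toy environment: higher bitrate and more features help, noisily.
    def toy_env(config):
        bitrate, feats = config
        p = min(0.9, 0.3 + bitrate / 4096 + feats / 200)
        return 1.0 if random.random() < p else 0.0

    means, pulls = run_bandit(toy_env)
    print(list(zip(CONFIGS, pulls)))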
To validate our approach, we present time-constrained simulation results for: (i) contention-based H.264/AVC video streaming over IEEE 802.11 WLANs and (ii) principal-component-based face recognition algorithms running under varying congestion levels of a cloud-computing infrastructure. Against state-of-the-art reinforcement learning methods, and for the same recognition accuracy set for each face recognition transaction between a mobile device and the cloud, our bandit-based framework is shown to provide a 17.8% to 44.5% reduction in the number of video frames that must be processed by the cloud for recognition and an 11.5% to 36.5% reduction in the video traffic over the WLAN.
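For concreteness, the sketch below shows the standard eigenface-style pipeline that principal-component-based face recognition builds on: gallery faces are projected onto the top principal components and a probe is matched to its nearest gallery projection. The function names, parameters and toy data are illustrative assumptions, not the exact algorithm evaluated in the experiments.

import numpy as np

def fit_eigenfaces(gallery, n_components=20):
    """Compute the mean face and top principal components from
    gallery, an (n_images, n_pixels) array of flattened faces."""
    mean = gallery.mean(axis=0)
    centered = gallery - mean
    # SVD of the centered data gives the principal axes in the rows of vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, components):
    """Project flattened faces onto the eigenface subspace."""
    return (faces - mean) @ components.T

def recognize(probe, gallery_proj, labels, mean, components):
    """Return the label of the nearest gallery face in PCA space."""
    probe_proj = project(probe[np.newaxis, :], mean, components)
    dists = np.linalg.norm(gallery_proj - probe_proj, axis=1)
    return labels[int(np.argmin(dists))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(50, 64 * 64))   # 50 flattened 64x64 "faces"
    labels = np.repeat(np.arange(10), 5)       # 10 subjects, 5 images each
    mean, comps = fit_eigenfaces(gallery)
    gallery_proj = project(gallery, mean, comps)
    print(recognize(gallery[0], gallery_proj, labels, mean, comps))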