The growing availability of data and rapid advancements in machine learning are revolutionizing decision-making. Often, these data come from distributed devices with low computational capabilities, connected to a central learner through communication-constrained channels. As a result, developing learning algorithms with low communication and computational demands is attracting increasing attention in the literature and in practice. In this thesis, we solve open problems within this framework for three popular learning setups: Contextual Bandits, Multi-Modal Representation Learning, and Classification.

Contextual Bandits is an online learning problem in which an agent makes decisions based on contextual information and receives feedback (rewards) for these decisions. The agent learns from past interactions to optimize future rewards. This problem models various practical applications such as recommendation systems, clinical trials, and resource allocation. This thesis offers the following contributions to this area. First, we propose a method that uses only 3 bits to send each reward, even if the rewards take values from an infinite set, while achieving (nearly) optimal learning performance. Second, we prove a surprising reduction from contextual to non-contextual bandits (where only a single context is available for all users). This makes it possible to solve contextual bandits without communicating the context, which can be communication-heavy in many practical setups. The reduction provides a framework for developing efficient contextual bandit algorithms from the simpler algorithms proposed for non-contextual bandits, leading to improved performance bounds in a number of setups, such as contextual bandits with sparse unknown parameters, misspecification, privacy constraints, and adversarial corruption, among others. The reduction takes a step toward solving multiple open problems in these setups.
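To give intuition for how a few bits per reward can suffice, the sketch below shows unbiased stochastic quantization, which rounds a reward to one of 2^3 = 8 grid points so that the expected dequantized value equals the true reward. This is purely illustrative and is not the scheme developed in the thesis; in particular, it assumes rewards lie in [0, 1], whereas the thesis handles rewards from an infinite set.

```python
import random

def quantize_reward(r, bits=3):
    """Stochastically quantize a reward r in [0, 1] to one of 2**bits
    grid points. Rounding up happens with probability equal to the
    fractional part, so the dequantized value is unbiased."""
    levels = 2 ** bits - 1          # 7 intervals -> 8 grid points
    scaled = r * levels
    low = int(scaled)               # nearest grid point below
    p_up = scaled - low             # probability of rounding up
    idx = low + (1 if random.random() < p_up else 0)
    return idx                      # a 3-bit message

def dequantize(idx, bits=3):
    """Map a 3-bit index back to a value in [0, 1]."""
    return idx / (2 ** bits - 1)
```

Averaged over many rounds, the quantization noise cancels out, which is why a bandit learner can still converge despite seeing only coarse reward messages.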
Lastly, we introduce the first computationally efficient algorithm for contextual bandits with limited policy switches, a constraint that is highly relevant in practice: in many setups, the learner cannot frequently update the policy due to communication or computation limits, or because of high response rates, as in clinical trials or recommendation systems.
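To make the limited-switch constraint concrete, one standard device (shown here only as an illustration, not the algorithm proposed in the thesis) is a doubling schedule: the learner recomputes its policy only at geometrically spaced rounds, incurring O(log T) switches over a horizon of T rounds.

```python
import math

def switch_rounds(horizon, base=2.0):
    """Geometric (doubling) schedule: the policy is recomputed only at
    these rounds, giving O(log T) switches over a horizon of T rounds."""
    rounds, t = [], 1
    while t <= horizon:
        rounds.append(t)
        t = max(t + 1, int(math.ceil(t * base)))
    return rounds
```

For example, over a horizon of 1000 rounds this schedule switches only 10 times, at rounds 1, 2, 4, ..., 512; the challenge addressed in the thesis is retaining near-optimal regret while being restricted to so few updates.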
In the second setup, Multi-Modal Representation Learning, the focus is on extracting common and private features from multi-modal data, which is required for various applications such as vehicle tracking and medical diagnosis from multi-modal sources, e.g., images and audio. This thesis introduces the notion of Common Information Dimension (CID), which quantifies continuous common information between sources in terms of the number of dimensions. Unlike existing bit-based measures, which cannot capture continuous common information, CID provides a fundamental limit on the dimension of common latent variables in multi-modal learning applications. We develop methods to compute the CID for distributions of interest in practical scenarios.
Lastly, the Classification setup aims to compress features that are collected at distributed nodes and sent to a central entity for classification. This scenario arises in many machine learning applications, including wireless cyber-physical systems, immersive environments, and supported health. This thesis proposes a compression scheme tailored to classification tasks that allows the central entity to use an existing classifier, which operates transparently to the feature compression and maintains the (uncompressed) performance. Although we prove that finding the optimal compression scheme is NP-hard, we introduce computationally efficient compression algorithms that approximately solve the problem.
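The pipeline can be pictured with a minimal sketch, using plain uniform scalar quantization rather than the classification-tailored scheme the thesis proposes; the per-feature range [lo, hi] is an assumption made here for illustration. Each node sends a few bits per feature, and the center reconstructs approximate features and feeds them to the existing classifier unchanged.

```python
import numpy as np

def node_compress(x, bits, lo, hi):
    """At a distributed node: uniformly quantize a feature vector x
    (assumed to lie in [lo, hi]) to 2**bits levels; returns the integer
    codes that are transmitted to the center."""
    levels = 2 ** bits - 1
    clipped = np.clip(x, lo, hi)
    return np.round((clipped - lo) / (hi - lo) * levels).astype(np.uint8)

def center_decompress(codes, bits, lo, hi):
    """At the center: reconstruct approximate features; the existing
    classifier then runs on these as if they were uncompressed."""
    levels = 2 ** bits - 1
    return codes.astype(float) / levels * (hi - lo) + lo
```

With 3 bits per feature, the reconstruction error is at most half a quantization step; the thesis's scheme instead shapes the compression around the classification task itself, so the classifier's decisions, rather than the raw features, are what is preserved.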
We evaluate our schemes in practical setups, showing significant savings in both communication and computation costs while achieving state-of-the-art performance.