Online Learning Over Networks
- Author(s): Towfic, Zaid Joseph
- Advisor(s): Sayed, Ali H
- et al.
Distributed convex optimization refers to the task of minimizing the aggregate sum of convex risk functions, each available at an agent of a connected network, subject to convex constraints that are also distributed across the agents. The risk functions at the agents are generally defined as expectations of certain loss functions, with the expectations computed over the statistical distribution of the data. One of the key challenges in solving such optimization problems is that the agents are only subjected to streaming data realizations and they are not aware of the underlying statistical models. Accordingly, each agent is only aware of its loss function and does not have sufficient information to evaluate its risk function. Another key challenge is that each agent is only aware of its constraints. This dissertation develops and studies fully decentralized first-order strategies to solve such problems.
The first class of strategies considered in this work belongs to the family of primal-dual methods, which rely on propagating two sets of variables: the primal variable and a dual variable. In contrast to the traditional treatment of these algorithms in the optimization literature, this dissertation develops primal-dual methods for adaptation and learning from streaming data over networks. In this case, the optimization problem is not necessarily static since its minimizer can drift with time and the exact form of the cost function is not known beforehand. Fully-distributed and adaptive variants of the Arrow-Hurwicz (AH) and augmented Lagrangian (AL) methods are developed by resorting to stochastic-gradient approximations. The stability range and the performance limits of the resulting strategies are analyzed. Several revealing results are discovered including the fact that primal-dual techniques tend to have a smaller stability range and a degraded performance in comparison to primal techniques, such as consensus and diffusion strategies. This conclusion suggests that AH and AL techniques are not as reliable for adaptation over networks and the advantages of AH and AL techniques in the static optimization context do not necessarily carry over to the stochastic context.
The dissertation subsequently focuses on developing primal optimization methods of the diffusion type for the distributed solution of a general class of constrained optimization problems. Most available distributed solutions for these types of problems rely on the use of projection steps in order to ensure that the successive iterates at the nodes satisfy the convex constraints. In these solutions, the constraint conditions need to be relatively simple in order for the distributed algorithm to be able to compute the necessary projections analytically (such as projecting onto the nonnegative orthant). This work proposes a new class of distributed solutions that employ constant step-sizes and that eliminate the need for projection steps. The solution technique relies instead on the use of suitably chosen penalty functions and replaces the projection step by a stochastic approximation update that run concurrently with the optimization step. The challenge is to show that the use of penalty functions in the stochastic gradient update step still leads to accurate solutions. The analysis in the dissertation establishes that this is indeed possible. Moreover, in the proposed solution, the nodes are only required to interact locally and to have access to local iterates from their neighbors; there is no need for the nodes to know any of the constraints besides their own.