## Content-Type Coding

- Author(s): Song, Linqi
- Advisor(s): Fragouli, Christina
- et al.

## Abstract

Traditionally, we use networks to securely and efficiently convey specific information messages to one or more receivers. However, communication networks today are increasingly used to serve a fundamentally different traffic, that delivers types of content rather than specific messages. For instance, when we want to find photos of an event, we may not care which specific photos we receive - we only care about the type of content, namely, that they are photos of the correct event. Content-type traffic pervades a host of applications today, e.g., search engines, recommender systems, and advertising networks. Research on content-type networks is very popular. Most of the existing work looks at how to classify content into types; what to replicate, where and how to store, and from where to retrieve specific data. In contrast, we investigate a totally different question: are there benefits in designing information transmission schemes specifically tailored to content-type traffic?

Our research indicates that in some cases, these benefits can be significant. We design a polynomial-time algorithm for pliable index coding that requires at most $O(\log^2(n))$ broadcast transmissions to serve $n$ clients, as compared to $O(n)$ broadcast transmissions for conventional index coding. This indicates that the exponential benefits of pliable index coding can be effectively realized. Moreover, we explore two applications: recommender systems and distributed computing. In recommender systems, we ask: how much we can gain in terms of bandwidth and user satisfaction, if recommender systems took into account not only the user preferences, but also the fact that they may need to serve these users under bandwidth constraints, as is the case over wireless networks. In other words, what if the recommender systems became bandwidth aware? In this setup, the user is satisfied to receive any message she does not already have, with a satisfaction proportional to her preference for that message. We show, through a number of scenaria, that although the optimization problems are in general NP-hard, polynomial time algorithms with constant approximation ratio can be designed to achieve more than 80\% of the satisfaction and to save 90\% of bandwidth. In distributed computing, to improve the communication efficiency in the data shuffling phase, we examine the pliable index coding problem under data shuffling constraints, where each of the $m$ messages can satisfy at most $c$ out of $n$ clients. We show that the constrained pliable index coding can achieve up to $O(n)$ (best case) benefits over index coding. We prove that the problem is NP-hard and the optimal broadcast transmissions for random instances is almost surely upper bounded by $O(\min\{\frac{n}{c\log(n)},\frac{n}{\log(m)}\})$ for $c=o(\frac{n^{1/7}}{\log^2(n)})$ and $O(\min\{\frac{n}{c}+\log(c),\frac{n}{\log(m)}\})$ for $c=\Omega(\frac{n^{1/7}}{\log^2(n)})$. Building upon constrained pliable index coding, we design a hierarchical data shuffling scheme for distributed computing. By leveraging the many possible shuffling choices, our proposed shuffling scheme is able to reduce the communication cost and achieve benefits up to $O(ns/m)$ over the index coding method, where $ns/m$ is the average number of workers caching a message, and $m$, $n$, and $s$ are the numbers of messages, workers, and cache size, respectively. In addition, we study the beneficial cases of content-type coding over large scale networks and over erasure channels. Compared with message-specific coding, we show that the benefits can be up to the number of messages in the content type for the former case and up to $19.5\%$ for the latter case with symmetric setting.