The Internet as a networked system has been rendered more complex than ever before as human endpoints are grafted into the system via increasingly pervasive and personalized networked devices. According to the United Nations, the Internet is a transnational enabler of a number of human rights, and as such, access to the Internet has been proclaimed to be a basic right unto itself. Unfortunately, even as networked devices have become ubiquitous, access to the Internet has not. In many cases, the reasons behind this digital divide involve contextual challenges such as limited infrastructure, limited economic viability, and rugged terrain. In this dissertation, we seek to ameliorate these challenges by designing data-driven, community-based network infrastructure.
In order to extend Internet connectivity to communities located in some of the most challenging contexts, we start by understanding how Internet connectivity is used when communities receive initial Internet access. We do this by partnering with two ISPs (Internet service providers) that brought initial Internet connectivity to two geographic regions in Indian Country. The data we have collected from these two ISPs totals 115 TB, generated over a combined three years of partnership. Our ISP collaborators serve a total of 1,300 subscribers, who represent residents of 14 different Native American reservations and 18 different tribes. The service areas of these ISPs include predominantly rural communities located on mountainous and forested terrain. Key findings from our analysis of data generated by these ISPs include the prevalence of social media and streaming content, the locality of interest with respect to social media content, and the similarity of Web browsing preferences between households and the aggregate communities to which they belong. We augment our analysis of network traces collected from ISPs with analysis of data collected from some of the most prevalent social media platforms. One of our studies mines Instagram trace data collected from Instagram servers to better understand the relationship between network infrastructure capacity and social media usage patterns. We found that users actually interact with only a small percentage of the content available to them over social media platforms, and that only a small portion of available bandwidth is needed to support interaction with this content. Moreover, in our analysis of the diffusion of content disseminated by Native American advocates on Twitter, we found that the rate of diffusion and the prevalence of content are tied to its media richness, but that richer content does not guarantee rapid diffusion or longevity in the network.
Based on the results of our analyses as well as findings in related work, we design four community-based network technologies that address the network challenges associated with rural and developing contexts.
First, we introduce a social media content distribution system that operates over FM radio [200]. In order to provide content over a 1.2 Kbps technology (the Radio Broadcast Data System), we create a graph-based metric, the cumulative clustering coefficient, to filter content based on its total audience size and the diversity of its audience scope. We evaluate this delivery system using a trace-based simulation and find that 81% of users received at least half of the content they requested and that 35.5% of the 1.1 million requested Instagram photos were transmitted to users. Next, we introduce FiDO [203], a community-based Web browsing agent and content delivery system that enables users from disconnected households to collect relevant content for themselves and members of their households opportunistically from content caches co-located with cellular base stations. We evaluate FiDO using a trace-driven simulation that combines Web traces collected from one of our partner ISPs with statistical models parameterized with census and transportation data. We find that an average of 80% of a household’s cacheable Web files can be delivered opportunistically and that, when crawling the Web on behalf of disconnected households, FiDO is able to provide an average of 69 Web pages to each household (where 73% of a household’s most browsed Web domains are represented by the content collected on their behalf). We then describe some of the challenges associated with content creation and data collection in challenging contexts and introduce Open Development Kit (ODK) Submit and VillageShare for rural schools. ODK Submit is a smartphone-based platform that sits between data collection applications and the network interfaces of a device [26]. It seeks to ease the burden of navigating heterogeneous network conditions for application developers, data collectors, and data processors. Principles from ODK Submit were incorporated into the publicly available ODK v. 2.0 tool suite as part of the Aggregate Tables Extensions suite [143]. In addition, we introduce VillageShare for rural schools, which enables schools in poorly connected rural areas to create and share culturally relevant curricula and empowers students to work collaboratively on “local cloud-based” projects despite their lack of access to network connectivity at home. We provide an evaluation of VillageShare that has been informed and parameterized by the deployment of Internet connectivity to rural schools over high-latency, low-bandwidth technology in South Africa.
We conclude with an overview of our key findings as well as a discussion of future research directions inspired by the work in this dissertation.