Applications and Implications of Big Data for Demo-Economic Analysis: The Case of Call-Detail Records
- Author(s): Letouzé, Emmanuel Francis
- Advisor(s): Lee, Ronald D
- et al.
This dissertation analyzes and discusses applications and implications of Big Data for demo-economic analysis. It focuses on cell-phone data collected by telecommunication operators for billing purposes, commonly referred to as call-detail records (CDRs), which include the time and duration of calls and the locations of the emitter and receiver, among other fields. The resulting opportunities and challenges are placed within the broader context of the 'Data Revolution', presented in Chapter 1.
In this context, "applications" refers to ways in which CDR analytics can be used for research and policymaking by leveraging the information these data contain on human behavior, for example to predict criminality (Chapter 2), study weather and mobility patterns (Chapter 3), and estimate income levels (Chapter 4) and population density (Chapter 5). "Implications" refers to ways in which CDR analytics has affected, and can be expected to affect and be affected by, ethical, political, legal, and institutional factors and processes (Chapter 6).
At the heart of Big Data are new kinds of passively emitted digital data, or 'crumbs', that are the by-product of the fast-growing and already near-ubiquitous use of digital devices and services by humans around the globe. These 'crumbs' are digital traces of most human actions; they are collected and can be analyzed through powerful methods and machines by new types of stakeholders, including multi-disciplinary teams.
Chapter 1 analyzes the advent of the Big Data phenomenon over the past decade, with particular attention to its observed and possible effects on social science research and policymaking. In addition to providing a historical overview, it proposes taxonomies and concepts to clarify the nature and significance of the change brought about by Big Data. A central point of the chapter is the distinction between big data as new kinds of large datasets and Big Data as an ecosystem made up of "3Cs": its crumbs, its capacities, and its communities. The chapter ends by asking whether these kinds of digital data may replace traditional data and whether Big Data may render the scientific method obsolete. It answers in the negative, but argues that social science research will evolve dramatically in contact with Big Data while in turn shaping it.
The subsequent four empirical chapters focus on different applications of CDR analytics to social problems:
Chapter 2 uses CDRs from Telefónica in London, in conjunction with other socio-demographic data including police records, to attempt to predict future crime hotspots, and presents a model with a predictive power of close to 70%. This chapter offers an example of one of the major functions of Big Data introduced in Chapter 1: its predictive function, here understood as forecasting, alongside its descriptive, prescriptive, and discursive functions. It also provides an opportunity to discuss some key tools and concepts commonly used in machine learning, as well as the merits and limits of these approaches to crime prediction for public policy.
Chapter 3 uses CDRs from Orange in Côte d’Ivoire, made available as part of the 2013 Data for Development challenge (a modality that has been the hallmark of the field and has contributed to developing Big Data communities over time), alongside meteorological data. The goal is to estimate whether weather could affect human mobility in ways that may violate the exclusion restriction in research that uses rainfall as an instrument for economic conditions to assess the causal link between economic conditions and conflict. The chapter reports a statistically significant relationship between weather and mobility, suggesting that weather could affect conflict through channels other than economic conditions and casting doubt on the use of rainfall as an instrument for economic conditions in these settings.
Chapter 4 uses the same dataset and attempts to predict, here in the sense of inferring or now-casting, the multi-poverty index based on Demographic and Health Surveys (DHS) data. It assesses whether and how these kinds of data, available at high levels of temporal and geographical granularity, may help fill some of the data gaps that characterize, and may impede the development of, some of the poorest countries in the world. The results are promising.
Chapter 5 uses data similar to those for Côte d'Ivoire, but for Senegal, in conjunction with census data, to address the central issue of sample bias in big data by correlating population size estimated from cell-phone activity with census data. It proposes a novel approach to estimating biases in the data as a function of key demographic variables, including age, at different geographic levels.
Finally, Chapter 6 focuses on the political economy implications of Big Data as an ecosystem and socio-technological phenomenon, with a focus on its prospects and its institutional, legal, ethical, and political requirements. It shows that Big Data in general, and CDR analytics in particular, raise complex and contentious questions for social science research, policymaking, and societies at large, including questions of power dynamics, informed consent, fairness, and civic participation. Developing adequate responses will require significant investments, including in human awareness and capacities.
It also argues, as does the overall conclusion, that Big Data, and open algorithms notably, can provide an entry and anchor point to challenge and improve the current state of the world by giving data emitters (citizens) greater control over the use of the data they generate, in ways that could revive democratic ideals and principles and make Big Data a potentially truly revolutionary force.