Machine Learning for the Developing World using Mobile Communication Metadata
- Author(s): Khan, Muhammad R
- Advisor(s): Blumenstock, Joshua E
- et al.
Machine learning algorithms have begun to have an unprecedented impact on human society, owing to their improved accuracy. The ability to collect and analyze information at large scale has enabled researchers to develop novel algorithms that outperform the best human experts in a number of cases. The size and quality of the data have been the primary factors behind the success of these algorithms. However, when it comes to data on human behavior, a digital divide still exists. An indirect consequence of the popularity of social networks and the proliferation of sensors in the developed world is that researchers in industry and academia have been able to fine-tune their findings and algorithms on huge behavioral datasets, yielding accurate and deep insights into human behavior in the developed world. The same is not true of the developing world, where until recently surveys were the primary way of collecting information about individuals in society.
Social networks and digital sensors are far less common in the developing world than in the developed world, with one big exception: the mobile phone. More than 95% of the world's population today has mobile phone coverage, and even in some of the most underdeveloped places on earth, mobile phone penetration is much higher than other indicators of human development, such as literacy or access to financial infrastructure. As a result, researchers have been using the metadata collected by mobile phone companies in developing countries as an alternative to more conventional data sources. However, raw mobile phone data may not be well suited to machine learning algorithms; there is a need for algorithms that convert raw mobile communication metadata into features these algorithms can use. Developing novel ways to extract features from mobile phone metadata has been the central question of my research.
In this dissertation, I describe my work on extracting features from mobile communication logs using techniques such as Deterministic Finite Automata (DFA). I also show how this approach outperforms other methods on problems such as product adoption and churn prediction. I further show that by using DFA-based features and spectral analysis of the multi-view nature of mobile communication networks, advanced neural network training algorithms can be developed that beat the current state-of-the-art methods for problems such as poverty prediction and gender prediction. The last part of this dissertation describes the value of communication network data for research questions in social network analysis, such as what the salient differences are between the behavioral patterns of men and women in the developing world, as exhibited in communication network data.
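
To make the idea of DFA-based features concrete, the following is a minimal sketch of how a small automaton might turn one subscriber's event log into a single count feature. It is an illustration only, not the dissertation's actual method: the `Event` record layout, the symbol names, the `TRANSITIONS` table, and the `dfa_count` helper are all hypothetical, and the pattern counted here (a run of outgoing calls directly followed by an incoming call) is chosen purely as an example.

```python
from collections import namedtuple

# Hypothetical record layout; real call detail records vary by operator.
Event = namedtuple("Event", ["direction", "kind"])  # "in"/"out", "call"/"sms"

# Hypothetical DFA: accepts a run of outgoing calls directly followed by
# an incoming call. Any (state, symbol) pair not listed resets to "start".
TRANSITIONS = {
    ("start", "out_call"): "waiting",
    ("waiting", "out_call"): "waiting",
    ("waiting", "in_call"): "accept",
}


def dfa_count(events, transitions=TRANSITIONS, start="start", accept="accept"):
    """Count how many times the DFA reaches its accepting state."""
    state = start
    count = 0
    for ev in events:
        symbol = f"{ev.direction}_{ev.kind}"
        # Unlisted transitions fall back to the start state.
        state = transitions.get((state, symbol), start)
        if state == accept:
            count += 1
            state = start  # restart matching after each accepted pattern
    return count


# Made-up, time-ordered log for a single subscriber.
log = [
    Event("out", "call"),
    Event("in", "call"),
    Event("out", "sms"),
    Event("out", "call"),
    Event("out", "call"),
    Event("in", "call"),
    Event("in", "call"),
]

feature_value = dfa_count(log)
print(feature_value)
```

Under these assumptions, each different transition table would yield a different feature, so a family of automata applied to the same log would produce a feature vector per subscriber that a downstream classifier could consume.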