UC San Diego
A Data Science Approach for Real-Time HIV-Risk Analysis on Twitter
- Author(s): Vysyaraju, Amarnath Raju
- Advisor(s): Weibel, Nadir
- et al.
HIV remains a major epidemic. Although treatment has progressed significantly, a functional cure is still far away, and a great deal of effort is currently focused on prevention. The prevalence of HIV has recently prompted clinicians and public health officials to look to social media as a data source for digital epidemiology. This thesis introduces our data science approach for capturing HIV-related trends from multidimensional Twitter data. We show how our platform can help clinicians understand people’s risk behavior and ultimately guide HIV prevention. Our design is flexible and extensible, and currently employs a collection of techniques spanning crowd-sourcing, natural language processing, image classification, supervised machine learning, and graph data analysis to classify at-risk tweets and user groups. In our experiments, we established a relationship between an individual user’s risk for HIV and the risk of that user’s network, based on their actions on Twitter. This infrastructure will serve as a foundation for building visualizations and real-time analytical tools for studying the prevalence of HIV risk, to better inform the use of prevention resources.