Skip to main content
Open Access Publications from the University of California

UC San Diego

UC San Diego Previously Published Works bannerUC San Diego

Identification and characterization of tweets related to the 2015 Indiana HIV outbreak: A retrospective infoveillance study



From late 2014 through 2015, Scott County, Indiana faced an HIV outbreak triggered by opioid abuse and transition to injection drug use. Investigating the origins, risk factors, and responses related to this outbreak is critical to inform future surveillance, interventions, and policymaking. In response, this retrospective infoveillance study identifies and characterizes user-generated messages related to opioid abuse, heroin injection drug use, and HIV status using natural language processing (NLP) among Twitter users in Indiana during the period of this HIV outbreak.

Materials and methods

Our study consisted of two phases: data collection and processing, and data analysis. We collected Indiana geolocated tweets from the public Twitter API using Amazon Web Services EC2 instances filtered for geocoded messages in the immediate pre and post period of the outbreak. In the data analysis phase we applied an unsupervised machine learning approach using NLP called the Biterm Topic Model (BTM) to identify tweets related to opioid, heroin/injection, and HIV behavior and then examined these messages for HIV risk-related topics that could be associated with the outbreak.


More than 10 million geocoded tweets occurring in Indiana during the immediate pre and post period of the outbreak were collected for analysis. Using BTM, we identified 1350 tweets thought to be relevant to the outbreak and then confirmed 358 tweets using human annotation. The most prevalent themes identified were tweets related to self-reported abuse of illicit and prescription drugs, opioid use disorder, self-reported HIV status, and public sentiment regarding the outbreak. Geospatial analysis found that these messages clustered in population dense areas outside of the outbreak, including Indianapolis and neighboring Clark County.


This infoveillance study characterized the social media conversations of communities in Indiana in the pre and post period of the 2015 HIV outbreak. Behavioral themes detected reflect discussion about risk factors related to HIV transmission stemming from opioid and heroin abuse for priority populations, and also help identify community attitudes that could have motivated or detracted the use of HIV prevention methods, along with helping identify factors that can impede access to prevention services.


Infoveillance approaches, such as the analysis conducted in this study, represent a possibly strategy to detect "signal" of the emergence of risk factors associated with an outbreak though may be limited in their scope and generalizability. Our results, in conjunction with other forms of public health surveillance, can leverage the growing ubiquity of social media platforms to better detect opioid-related HIV risk knowledge, attitudes and behavior, as well as inform future prevention efforts.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View