Traditionally, clinical studies use a multitude of sensors and elaborate lab tests to construct datasets. These studies are typically expensive, tedious, and prone to subjectivity among the individuals in the study. The increasing prevalence of social media platforms and of mobile and wearable fitness trackers now provides a novel opportunity to gather health data. These new approaches are highly scalable and inexpensive, and they can objectively estimate the health markers that identify disease outbreaks.
There are, however, a number of challenges in using these data sources effectively: (i) distinguishing an individual with an actual health condition from a general health-related discussion or a health product commercial, (ii) identifying health-related markers without explicit keywords describing symptoms or diseases, and (iii) preserving individuals' privacy while sharing their personal health data. This thesis constructs general unsupervised algorithms to detect health-related events and specialized supervised algorithms to detect more subtle events from wearable sensor data.