Wearable devices are gaining popularity in biomedical studies and clinical trials, where in recent years they have become common both at the study design stage and as data collection tools. Devices such as accelerometers and Fitbit trackers not only make collecting data from participants much easier than before but also capture subjects' activities, along with other important biometrics, more objectively than surveys and other traditional data collection methods. However, despite the potential benefits of using these technology-based trackers to collect data and boost wearers' activity levels, very little is known about how individuals use them on a daily basis or how tracker use relates to increasing physical activity or changing sedentary behaviors. Additional research is needed to understand how best to use trackers in interventions to support self-monitoring and effectively change behaviors. Furthermore, statistical methods for correcting estimates based on activity measures that contain measurement error, and for drawing causal inferences about the effect of lifestyle interventions on activity level, have not been fully developed. Novel statistical approaches are needed to address these questions in both randomized controlled trials and observational studies.
The goal of this dissertation is to develop appropriate and innovative statistical methods to answer these questions, and thereby to narrow the gap between the dense, continuous mobile health data now available and the statistical methods suited to analyzing them.
The dissertation consists of three main chapters. In chapter one, we used minute-level activity data collected from Fitbit trackers in a randomized controlled trial of breast cancer survivors to examine physical activity levels and adherence to Fitbit use. We examined patterns of activity level and Fitbit use over both the 12-week intervention period and the 2-year follow-up period, and compared these patterns between the intervention group and the control group. We found that during the intervention period, the Exercise group had higher average moderate-to-vigorous physical activity (MVPA) and higher adherence to Fitbit use than the Wellness group, but the trends in MVPA and adherence did not differ between the two groups. In the follow-up period, both the Exercise and Wellness groups showed declining MVPA and adherence to Fitbit use, but the decline was much slower in the Exercise group than in the Wellness group.
The activity data captured by the wearable devices in chapter one contained considerable measurement error and extreme values. Measurement errors in sedentary behavior assessment, which arise from different sources, pose serious challenges for conducting statistical analysis and obtaining unbiased estimates, especially when validation data are unavailable \cite{aim2_2}. Motivated by these issues, in chapter two we proposed structural models consisting of linear mixed-effects models and generalized linear models to obtain unbiased estimates of the relationship between exposures subject to measurement error and an outcome of interest, after appropriately accounting for the errors in the devices' measurements. In the motivating example of chapter two, we found that failing to account for measurement error may exaggerate the estimated effect of sedentary time on subjects' BMI and lead to the dissemination of invalid health guidance to the population.
To draw causal inferences about the effect of lifestyle interventions on activity level while addressing the extreme values in measurements from wearable devices, in chapter three we proposed a doubly robust estimator that extends the traditional Mann-Whitney-Wilcoxon Rank Sum Test (MWWRST) to causal inference in observational studies. The proposed estimator addresses the limitations of existing alternatives, yielding more robust and reliable inference when the MWWRST is applied to observational study data, and it also performs well with small sample sizes. Results from a real weight-loss trial showed that, in addition to its doubly robust property, the proposed estimator effectively handled outliers and extreme values.