Recent successes of deep learning in applications such as image classification and language generation have encouraged similar development in other domains. In particular, with the ubiquity of health sensors, vast amounts of data are being collected from which useful information can be extracted to improve overall quality of life. For example, electronic health records (EHR) have become widespread across hospitals, and many data modalities are collected during patient care. Although these datasets lend themselves to the supervised learning framework, supervised models often generalize poorly due to limited annotated data, extensive missingness, and the temporal nature of the data. We address this by introducing unsupervised and semi-supervised methods that leverage unlabeled raw patient data to improve generalization on downstream tasks. To demonstrate the effectiveness of our proposed methods, we show their utility on real-world public medical datasets, including cohorts from hospitals, intensive care units, and wards. The methods introduced are simple and effective at improving downstream performance.