Statistical Innovations in Health and Data Security: Lung Cancer Diagnosis, Microbiome Community Detection, and Adversarial Attack Analysis

Abstract

This dissertation investigates three distinct problems. First, it enhances lung cancer diagnosis and survival prediction by applying deep learning techniques to CT imaging. Second, it examines how the distortion patterns in adversarial images differ across attack methods. Third, it applies the Minimum Description Length (MDL) principle to optimal threshold determination in microbiome community detection.

Supervised by Professor Thomas Lee and Professor James Sharpnack, Chapter \ref{ch:chap2} proposes using convolutional neural networks to model the relationship between lung cancer risk and the lung morphology depicted in CT images. It introduces a mini-batched loss that extends the Cox proportional hazards model, accommodating the non-convexity induced by neural networks and enabling training on large datasets. Combining the mini-batched loss with binary cross-entropy allows prediction of both lung cancer occurrence and mortality risk. Results from simulations and real-data experiments highlight the potential of this method to advance lung cancer diagnosis and treatment.
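
To make the batching idea concrete, below is a minimal sketch of a mini-batched negative Cox partial-likelihood loss in PyTorch, where each subject's risk set is restricted to the current batch. The function and variable names (cox_batch_loss, log_risk, times, events) are illustrative assumptions, not the dissertation's actual implementation.

```python
# A minimal sketch of a mini-batched negative Cox partial-likelihood loss,
# assuming a network that maps a CT volume to a scalar log-risk score.
import torch

def cox_batch_loss(log_risk, times, events, eps=1e-8):
    """Negative Cox partial log-likelihood evaluated within a mini-batch.

    log_risk : (B,) predicted log hazard ratios from the network
    times    : (B,) observed event or censoring times
    events   : (B,) 1.0 if the event occurred, 0.0 if censored
    """
    # Sort by descending time so the within-batch risk set of subject i
    # is exactly the subjects at positions 0..i after sorting.
    order = torch.argsort(times, descending=True)
    log_risk = log_risk[order]
    events = events[order]

    # Log of the running sum of exp(log_risk) = log sum over the risk set.
    log_cumsum = torch.logcumsumexp(log_risk, dim=0)

    # Only uncensored subjects contribute to the partial likelihood.
    pll = (log_risk - log_cumsum) * events
    return -pll.sum() / (events.sum() + eps)
```

In a joint training objective, this term could simply be added to a binary cross-entropy loss on cancer occurrence, e.g. cox_batch_loss(r, t, d) + torch.nn.functional.binary_cross_entropy_with_logits(logits, labels).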

Supervised by Professor Thomas Lee, Chapter \ref{ch:chap4} applies the MDL principle to microbiome data analysis, focusing on community detection methods. To address the subjectivity of threshold selection in correlation-based techniques, MDL is used to identify the optimal community structure by choosing the cut-off on correlation strength in a principled, data-driven way. The chapter provides a detailed derivation of the MDL criterion, discusses the consistency of its threshold selection, and validates its effectiveness through simulations. A real-data experiment on microbiome samples from the Great Lakes offers practical insight into applying MDL in a real-world context.
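
As a concrete illustration of the selection procedure, the sketch below scans candidate cut-offs, treats the connected components of each thresholded correlation matrix as communities, and keeps the cut-off minimizing a two-part description length. The specific code length used here (a partition cost plus a planted-partition-style adjacency cost) is a generic stand-in for the dissertation's derivation, and all names are illustrative.

```python
# A minimal sketch of MDL-style threshold selection for correlation-based
# community detection; the two-part code is an assumed, generic form.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def bernoulli_code_bits(k, n):
    """Code length (bits) of k successes in n Bernoulli trials, ML plug-in."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return -n * (p * np.log2(p) + (1 - p) * np.log2(1 - p))

def description_length(adj, labels):
    n = len(labels)
    K = labels.max() + 1
    iu = np.triu_indices(n, k=1)
    within = labels[iu[0]] == labels[iu[1]]
    edges = adj[iu].astype(bool)
    # Model cost: community count plus one label per node.
    model_bits = np.log2(n) + n * np.log2(max(K, 2))
    # Data cost: edges encoded separately within and between communities.
    data_bits = (bernoulli_code_bits(edges[within].sum(), within.sum())
                 + bernoulli_code_bits(edges[~within].sum(), (~within).sum()))
    return model_bits + data_bits

def mdl_threshold(corr, thresholds):
    """Return (best threshold, its description length) over the candidates."""
    best = None
    for t in thresholds:
        adj = (np.abs(corr) >= t).astype(int)
        np.fill_diagonal(adj, 0)
        _, labels = connected_components(csr_matrix(adj), directed=False)
        dl = description_length(adj, labels)
        if best is None or dl < best[1]:
            best = (t, dl)
    return best
```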

Supervised by Professor Thomas Lee, Professor Yao Li, and Professor Cho-Jui Hsieh, Chapter \ref{ch:chap3} explores the vulnerability of deep neural networks to adversarial examples. Focusing on three common attack families (gradient-based, score-based, and decision-based), the research aims to distinguish the types of adversarial examples each family produces. Identifying the information an attacker possesses enables the development of effective defense strategies. The study demonstrates that adversarial images from different attack families can be identified with a simple model. Experiments on CIFAR-10 and Tiny ImageNet reveal differences in distortion patterns across attack types under both $L_2$ and $L_\infty$ norms.
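
As an illustration of such a "simple model", the sketch below summarizes each perturbation (adversarial minus clean image) with a few hand-crafted features and fits a multiclass logistic regression to predict the attack family. The features and classifier are assumed stand-ins, not the study's exact pipeline.

```python
# A minimal sketch of attack-family identification from distortion patterns,
# assuming access to matched clean/adversarial image pairs with family labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def perturbation_features(clean, adv):
    """Summarize the distortion pattern of one clean/adversarial pair."""
    delta = (adv - clean).ravel()
    spectrum = np.abs(np.fft.fft(delta))[: delta.size // 2]
    return np.array([
        np.linalg.norm(delta, 2),        # overall L2 distortion
        np.abs(delta).max(),             # L-infinity distortion
        (np.abs(delta) > 1e-3).mean(),   # fraction of pixels perturbed
        spectrum.mean(),                 # crude frequency-content summary
    ])

# clean_imgs, adv_imgs: arrays of shape (N, H, W, C); family: (N,) integer
# labels, e.g. 0 = gradient-based, 1 = score-based, 2 = decision-based.
def fit_attack_classifier(clean_imgs, adv_imgs, family):
    X = np.stack([perturbation_features(c, a)
                  for c, a in zip(clean_imgs, adv_imgs)])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, family, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)  # classifier and held-out accuracy
```

The norm and sparsity features target the obvious $L_2$ versus $L_\infty$ distinction; the frequency summary is one plausible guess at what might separate attack families operating under the same norm.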
