Sequential Procedures for Nonparametric Statistical Process Control and Longitudinal Data Classification
- Author(s): Zhang, Xin
- Advisor(s): Li, Jun
- et al.
Sequential analysis can reduce financial and human costs because it is able to reach a conclusion earlier. Since its introduction, sequential analysis has been widely applied in areas such as statistical process control (SPC) and clinical design. However, nonparametric SPC cumulative sum (CUSUM) procedures for multivariate data and correlated observations are still rare in the literature, and there is little discussion of sequential classification for longitudinal data. In this dissertation we develop new sequential procedures for nonparametric statistical process control that are applicable to multivariate and serially correlated data, as well as a sequential classifier for longitudinal data.
First, we develop two nonparametric multivariate CUSUM control charts based on spatial sign and data depth. These two procedures can be considered as the nonparametric counterparts of the two parametric multivariate CUSUM procedures developed in Crosier (1988). We show that the two proposed CUSUM procedures are affine-invariant and asymptotically distribution-free over a broad family of distributions. In our simulation studies, the proposed CUSUM procedures perform well across a broad range of settings, and compare favorably with existing CUSUM procedures for detecting location and scale changes.
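To make the spatial-sign idea concrete, the following is a minimal sketch and not the dissertation's exact statistic. It applies a Crosier-style shrinkage recursion to spatial signs instead of raw observations. It assumes the in-control center is known and that the data have already been standardized so that the Euclidean norm is appropriate. The function names and the parameters `k` (reference value) and `h` (control limit) are hypothetical choices for illustration.

```python
import math

def spatial_sign(x, center):
    """Unit vector pointing from the center to x (zero vector if x equals the center)."""
    d = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        return [0.0] * len(d)
    return [v / norm for v in d]

def sign_mcusum(observations, center, k, h):
    """Crosier-style multivariate CUSUM run on spatial signs (illustrative assumption).

    Returns (signal_time, path), where signal_time is the 1-based index of the
    first observation whose chart statistic exceeds h, or None if no signal occurs.
    """
    s = [0.0] * len(center)          # cumulative vector, starts at zero
    path = []
    for n, x in enumerate(observations, start=1):
        u = spatial_sign(x, center)
        t = [si + ui for si, ui in zip(s, u)]
        c = math.sqrt(sum(v * v for v in t))
        if c <= k:
            s = [0.0] * len(center)  # reset when the accumulated evidence is small
        else:
            shrink = 1.0 - k / c     # shrink the cumulative vector toward zero by k
            s = [v * shrink for v in t]
        y = math.sqrt(sum(v * v for v in s))
        path.append(y)
        if y > h:
            return n, path
    return None, path
```

Because only the direction of each observation enters the recursion, a few extreme outliers cannot dominate the chart, which is one motivation for sign-based statistics. In practice the center and scatter would be estimated from in-control data, and `h` would be calibrated to a target in-control run length.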
Second, building on the above nonparametric multivariate CUSUM control charts, we propose a nonparametric SPC procedure for correlated data. We combine wavelet decomposition with Box and Jenkins time series models and the above multivariate CUSUM control chart to obtain a procedure that is robust for correlated processes and requires no distributional assumptions. Extensive simulation studies also show that the procedure is powerful in detecting location shifts.
Last, we develop a first-of-its-kind sequential classification procedure for longitudinal data. The procedure adapts a neutral zone classifier framework and attempts to reduce the overall cost when the cost of time is taken into account. The sequential classifier evaluates each subject at each longitudinal time point for evidence supporting a classification. A classification decision is not made until there is sufficient confidence or until the last time point at which data can be collected is reached. The early decision property of the proposed classifier may aid the early diagnosis of severe disease, as in our real data example.
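The decision rule described above can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's procedure. It assumes a two-class problem, that some model (not shown) supplies the posterior probability of class 1 at each time point given the data observed so far, and that a forced decision with a 0.5 cutoff is made at the final time point. The function name and the thresholds `lower` and `upper` are hypothetical.

```python
def sequential_neutral_zone(posteriors, lower=0.1, upper=0.9):
    """Classify one subject from a sequence of class-1 posterior probabilities.

    At each time point the subject is labeled 1 if the posterior is at least
    `upper`, labeled 0 if it is at most `lower`, and otherwise left in the
    neutral zone so that another observation is collected. At the final time
    point a decision is forced with a 0.5 cutoff (an assumption of this sketch).

    Returns (label, decision_time) with a 1-based decision_time.
    """
    if not posteriors:
        raise ValueError("at least one time point is required")
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower < upper <= 1")
    last = len(posteriors)
    for t, p in enumerate(posteriors, start=1):
        if p >= upper:
            return 1, t              # confident enough for class 1
        if p <= lower:
            return 0, t              # confident enough for class 0
        if t == last:
            return (1 if p >= 0.5 else 0), t   # forced decision at the end
    raise AssertionError("unreachable: the loop always returns")
```

For example, `sequential_neutral_zone([0.55, 0.6, 0.93, 0.99])` stops at the third time point with label 1 and never uses the fourth observation. Widening the neutral zone trades later decisions for fewer errors, and this is where a cost of time would enter the choice of thresholds.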