Robust Estimation of Multivariate Location and Shape
In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27-42, J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69-95; J. A mer. Statist. Assoc. 89 (1994), 888-896) but the overall structure is presented here for the first time. After describing the firstlevel architecture of a class of algorithms for this problem, we review available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups of perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD) -these are the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates based on a robust criterion, such as minimum volume. Use of this algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high (> 20) dimension even with large quantities of data. A computer program implementing the algorithm is available from the authors.