 Open Access Publications from the University of California

## Modern Statistical Inference for Classical Statistical Problems

Abstract

This dissertation addresses three classical statistical inference problems with novel ideas and techniques driven by modern statistics. My purpose is to highlight the fact that even the most fundamental problems in statistics are not fully understood and that the unexplored parts may be handled by advances in modern statistics. Pouring new wine into old bottles may generate new perspectives and methodologies for more complicated problems. On the other hand, re-investigating classical problems helps us understand the historical development of statistics and pick up the scattered pearls forgotten over the course of history.

Chapter 2 discusses my work supervised by Professor Noureddine El Karoui and Professor Peter J. Bickel on regression M-estimates in moderate dimensions. In this work, we investigate the asymptotic distributions of coordinates of regression M-estimates in the moderate $p/n$ regime, where the number of covariates $p$ grows proportionally with the sample size $n$. Under appropriate regularity conditions, we establish the coordinate-wise asymptotic normality of regression M-estimates assuming a fixed design matrix. Our proof is based on the second-order Poincaré inequality (Chatterjee 2009) and leave-one-out analysis (El Karoui et al. 2011). We give several examples to show that our regularity conditions are satisfied by a broad class of design matrices. We also present a counterexample, namely the ANOVA-type design, to emphasize that the technical assumptions are not just artifacts of the proof. Finally, numerical experiments confirm and complement our theoretical results.

Chapter 3 discusses my joint work with Professor Peter J. Bickel on exact inference for linear models. We propose the cyclic permutation test (CPT) for testing general linear hypotheses for linear models. This test is non-randomized and valid in finite samples with exact type-I error $\alpha$ for an arbitrary fixed design matrix and arbitrary exchangeable errors, whenever $1 / \alpha$ is an integer and $n / p \ge 1 / \alpha - 1$. The test applies the marginal rank test on $1 / \alpha$ linear statistics of the outcome vectors, where the coefficient vectors are determined by solving a linear system such that the joint distribution of the linear statistics is invariant to a non-standard cyclic permutation group under the null hypothesis. The power can be further enhanced by solving a secondary non-linear travelling salesman problem, for which a genetic algorithm can find a reasonably good solution. Through extensive simulation studies, we show that CPT has power comparable to that of existing tests. When testing for a single contrast of coefficients, an exact confidence interval can be obtained by inverting the test. Furthermore, we provide a selective yet extensive literature review of the century-long efforts on this problem, highlighting the novelty of our test.

Chapter 4 discusses my joint work with Professor Peng Ding on regression adjustment for Neyman-Rubin models. Extending R. A. Fisher and D. A. Freedman's results on the analysis of covariance, Lin (2013) proposed an ordinary least squares adjusted estimator of the average treatment effect in completely randomized experiments. We further study its statistical properties under the potential outcomes model in the asymptotic regimes allowing for a diverging number of covariates. We show that when $p \gg n^{1/2}$, the estimator may have a non-negligible bias and propose a bias-corrected estimator that is asymptotically normal in the regime $p = o(n^{2/3} / (\log n)^{1/3})$. Similar to Lin (2013), our results hold for non-random potential outcomes and covariates without any model specification. Our analysis requires novel analytic tools for sampling without replacement, which complement and potentially enrich the theory in other areas such as survey sampling, matrix sketching, and transductive learning.
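
For readers unfamiliar with the regression-adjusted estimator discussed in Chapter 4, the following is a minimal sketch of its usual form: an ordinary least squares fit of the outcome on the treatment indicator, the centered covariates, and their interactions, with the coefficient on the treatment indicator taken as the estimate. The function name `lin_adjusted_ate` and the argument names `y`, `t`, and `X` are hypothetical and chosen for illustration. The sketch does not include the bias correction proposed in this dissertation.

```python
import numpy as np


def lin_adjusted_ate(y, t, X):
    """Sketch of an OLS-adjusted average treatment effect estimate.

    y : (n,) observed outcomes
    t : (n,) treatment indicators (1 = treated, 0 = control)
    X : (n, p) covariate matrix

    All names here are illustrative assumptions, not the author's code.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    X = np.asarray(X, dtype=float)

    # Center covariates at their full-sample means so that the coefficient
    # on the treatment indicator can be read as the adjusted effect.
    Xc = X - X.mean(axis=0)

    # Design: intercept, treatment, centered covariates, and
    # treatment-by-covariate interactions.
    design = np.column_stack([np.ones_like(y), t, Xc, t[:, None] * Xc])

    # Least squares fit; lstsq also handles rank-deficient designs
    # by returning a minimum-norm solution.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    # The second coefficient belongs to the treatment indicator.
    return coef[1]
```

Calling `lin_adjusted_ate(y, t, X)` on data from a completely randomized experiment returns a single number. When outcomes are exactly linear in the covariates with a constant treatment effect, the fit recovers that effect up to floating-point error.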