Multiple Testing and False Discovery Rate Control: Theory, Methods and Algorithms
Multiple testing, in which many hypothesis tests are performed simultaneously, is a core research topic in statistics that arises in almost every scientific field. As more hypotheses are tested, more errors are bound to occur. Controlling the false discovery rate (FDR) [BH95], the expected proportion of falsely rejected null hypotheses among all rejections, is an important challenge for making meaningful inferences. Throughout the dissertation, we analyze the asymptotic performance of several FDR-controlling procedures under different multiple testing settings. In Chapter 1, we study the well-known Benjamini-Hochberg (BH) method [BH95], which often serves as a benchmark among FDR-controlling procedures, and show that it is asymptotically optimal in a stylized setting. We then prove that a distribution-free FDR control method of Barber and Candès [FBC15], which only requires the (unknown) null distribution to be symmetric, achieves the same asymptotic performance as the BH method and is therefore also optimal. Chapter 2 proposes an interval-type procedure that identifies the longest interval whose estimated FDR is below a given level and rejects the hypotheses whose p-values lie inside that interval. Unlike threshold approaches, this procedure scans over all intervals, including those whose left endpoint is not zero. We show that this scan procedure provides strong control of the asymptotic FDR. In addition, we investigate its asymptotic false non-discovery rate (FNR) and derive conditions under which it outperforms the BH procedure. In Chapter 3, we consider an online multiple testing problem in which the hypotheses arrive sequentially in a stream, and investigate two procedures proposed by Javanmard and Montanari [JM15] that control the FDR in an online manner. We quantify their asymptotic performance in the same location models as in Chapter 1 and compare their power with that of the (static) BH method.
In Chapter 4, we propose a new class of powerful online testing procedures that incorporate the available contextual information, and prove that any rule in this class controls the online FDR under some standard assumptions. We also derive a practical algorithm that, in an online fashion, makes more empirical discoveries than the state-of-the-art procedures.
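
For orientation, the BH method that serves as the benchmark throughout can be sketched in a few lines. This is a minimal illustration of the standard step-up rule from [BH95], not code from the dissertation: the function name `benjamini_hochberg`, the parameter name `alpha`, and its default value are assumptions made for the example.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return a list of booleans marking which hypotheses BH rejects.

    Illustrative sketch: `alpha` is the target FDR level (name and
    default are assumptions, not taken from the dissertation).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values (none if k == 0).
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, provided a larger p-value further down the ordering meets its threshold.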