$p$-Hacking can undermine the validity of empirical research. This dissertation analyzes existing statistical methods, and develops new ones, for detecting $p$-hacking based on the empirical distribution of reported results across studies.
In Chapter 1, we theoretically analyze the problem of testing for $p$-hacking based on distributions of $p$-values across multiple studies. We provide general results on when such distributions satisfy a testable restriction (they are non-increasing) under the null of no $p$-hacking. We find novel additional testable shape restrictions for $p$-values based on $t$-tests. These restrictions yield more powerful tests of the null hypothesis of no $p$-hacking. When there is also publication bias, our tests are joint tests for $p$-hacking and publication bias. A reanalysis of two prominent datasets shows the usefulness of our new tests.
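As an illustration of why a non-increasing shape can arise, consider the following sketch for one-sided $z$-tests. The notation ($\beta$, $\Pi$, $g$, $h$) is assumed here for exposition and is not taken from the chapter; the general results may rest on different or weaker conditions. Let $\beta(p, h)$ be the probability that a level-$p$ test rejects when the true effect is $h \ge 0$, and let $\Pi$ be the distribution of true effects across studies. Without $p$-hacking, the density of reported $p$-values is

$$g(p) = \int \frac{\partial \beta(p, h)}{\partial p} \, d\Pi(h).$$

For a one-sided $z$-test, $\beta(p, h) = \Phi\left(h - z_{1-p}\right)$ with $z_{1-p} = \Phi^{-1}(1-p)$, so that

$$\frac{\partial \beta(p, h)}{\partial p} = \frac{\phi\left(z_{1-p} - h\right)}{\phi\left(z_{1-p}\right)} = \exp\left(h \, z_{1-p} - \frac{h^2}{2}\right).$$

Since $z_{1-p}$ is decreasing in $p$ and $h \ge 0$, this expression is non-increasing in $p$ for every $h$, and therefore so is $g(p)$, whatever the distribution $\Pi$ on $h \ge 0$.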
Chapter 2 studies the power of the methods discussed in Chapter 1 to detect different types of $p$-hacking. We theoretically study the implications of likely forms of $p$-hacking for the distribution of reported $p$-values and for the power of existing detection methods. Power can be quite low, depending crucially on the particular $p$-hacking strategy and the distribution of actual effects tested by the studies. We relate the power of the tests to the costs of $p$-hacking and show that power tends to be larger when $p$-hacking is very costly.
Chapter 3 studies Caliper tests, which are widely used to test for the presence of $p$-hacking and publication bias based on the distribution of $z$-statistics across studies. We show that, without additional restrictions on the distribution of true effects, Caliper tests may suffer from substantial size distortions. We propose a modification of the existing Caliper test, referred to as the Robust Caliper test, which is shown to control size irrespective of the true effect distribution. We also propose a correction to the regression-based version of the Caliper test, which allows for the inclusion of additional covariates.
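For readers unfamiliar with the idea, a minimal sketch of a standard Caliper test (not the Robust Caliper test proposed in Chapter 3) is given below. It counts $z$-statistics falling just below and just above a significance threshold and applies a one-sided exact binomial test of equal probabilities. The function name `caliper_test`, the use of absolute $z$-statistics, the threshold 1.96, and the window width are illustrative assumptions, not choices taken from the dissertation.

```python
from math import comb


def caliper_test(z_stats, threshold=1.96, width=0.2):
    """Standard caliper test sketch (hypothetical helper, not the robust version).

    Counts |z| in [threshold, threshold + width) versus
    (threshold - width, threshold) and returns the one-sided exact
    binomial p-value for H0: P(above) <= 0.5.
    """
    abs_z = [abs(z) for z in z_stats]
    above = sum(1 for z in abs_z if threshold <= z < threshold + width)
    below = sum(1 for z in abs_z if threshold - width < z < threshold)
    n = above + below
    if n == 0:
        raise ValueError("no z-statistics fall inside the caliper window")
    # P(X >= above) for X ~ Binomial(n, 0.5), computed exactly
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return above, below, p_value


if __name__ == "__main__":
    # Made-up z-statistics, purely for illustration
    sample = [1.80, 1.90, 1.97, 2.00, 2.05, 2.10, 2.15, 3.00]
    print(caliper_test(sample))
```

A small $p$-value indicates an excess of results just above the threshold. As noted above, a test of this form need not control size without restrictions on the distribution of true effects, which is the issue the Robust Caliper test is designed to address.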