k-Step Bootstrap Bias Correction for Fixed Effects Estimators in Nonlinear Panel Models

Fixed effects estimators in nonlinear panel models with fixed T usually suffer from inconsistency because of the incidental parameters problem first noted by Neyman and Scott (1948). Moreover, even when T grows at the same rate as n, they are asymptotically biased, and therefore the associated confidence intervals have large coverage errors. This paper proposes a k-step parametric bootstrap bias corrected estimator. We prove that our estimator is asymptotically normal and centered at the true parameter if T grows faster than ∛n. In addition to bias correction, we construct confidence intervals with a double bootstrap procedure, and Monte Carlo experiments confirm that the error in coverage probability of our CIs is smaller than that of the alternatives. We also propose bias correction for average marginal effects.


Introduction
Panel data consist of repeated observations on different individuals across time. One virtue of this data structure is that we can control for unobserved time-invariant individual heterogeneity in an econometric model. When individual effects are correlated with explanatory variables, we may use the fixed effects estimator, which treats each unobserved individual effect as a parameter to be estimated. However, this approach usually suffers from inconsistency when the time series sample size (T) is short. This is known as the incidental parameters problem, first noted by Neyman and Scott (1948). Furthermore, even when T grows at the same rate as n, the fixed effects estimators are asymptotically biased, so inference based on them may give misleading results.
This paper proposes a k-step parametric bootstrap bias corrected maximum likelihood (ML) estimator for nonlinear static panel models. In the k-step bootstrap procedure, we approximate the standard bootstrap estimator by taking k steps of a Newton-Raphson (NR) iterative scheme, using the original estimate as the starting point for the NR steps. We estimate the asymptotic bias with the k-step bootstrap method and subtract it from the original biased estimator. We prove that the standard and k-step bootstrap bias corrected estimators are asymptotically normal and centered at the true parameter if $T$ grows faster than $n^{1/3}$. This condition is important in practice because many economic data sets nowadays have small T and large n, so the usefulness of bias corrected estimation depends particularly on how much of the bias is corrected when T is small. Our Monte Carlo experiments show that in finite samples the k-step bootstrap bias corrected estimators reduce the bias remarkably even for small T. The correction does not increase the asymptotic variance, so bias correction substantially improves statistical inference. In addition to bias correcting the parameter estimators, we also apply the k-step bootstrap bias correction to the estimation of average marginal effects.
The substantial advantage of our approach over the alternatives is that it enables us not only to correct the asymptotic bias but also to improve the coverage accuracy of the associated confidence intervals (CIs). We construct the CIs using a double k-step bootstrap procedure. In Monte Carlo experiments we find that in finite samples the error in coverage probability of our CIs is smaller than that of the other standard alternatives, especially when T is small. This holds for the estimators of the model parameters as well as for the estimators of the average marginal effects.
Another clear advantage is ease of computation. Standard bootstrap methods in nonlinear models are usually very time-intensive because one has to solve R nonlinear optimization problems to obtain R bootstrap estimates, and R usually needs to be fairly large for the bootstrap to be reliable. Unless the optimization problem is simple, this is a very time-intensive task. In particular, since the fixed effects approach treats the individual effects as parameters, there are many parameters to be estimated, and the computational burden can be especially serious in this type of model. For example, in our empirical application (not reported here), there are 1461 individuals, which means there are more than 1461 parameters to be estimated. In addition, the double bootstrap procedure used for constructing CIs also increases the computational burden substantially. To overcome this problem, we introduce the k-step bootstrap estimation, which only involves computing the Hessian and the score functions. We show that as $n \to \infty$ the stochastic difference between the standard and k-step bootstrap estimators is $O_p\!\left(T^{-2^{k-1}}\right)$. When $k \ge 2$, this difference is of smaller order than the bias.

The Model and the Fixed Effects Estimator
We consider a parametric panel model in which an observation $z_{it}$ has density $f(z_{it}; \theta, \alpha_i)$, where $\theta$ is an $(L \times 1)$ vector of parameters of interest, $\alpha_i$ is a scalar individual heterogeneity, and $f$ is a probability density function with parameters $\theta$ and $\alpha_i$. For any given parameter value, $\{z_{it}\}$ are independently distributed across $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. The model includes discrete choice models and censored and truncated models as special cases.
Denote the true values of $\theta$ and $\alpha \equiv (\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n)$ by $\theta_0$ and $\alpha_0 = (\alpha_{10}, \ldots, \alpha_{i0}, \ldots, \alpha_{n0})$, respectively, and let $l(\theta, \alpha_i; z_{it}) \equiv \log f(z_{it}; \theta, \alpha_i)$. The objective function for the fixed effects estimator, $\hat\theta_{nT}$, is the concentrated log-likelihood function based on the preliminary estimator $\hat\alpha_i(\theta)$. That is, we obtain $\hat\theta_{nT}$ by solving

$$\hat\theta_{nT} = \arg\max_{\theta} \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} l\big(\theta, \hat\alpha_i(\theta); z_{it}\big), \qquad \hat\alpha_i(\theta) = \arg\max_{\alpha_i} \frac{1}{T}\sum_{t=1}^{T} l(\theta, \alpha_i; z_{it}),$$

where the maximization is taken over a compact set. Denote by $\theta_T$ the probability limit of $\hat\theta_{nT}$ as $n \to \infty$ with $T$ fixed; usually $\theta_T \neq \theta_0$. If the likelihood function is smooth enough, we can show by stochastic expansion that $\theta_T = \theta_0 + B/T + O(T^{-2})$ for some $B$. This implies that $\theta_T \to \theta_0$ as $T \to \infty$. However, the estimator is still asymptotically biased if $T$ grows at the same rate as $n$. That is, as $n, T \to \infty$ with $n/T \to \rho$, $\sqrt{nT}\,(\hat\theta_{nT} - \theta_T)$ converges to a normal distribution centered at zero with some variance matrix $\Omega$, since $\theta_T$ is the probability limit of $\hat\theta_{nT}$; however, the second term, $\sqrt{nT}\,B/T$, does not vanish but converges to $B\sqrt{\rho}$. Hence, statistical inference drawn from this will give misleading conclusions even when $T$ is as large as $n$. Hahn and Newey (2004, hereafter HN) establish the analytic form of the leading bias of $\hat\theta_{nT}$ using a stochastic expansion. For notational convenience, we define $u_{it}(\theta, \alpha_i) \equiv \partial l(\theta, \alpha_i; z_{it})/\partial\theta$ and $v_{it}(\theta, \alpha_i) \equiv \partial l(\theta, \alpha_i; z_{it})/\partial\alpha_i$, and let additional subscripts denote further partial derivatives, e.g. $v_{it\alpha}(\theta, \alpha_i) \equiv \partial^2 l(\theta, \alpha_i; z_{it})/\partial\alpha_i^2$. We suppress the arguments of functions such as $u_{it}$ when they are evaluated at the true value $(\theta_0, \alpha_{i0})$. HN show the analytic form of the leading bias $B$ in equation (5).
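To fix ideas, the following minimal sketch implements the concentrated fixed effects ML estimator for a probit panel with a scalar regressor. It is an illustration only, not the paper's code; the helper names, the compact bounds $(-10, 10)$ for $\alpha_i$, and the optimizer choices are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import norm

def loglik_i(theta, alpha_i, y_i, x_i):
    # Individual log-likelihood sum_t l(theta, alpha_i; z_it) for a probit.
    p = norm.cdf(x_i * theta + alpha_i)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return np.sum(y_i * np.log(p) + (1 - y_i) * np.log(1 - p))

def alpha_hat(theta, y, x):
    # Preliminary estimator alpha_i_hat(theta): maximize over each scalar alpha_i.
    return np.array([
        minimize_scalar(lambda a, i=i: -loglik_i(theta, a, y[i], x[i]),
                        bounds=(-10.0, 10.0), method="bounded").x
        for i in range(y.shape[0])
    ])

def fe_probit(y, x):
    # Fixed effects MLE: maximize the concentrated log-likelihood over theta.
    def neg_concentrated(t):
        a = alpha_hat(t[0], y, x)
        return -sum(loglik_i(t[0], a[i], y[i], x[i]) for i in range(y.shape[0]))
    return minimize(neg_concentrated, x0=np.array([0.0]), method="Nelder-Mead").x[0]
```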

Bootstrap Bias Correction
In this section, we define the bias corrected estimator using a parametric bootstrap procedure. The parametric bootstrap differs from the nonparametric bootstrap in that the former exploits the parametric structure of the DGP, replacing the original parameters with their estimates to generate bootstrap samples, while the latter generates them from the empirical distribution function. Let $F^* \equiv F_{\hat\theta_{nT}, \hat\alpha}$ denote the distribution function of the bootstrap samples. We obtain $F^*$ from $F$ by replacing $\theta_0$ and $\alpha_0$ with $\hat\theta_{nT}$ and $\hat\alpha \equiv \hat\alpha(\hat\theta_{nT})$. Therefore, in the bootstrap world, $\hat\theta_{nT}$ and $\hat\alpha$ are the true parameters. Let $\{z^*_{it}\}$ denote the bootstrap sample drawn at random from $F^*$. Based on $\{z^*_{it}\}$, we obtain the bootstrap estimators $\hat\theta^*_{nT}$ and $\hat\alpha^*_i$ by ML estimation, where, as before, the maximization is taken over a compact set. The intuition behind the bootstrap bias correction is that the bias of the bootstrap estimator is a good approximation to that of the original estimator. Under some regularity conditions, as $n \to \infty$ and $T \to \infty$, $E^*(\hat\theta^*_{nT}) - \hat\theta_{nT}$ approximates the bias of $\hat\theta_{nT}$, where $E^*$ is the expectation operator with respect to $F^*$. Therefore, the bootstrap bias corrected estimator can be defined as

$$\tilde\theta_{nT} \equiv \hat\theta_{nT} - \big(E^*(\hat\theta^*_{nT}) - \hat\theta_{nT}\big) = 2\hat\theta_{nT} - E^*(\hat\theta^*_{nT}). \qquad (9)$$

This estimator reduces the order of magnitude of the bias from $O_p(T^{-1})$ to $O_p(T^{-2})$. To show this, we employ the same definitions as HN in the bootstrap world. The conditional distribution of the bootstrap sample given the data, i.e. given $(\hat\theta_{nT}, \hat\alpha(\hat\theta_{nT}))$, is the same as the distribution of the original sample except that the former uses $(\hat\theta_{nT}, \hat\alpha(\hat\theta_{nT}))$ rather than $(\theta_0, \alpha_0)$ as the true parameters. By stochastic expansion, we show in Appendix I that the bootstrap estimator admits the analogous expansion; combining the two expansions with the definition of $B$, we obtain (10) and (11). As a final step, we show that under the assumptions given in Section 5 the remaining term vanishes, which gives (12). From (10), (11) and (12), the bootstrap bias corrected estimator removes the dominant bias and is asymptotically unbiased.
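As an illustration, here is a minimal sketch of the standard parametric bootstrap bias correction (9) for the probit example, reusing the `fe_probit` and `alpha_hat` helpers sketched above. The `dgp_simulate` helper and the choice to hold the covariates fixed across bootstrap draws are our assumptions.

```python
import numpy as np

def dgp_simulate(theta, alpha, x, rng):
    # Draw y*_it from the estimated probit DGP F* = F_{theta_hat, alpha_hat},
    # holding the covariates fixed.
    eps = rng.standard_normal(x.shape)
    return (x * theta + alpha[:, None] + eps > 0).astype(float)

def bootstrap_bias_correct(y, x, R=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta_hat = fe_probit(y, x)             # original (biased) estimate
    alpha = alpha_hat(theta_hat, y, x)      # alpha_hat(theta_hat)
    boot = np.empty(R)
    for r in range(R):
        y_star = dgp_simulate(theta_hat, alpha, x, rng)
        boot[r] = fe_probit(y_star, x)      # full re-estimation: R optimizations
    # E*(theta*_nT) - theta_hat estimates the bias, so the corrected
    # estimator is 2 * theta_hat - E*(theta*_nT), as in (9).
    return 2.0 * theta_hat - boot.mean()
```

The loop above makes the computational burden concrete: each replication solves a full nonlinear program with more than $n$ parameters, which is exactly what the k-step procedure of the next section avoids.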

k-step Bootstrap Bias Correction
In this section, we define the k-step bootstrap bias corrected estimator and demonstrate its higher order equivalence to the standard bootstrap estimator. The k-step procedure approximates $\hat\theta^*_{nT}$ by the NR iterative procedure. Let $\hat\theta^*_{nT,k}$ and $\hat\alpha^*_{i,k}$ denote the k-step bootstrap estimators. We define $\hat\theta^*_{nT,k}$ and $\hat\alpha^*_k$ recursively by the NR updating scheme

$$\hat\gamma^*_j = \hat\gamma^*_{j-1} - H\big(\hat\gamma^*_{j-1}\big)^{-1} S\big(\hat\gamma^*_{j-1}\big), \qquad j = 1, \ldots, k, \qquad (14)$$

where $\hat\gamma^* \equiv (\theta^{*\prime}, \alpha^{*\prime})'$, $H$ and $S$ denote the Hessian and the score of the bootstrap log-likelihood, and the start-up estimator is $\hat\theta^*_{nT,0} = \hat\theta_{nT}$, $\hat\alpha^*_0 = \hat\alpha$. The Hessian matrix used here is the observed Hessian. We note that some terms in $\partial^2 \log l(\theta, \alpha_i; z^*_{it})/\partial(\theta', \alpha')\,\partial(\theta', \alpha')'$ have zero expectation; dropping these terms in equation (15), we obtain the expected Hessian. Our asymptotic results remain valid for the expected Hessian, as the dropped terms are of smaller order.
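A minimal sketch of the update (14): instead of solving the bootstrap ML problem, take k Newton-Raphson steps on the bootstrap log-likelihood starting from the original estimate. The `score` and `hessian` arguments are assumed helpers returning the stacked derivatives in $\gamma = (\theta', \alpha')'$.

```python
import numpy as np

def k_step(gamma_start, y_star, x, score, hessian, k=2):
    # gamma*_j = gamma*_{j-1} - H(gamma*_{j-1})^{-1} S(gamma*_{j-1}), j = 1..k,
    # with start-up value gamma*_0 = (theta_hat', alpha_hat')'.
    gamma = np.asarray(gamma_start, dtype=float).copy()
    for _ in range(k):
        H = hessian(gamma, y_star, x)          # observed Hessian of bootstrap log-likelihood
        S = score(gamma, y_star, x)
        gamma = gamma - np.linalg.solve(H, S)  # one NR step; no optimizer call
    return gamma
```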
In the appendix of proofs, we show that $\mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT,k} = \mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT} + O_p\!\left(T^{-2^{k-1}}\right)$. This implies the quadratic convergence of $\hat\theta^*_{nT,k}$ to $\hat\theta^*_{nT}$ as $k$ increases. In particular, when $k \ge 2$, $\mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT,k} = \mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT} + O_p(1/T^2)$. So in large samples, the approximation error from using the k-step bootstrap instead of the standard bootstrap is of smaller order than the bias term that we intend to remove. Therefore, the condition $k \ge 2$ is necessary for the k-step bootstrap to achieve effective bias reduction.
To implement the k-step bootstrap, we have to invert the Hessian matrix. Depending on the observations we sample, $H_{j-1}$ may be close to singular in practice, in which case $\hat\theta^*_{nT,j}$ diverges. As a result, the mean of $\hat\theta^*_{nT,k}$ may not be finite. To circumvent the undue influence of the second derivative of the objective function on our estimator, we introduce the truncated version, $\check\theta^*_{nT,k}$. The truncated estimator yields the same value as $\hat\theta^*_{nT,k}$ when the difference between $\hat\theta^*_{nT,k}$ and $\hat\theta_{nT}$ is bounded by $M_{nT}/\sqrt{nT}$, but does not blow up when $\hat\theta^*_{nT,k}$ takes an extreme value. Similarly, we define the truncated $\check\alpha^*_{i,k}$. We can set $M_{nT}$ large enough that this truncation does not affect the asymptotic properties: we show that when $M_{nT} \to \infty$ such that $\sqrt{n}/T = o(M_{nT})$, the truncated and untruncated estimators are asymptotically equivalent (16). As a final step, we show (19), so that from (16) and (19) the limiting distribution of the bootstrap bias corrected estimator $\tilde\theta_{nT}$, defined in (9), is invariant to replacing $\hat\theta^*_{nT}$ with $\check\theta^*_{nT,k}$. Hence, we can define our truncated k-step bootstrap bias corrected estimator as in (20). Then, for $T/n^{1/3} \to \infty$ and all $k \ge 2$, $\sqrt{nT}\,(\tilde\theta_{nT,k} - \theta_0)$ converges to a normal distribution centered at zero.
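The truncation rule can be sketched as follows. Returning the original estimate when the k-step draw leaves the $M_{nT}/\sqrt{nT}$ ball is our reading of the definition, since the text only requires that the truncated value coincide with the k-step value inside the ball and not blow up outside it.

```python
import numpy as np

def truncate_k_step(theta_k_step, theta_hat, n, T, M_nT):
    # Keep the k-step bootstrap draw only if it stays within M_nT/sqrt(nT)
    # of the original estimate; otherwise fall back to the original estimate
    # so near-singular Hessians cannot blow up the bootstrap mean (assumed rule).
    bound = M_nT / np.sqrt(n * T)
    if np.all(np.abs(np.asarray(theta_k_step) - theta_hat) <= bound):
        return theta_k_step
    return theta_hat
```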

Asymptotic Properties
In this section, we state the assumptions and rigorously establish the asymptotic properties of the standard bootstrap and k-step bootstrap estimators. For ease of exposition, we write $l(\theta, \alpha; z_{it}) \equiv l(\theta, \alpha_i; z_{it})$, so that $l(\theta, \alpha; z_{it})$ is regarded as a function of $\theta$ and $\alpha$. We maintain the following assumptions.

Assumption 1. $n, T \to \infty$ such that $n = o(T^3)$ and $T = O(n)$.

(iii) For some $Q > 64$, $E\,|M(z_{it})|^Q < C$ for a constant $C$ and all $i = 1, 2, \ldots, n$.

Assumption 1 shows that our estimator is applicable as long as $T$ grows faster than $n^{1/3}$. This implies that our asymptotic theory is valid with relatively small $T$ and large $n$, which is often the case in micro panel data sets. Assumption 2 is a standard regularity assumption. Assumption 3 is the identification assumption for extremum estimators. Assumption 4 is the same as Condition 4 in Hahn and Newey (2004). It is stronger than the moment assumption for extremum estimators, and under this assumption the asymptotic bias depends on the second order expansion, while higher order terms go to 0 under Assumption 1. Assumption 5 allows us to invoke the central limit theorem. Assumption 6 ensures that the limiting bias term $B$ is close to its finite sample analogue $B_n$; this assumption holds trivially if $z_{it}$ are iid across $i$. For the proof see Appendix I.
Proposition 2. Under Assumptions 1-6, for all $k \ge 1$, $\mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT,k} = \mathrm{plim}_{n\to\infty}\,\hat\theta^*_{nT} + O_p\!\left(T^{-2^{k-1}}\right)$. For the proof see Appendix II.
Theorem 3. Under Assumptions 1-6, for all $k \ge 2$, $\sqrt{nT}\,(\tilde\theta_{nT,k} - \theta_0) \xrightarrow{d} N(0, \Omega)$. For the proof see Appendix III.

Bias Correction for Average Marginal Effects
In this section, we suggest bias corrected estimators of the average marginal effects using the k-step bootstrap procedure. In nonlinear models, the average marginal effect may be as interesting as the model parameters because it summarizes the effect over a certain sub-population, which is often the quantity of interest in empirical studies. The first average marginal effect, which we refer to as "the fixed effect average" or simply the average marginal effect, is the marginal effect averaged over $\alpha_i$. It is defined as

$$\mu(w) \equiv \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} m(w; \theta_0, \alpha_{i0}),$$

where $w$ is the value of the covariate vector at which the average effect is desired. For example, in a probit model, $m(w; \theta_0, \alpha_{i0}) = \theta_{0(j)}\,\phi(w'\theta_0 + \alpha_{i0})$, where $\theta_{0(j)}$ and $\phi(\cdot)$ are the coefficient on the $j$-th regressor of interest and the standard normal density function, respectively. The bias uncorrected estimator of $\mu(w)$ is

$$\hat\mu_{nT}(w) = \frac{1}{n}\sum_{i=1}^{n} m(w; \hat\theta_{nT}, \hat\alpha_i). \qquad (22)$$

As in the estimation of the model parameters, we can construct a k-step bootstrap bias corrected estimator of the fixed effect average by estimating the bias with the difference between $\hat\mu_{nT}(w)$ and its bootstrap estimator. Our k-step bootstrap bias corrected estimator of the fixed effect average is

$$\tilde\mu_{nT,k}(w) = 2\hat\mu_{nT}(w) - E^*\big(\check\mu^*_{nT,k}(w)\big). \qquad (23)$$

The second average marginal effect, which we refer to as "the overall average marginal effect", is the marginal effect averaged over both $\alpha_i$ and the covariates. It is defined as

$$\mu \equiv \lim_{n\to\infty} \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} E\big[m(x_{it}; \theta_0, \alpha_{i0})\big].$$

See also Fernández-Val (2009). Similarly to equations (22) and (23), we define the original and bias corrected estimators of $\mu$.
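For the probit example, the fixed effect average marginal effect and its bootstrap bias correction can be sketched as below (scalar regressor case, a minimal illustration). The bootstrap replicates of $(\theta, \alpha)$ are assumed to come from the truncated k-step procedure sketched earlier.

```python
import numpy as np
from scipy.stats import norm

def mu_hat(w, theta, alpha):
    # (1/n) sum_i m(w; theta, alpha_i) with m = theta * phi(w * theta + alpha_i),
    # the probit marginal effect averaged over the estimated fixed effects.
    return np.mean(theta * norm.pdf(w * theta + alpha))

def mu_bias_corrected(w, theta_hat, alpha_hat_vec, boot_draws):
    # boot_draws: list of (theta*_k, alpha*_k) pairs from the k-step bootstrap.
    mu0 = mu_hat(w, theta_hat, alpha_hat_vec)
    mu_star = np.mean([mu_hat(w, t, a) for (t, a) in boot_draws])
    return 2.0 * mu0 - mu_star              # analogue of (23)
```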

Monte Carlo Study
In this section, we report our Monte Carlo results, which show that the k-step bootstrap bias correction reduces the bias significantly in finite samples and also improves the coverage accuracy of CIs. We employ the design used in Heckman (1981), Greene (2004), HN, and Fernández-Val (2009). It is based on the following probit model:

$$y_{it} = \mathbf{1}\{\theta_0 x_{it} + \alpha_i + \varepsilon_{it} > 0\}, \quad \varepsilon_{it} \sim N(0,1), \quad x_{it} = t/10 + x_{i,t-1}/2 + u_{it},$$

with $u_{it} \sim U(-1/2, 1/2)$, $n = 100$, $T = 4, 8, 12$, and $\theta_0 = 1$. As discussed in HN, this model does not fit completely within our framework. First, $x_{it}$ is correlated over time. The correlation does not cause any problem, as we can use the conditional MLE approach and all the asymptotic results remain valid. Second, there is no correlation between $x_{it}$ and $\alpha_i$. This differs from the usual setting in which the fixed effects estimator is used. However, the incidental parameters problem is still present, as it has nothing to do with whether $x_{it}$ and $\alpha_i$ are correlated: the bias of the fixed effects estimator can be severe for fixed effects models as well as for random effects models. The effectiveness of different bias reduction methods can therefore be evaluated well with our data generating process. Another reason to use this design is that it is widely cited and used in other simulation studies, which helps us compare our estimator with the alternatives.
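A minimal sketch of this design as we read it from the cited papers; the distribution of $\alpha_i$ and the initialization of $x_{i0}$ are our assumptions.

```python
import numpy as np

def simulate_panel(n=100, T=4, theta0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(n)              # alpha_i ~ N(0,1) (assumed)
    x = np.zeros((n, T))
    x_prev = rng.uniform(-0.5, 0.5, n)          # x_i0 ~ U(-1/2, 1/2) (assumed)
    for t in range(T):
        u = rng.uniform(-0.5, 0.5, n)           # u_it ~ U(-1/2, 1/2)
        x[:, t] = (t + 1) / 10.0 + x_prev / 2.0 + u
        x_prev = x[:, t]
    eps = rng.standard_normal((n, T))
    y = (theta0 * x + alpha[:, None] + eps > 0).astype(float)
    return y, x
```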
The uncorrected estimator of the model parameters is the fixed effects probit MLE, whose log-likelihood is built from $\Phi(\cdot)$, the standard normal distribution function, and the estimators of the average marginal effects are those of the previous section evaluated at $w = \bar x$, where $\bar x$ is the sample mean of $\{x_{it}: i = 1, 2, \ldots, n,\ t = 1, \ldots, T\}$. For the k-step bootstrap, we generate bootstrap samples based on $\hat\theta_{nT}$ and $\{\hat\alpha_i\}_{i=1}^{n}$ and compute $\hat\theta^*_{nT,k}$ from (14) with the bootstrap samples. We repeat this procedure 1000 times ($R = 1000$). Then, we obtain the bias corrected k-step bootstrap estimator from (20). As discussed before, for each value of $k$ we can use either the observed or the expected Hessian in the NR step, leading to two versions of the k-step procedure. Each simulation is repeated 1000 times.
We compare the performance of our bias corrected estimator with four alternative bias correction estimators: the jackknife and the analytic bias corrected estimators of Hahn and Newey (2004) and the analytic bias corrected estimator of Fernández-Val (2009). The jackknife bias corrected estimator is denoted 'Jackknife'. For the HN analytic estimators, there are two versions: the analytic bias corrected estimator using Bartlett equalities, denoted 'BC1', and the analytic bias corrected estimator based on general estimating equations, denoted 'BC2'. Fernández-Val's estimator is denoted 'BC3'.
For each estimator, we report its mean, median, standard deviation, root mean squared error, and the empirical sizes of two-sided nominal 5% and 10% tests. The tests are based on symmetric CIs; that is, we reject the null hypothesis if the parameter value under the null falls outside the CI. For the jackknife and analytic bias correction procedures, the interval estimator and the testing method are the same as those given in the respective papers. For the k-step procedure, the CIs are based on the double bootstrap procedure.
To describe the double bootstrap procedure, we focus on one element of $\theta$; hence, without loss of generality, we consider the case in which $\theta_0$ is a scalar. By iterating the bootstrap procedure, we define $\tilde\theta^*_{nT,k} \equiv 2\check\theta^*_{nT,k} - E^{**}(\check\theta^{**}_{nT,k})$, where $E^{**}(\check\theta^{**}_{nT,k})$ is defined on the double bootstrap, that is, the k-step bootstrap using $(\check\theta^*_{nT,k}, \check\alpha^*_{i,k})$ as the true model parameters. Similarly, we define the bootstrap $t^*$-statistic from $\tilde\theta^*_{nT,k}$. The bootstrap CI is then the symmetric interval around the bias corrected estimate whose half-width is determined by $T^*_{1-\alpha/2}$, the $(1-\alpha/2) \times 100\%$ percentile of the $t^*$-statistics, and our double bootstrap two-sided test is based on this CI (a code sketch of this procedure follows the table discussion below). We can use the same procedure to construct CIs for the average marginal effect and the overall average marginal effect. In our simulation experiment, we set the number of double bootstrap samples to 100 in order to reduce the computational burden; in empirical applications, a larger number should be used.

Table 1 shows the performance of the k-step bootstrap for different values of $k$. According to these results, the k-step bootstrap procedure reduces the bias significantly when $k \ge 2$. Results not reported here show that the one-step procedure is not effective in bias reduction. This is consistent with our theoretical results, which demonstrate that the order of the bias is reduced from $O_p(1/T)$ to $O_p(1/T^2)$ when $k \ge 2$. In terms of MSE, the 2-step bootstrap with observed Hessian, the 3-step bootstrap with observed Hessian, and the 3-step bootstrap with expected Hessian are efficient in general.

Table 2 compares the different bias correction methods. We choose the 2-step bootstrap with observed Hessian as our benchmark. First, the estimator without bias correction is severely biased when $T$ is small. As $T$ gets larger, the bias gets smaller, but there is still no improvement in the coverage accuracy of the CIs. When $T = 4$, the bias of the uncorrected estimator is 42%; when $T = 12$, it is reduced to 13%, but the rejection probability is still 29% for the 5% two-sided test. Second, the k-step bootstrap performs better in finite samples than the other methods regardless of the size of $T$. In particular, when $T = 4$ the k-step bootstrap outperforms the alternatives remarkably, while as $T$ increases the other estimators become as accurate as ours. When $T = 4$, the bias of our estimator is 6% and its RMSE is 0.249, while the bias of the jackknife is 25% and its RMSE is 0.373. The analytic method of Fernández-Val (2009) also has a bias of 6%, but its RMSE is 0.281, which implies that its variance is larger than ours. The k-step procedure achieves the smallest RMSE among all bias correction procedures. Third, in terms of coverage accuracy, the CIs based on the double k-step bootstrap outperform the other CIs overall.

Table 3 shows the ratio of the estimator of the average marginal effect to the true value. As HN and Fernández-Val (2009) show, the bias of the uncorrected estimator is negligible even when $T = 4$: its bias is less than 2%, and in terms of RMSE it performs as well as the bias corrected estimators. However, its CIs are not accurate, especially when $T$ is small. When $T = 4$, its error in coverage probability for the 95% CI is about 5%. Inaccurate CIs are not just a problem of the uncorrected estimator: the jackknife and the analytic bias corrections do not reduce the coverage error either. When $T = 4$, the errors in coverage probability for the 95% CI from the jackknife and analytic estimators are 11% and 4-6%, respectively. In contrast, the coverage error of the 95% CI constructed from the double k-step bootstrap is only 1.5%.
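The double bootstrap CI can be sketched as below for a scalar $\theta$, a minimal illustration under stated assumptions. The `k_step_estimate` and `se` helpers are assumed: the former draws one bootstrap panel from the given parameters and returns the truncated k-step estimates, the latter returns an asymptotic standard error; the exact form of the $t^*$-statistic is our reading of the text.

```python
import numpy as np

def double_bootstrap_ci(theta_hat, alpha, x, k_step_estimate, se,
                        R=1000, R2=100, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    t_stats = np.empty(R)
    firsts = np.empty(R)
    for r in range(R):
        th1, a1 = k_step_estimate(theta_hat, alpha, x, rng)    # first level
        inner = np.array([k_step_estimate(th1, a1, x, rng)[0]  # second level:
                          for _ in range(R2)])                 # R2 = 100 draws
        th1_bc = 2.0 * th1 - inner.mean()                      # inner bias correction
        t_stats[r] = abs(th1_bc - theta_hat) / se(th1, a1, x)  # |t*|-statistic
        firsts[r] = th1
    theta_bc = 2.0 * theta_hat - firsts.mean()   # outer bias correction, as in (9)
    q = np.quantile(t_stats, level)              # symmetric percentile of |t*|
    s = se(theta_hat, alpha, x)
    return theta_bc - q * s, theta_bc + q * s
```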
We find that our estimator improves the accuracy of the CIs, which is not the case for the other standard alternatives. Table 4 gives the Monte Carlo results for the ratio of the estimator of the overall average marginal effect to the true value. This is similar to the fixed effect average in Table 3 except that the average is taken over both the fixed effects and the covariate. As in the previous case, we find little evidence that bias correction is necessary in terms of RMSE; in fact, the RMSE of the uncorrected estimator is generally smaller than that of the jackknife estimator. The results also show that, in contrast to the other estimators, our double k-step bootstrap procedure improves the coverage accuracy of the CIs, particularly when $T$ is small.

Conclusion
In this paper, we propose the k-step bootstrap bias correction for the fixed effects estimator in nonlinear static panel models and establish the asymptotic properties of our bias corrected estimator. In simulation experiments, we show that the k-step bias correction procedure is often more effective than the alternatives. When T is small, the procedure achieves substantial bias reduction and has the smallest RMSE among the competing procedures. The confidence interval based on the double k-step bootstrap has a smaller coverage error than the other CIs. This is true for both the model parameters and the average marginal effects. The asymptotic properties of our CIs and possible higher order refinements are not studied here; they are interesting topics for future research.

Appendix I. Proof of Theorem 1
Throughout the proof, we assume that $\hat\theta^*_{nT}$ has been truncated so that $\hat\theta^*_{nT} - \hat\theta_{nT}$ is bounded in absolute value by $M_{nT}/\sqrt{nT}$. The technical details for showing that truncation has a negligible effect on the asymptotic properties of $\sqrt{nT}\,(\tilde\theta_{nT} - \theta_0)$ are the same as those for $\sqrt{nT}\,(\tilde\theta_{nT,k} - \theta_0)$; we present the details for the latter in Appendix III and omit them here. Truncation allows us to convert probability orders into moment orders.
The bootstrap bias corrected estimator is defined in (9). The LHS of (24) can be decomposed into two parts; hence, it suffices to show (25) and (26). Let $F \equiv (F_1, \ldots, F_n)$ and $\hat F \equiv (\hat F_1, \ldots, \hat F_n)$. For each fixed $\theta$ and $\epsilon$, let $\alpha_i(\theta, F_i(\epsilon))$ and $\theta(F(\epsilon))$ be the solutions to the estimating equations

$$0 = \int U_i\big(\theta(F(\epsilon)), \alpha_i(\theta(F(\epsilon)), F_i(\epsilon))\big)\, dF_i(\epsilon).$$

A Taylor series expansion gives (27), where $\theta^{(1)}(\epsilon) \equiv d\theta(F(\epsilon))/d\epsilon$, $\theta^{(2)}(\epsilon) \equiv d^2\theta(F(\epsilon))/d\epsilon^2, \ldots$, and $\tilde\epsilon$ is between $0$ and $1/\sqrt{T}$; HN derive the explicit forms of these derivatives. Similarly, in the bootstrap world, for each fixed $\theta$ and $\epsilon$, let $\alpha^*_i(\theta, F^*_i(\epsilon))$ and $\theta^*(F^*(\epsilon))$ be the solutions to the analogous estimating equations, where $F^*_i$ is the distribution function of stratum $i$ in the bootstrap sample and $\hat F^*_i$ is the corresponding empirical distribution. Note that $F^*_i(0)$ is the same as $F_i = F_{\theta_0, \alpha_{i0}}$ except that the true parameter is $(\hat\theta_{nT}, \hat\alpha_i(\hat\theta_{nT}))$ rather than $(\theta_0, \alpha_{i0})$.
Similarly to equation (27), in the bootstrap world we have the analogous expansion, where $\theta^{*(1)}(\epsilon) \equiv d\theta^*(F^*(\epsilon))/d\epsilon$, $\theta^{*(2)}(\epsilon) \equiv d^2\theta^*(F^*(\epsilon))/d\epsilon^2, \ldots$, and $\tilde\epsilon^*$ is between $0$ and $1/\sqrt{T}$. Using the same argument as in HN and some calculation, we obtain, under Assumption 4(ii), a decomposition whose remainder terms we denote $A_{nT}$ and $B_{nT}$. To evaluate the stochastic orders of $A_{nT}$ and $B_{nT}$, we use a higher order expansion that holds uniformly over $i = 1, 2, \ldots, n$; this result is given in HN and follows from the standard higher order expansion. For $A_{nT}$, suppose that $\theta_1$ is a parameter value between $\theta_0$ and $\hat\theta_{nT}$ and that $\alpha_{i1}$ is a value between $\alpha_{i0}$ and $\hat\alpha_i$. Then the first bound follows using Assumptions 1 and 4, and the second follows using the LLN and the CLT. We have thus proved (36). For $B_{nT}$, suppose that $\theta_2$ is between $\theta_0$ and $\hat\theta_{nT}$ and that $\alpha_{i2}$ is between $\alpha_{i0}$ and $\hat\alpha_i$. Then the analogous bounds follow by an argument similar to (35), which gives (39). Combining (36) and (39) yields (40). Using the same procedure, we can show (41). Therefore, from (40) and (41), we obtain the desired result, completing the proof of (25).

It remains to prove (26).
The second equality holds by the dominated convergence theorem, and the last equality follows from an argument similar to HN. Therefore, it suffices to show (43). Equation (43) holds because the $O_p(\cdot)$ terms come from the $O_p(\cdot)$ term in (30), and the leading term in (45) is bounded using Assumption 6. In the above equation, we use the subscript $\theta, \alpha_i$ on $E_{\theta, \alpha_i}$ to emphasize that the expectation is taken under $F_{\theta, \alpha_i}$. This completes the proof of Theorem 1.

II. Proof of Proposition 2
It is easy to show that the probability limits in (46) and (47) hold. We work from the definition of the k-step bootstrap estimator (14). For notational compactness, let $\gamma \equiv (\theta', \alpha')'$, $\hat\gamma \equiv (\hat\theta'_{nT}, \hat\alpha')'$, and $\hat\gamma^* \equiv (\hat\theta^{*\prime}_{nT}, \hat\alpha^{*\prime})'$, with $\hat\gamma^*_0 = \hat\gamma$. Using a Taylor expansion and the first order condition, we obtain an expression in which $\hat\gamma^{*\dagger}_{k-1}$ lies between $\hat\gamma^*$ and $\hat\gamma^*_{k-1}$ and $\Delta = (\Delta_1, \ldots, \Delta_u, \ldots, \Delta_L)'$ is a vector whose $u$-th element has an explicit form involving $\|\cdot\|$, the Euclidean norm; that is, for a symmetric matrix $A$, $\|A\|^2 = \mathrm{trace}(AA')$. For $k = 1$, the first probability in (52) is bounded using (46). We proceed to bound the probability that $\|\mathrm{plim}_{n\to\infty}\,\cdot\,\|$ exceeds $C_2$, which by definition can be decomposed into two terms, $A$ and $B$, where $B = P^*\big(\|\mathrm{plim}_{n\to\infty}\, n^{-1} H(\hat\gamma^*_{k-1}; z^*_{it})^{-1}\| \ge \sqrt{C_2}\big)$. Note that $\hat\gamma^{*\dagger}_{k-1}$ is between $\hat\gamma$ and $\hat\gamma^*$. Using a uniform law of large numbers under the probability measure $P^*$ and the dominated convergence theorem, the required limit follows, where the last equality holds because $P^*$ conditional on the data (i.e. $\hat\theta_{nT}, \hat\alpha_i$) is the same as $P$ but with different model parameters. Hence, the term $A$ can be made arbitrarily small if we choose a large $C_2$. Using the same argument, we can show that, when $C_2$ is large enough, $B = o(1)$ as $n$ and $T$ go to $\infty$. We have therefore proved the required bound when $C_2$ is large enough.
To show that the second probability in (52) is $o(1)$, it suffices to prove (58), where the last equality follows from the ULLN. Combining (53) with (58), we have, for $k = 1$, the required bound when $C$ is large enough. That is, when $k = 1$, $\mathrm{plim}_{n\to\infty}\,\hat\gamma^*_k = \mathrm{plim}_{n\to\infty}\,\hat\gamma^* + O_p(1/T)$. For $k \ge 2$, we apply the recursive relationship repeatedly to obtain an approximation error of order $O_p(T^{-\kappa})$, where $\kappa = \sum_{j=2}^{k} 2^{j-1}$. Using a similar argument, we can show that $\mathrm{plim}_{n\to\infty}\,\check\theta^*_{nT,k} = O_p(1)$. Combining this with (59) and (60) completes the proof.

III. Proof of Theorem 3
Define our truncated k-step bootstrap bias corrected estimator to be $\tilde\theta_{nT,k} \equiv 2\hat\theta_{nT} - E^*\big(\check\theta^*_{nT,k}\big)$, as in (20).