A New Asymptotic Theory for Vector Autoregressive Long-Run Variance Estimation and Autocorrelation Robust Testing

In this paper, we develop a new asymptotic theory of the long run variance estimator obtained by fitting a vector autoregressive model to the transformed moment processes in a GMM framework. In contrast to the conventional asymptotics where the VAR lag order p goes to infinity but at a slower rate than the sample size, we assume that p grows at the same rate as the sample size. Under this asymptotic specification, the long run variance estimator is not consistent, but the associated Wald statistic and t-statistic remain asymptotically pivotal. On the basis of the new asymptotic theory, we introduce a new and easy-to-use F* test that employs a finite sample corrected Wald statistic and uses critical values from an F-distribution. We also propose an empirical VAR order selection rule that exploits the connection between VAR long run variance estimation and kernel long run variance estimation. Simulations show that the new VAR F* test with the empirical order selection is more accurate in size than the conventional chi-square test and the kernel-based F* test, with little or no power loss. The paper complements the recent paper by Sun (2010d), who considers kernel-based F* tests.


Introduction
The paper considers parameter inference in GMM models with time series data. To avoid possible misspecification and to be completely general, we often do not parametrize the dependence structure of the moment conditions. The problem is how to nonparametrically estimate the long run variance (LRV) matrix of these moment conditions. The recent literature on heteroscedasticity and autocorrelation robust (HAR) estimators has mainly focused on the kernel-based method, although quite different approaches like the vector autoregressive (VAR) approach (see, for example, Berk, 1974, Parzen, 1983, den Haan and Levin, 1998) and the series-type approach (Phillips, 2005, Sun, 2010a,b,c) have been explored. Under fairly general conditions, den Haan and Levin (1997, 1998) show that the vector autoregressive HAR variance estimator converges at a faster rate than any positive semi-definite kernel HAR variance estimator. This faster rate of convergence may lead to a chi-square test with good size and power properties. However, Monte Carlo simulations in den Haan and Levin (1998) show that the finite sample performance of the chi-square test based on the VAR variance estimator is unsatisfactory, especially when there is strong autocorrelation in the data.
The key asymptotic result underlying the chi-square test is the consistency of the VAR variance estimator. The consistency result requires that the VAR order $p$ increases with the sample size $T$ but at a slower rate. While appealing in practical situations, the consistency result does not capture the sampling variation of the variance estimator in finite samples. In addition, the consistency result completely ignores the estimation uncertainty of the model parameters. In this paper, we develop a new asymptotic theory that avoids these two drawbacks. The main idea is to view the VAR order $p$ as proportional to the sample size $T$. That is, $p = bT$ for some fixed constant $b \in (0, 1)$. Under this new statistical thought experiment, the VAR variance estimator is inconsistent but converges in distribution to a random matrix that depends on the VAR order and the estimation error of model parameters. Furthermore, the random matrix is proportional to the true long run variance. As a result, the associated test statistic is still asymptotically pivotal under this new asymptotics. More importantly, the new asymptotic distribution captures the sampling variation of the variance estimator and is likely to provide a more accurate approximation than the conventional chi-square approximation.
To develop the new asymptotic theory, we observe that the VAR(p) model estimated by the Yule-Walker method has conditional population autocovariances (conditional on the estimated model parameters) that are identical to the empirical autocovariances up to order $p$. This crucial observation drives all of our asymptotic development. Given this 'reproducing' property of the Yule-Walker estimator, we know that the VAR variance estimator is asymptotically equivalent to the kernel LRV estimator based on the rectangular kernel with bandwidth equal to $p$. The specification of $p = bT$ is then the same as the so-called fixed-b specification in Kiefer and Vogelsang (2005) and Sun, Phillips and Jin (2008). The rectangular kernel is not continuous and has not been considered in the literature on the fixed-b asymptotics. One of the contributions of this paper is to fill in this gap. As a corollary, we obtain the new asymptotic theory for the VAR variance estimator.
The new asymptotics obtained under the specification that $p = bT$ for a fixed $b$ may be referred to as the fixed-smoothing asymptotics, as the asymptotically equivalent kernel estimator has a finite and thus fixed effective degree of freedom. On the other hand, when $b \to 0$, the effective degree of freedom increases with the sample size. The conventional asymptotics obtained under the specification that $p \to \infty$ but $b \to 0$ may be referred to as the increasing-smoothing asymptotics. The two specifications can be viewed as different asymptotic devices to obtain approximations to the finite sample distribution. The fixed-smoothing asymptotics does not necessarily require that we fix the value of $b$ in finite samples. In fact, in empirical applications, the sample size $T$ is usually given beforehand and the VAR order needs to be determined using a priori information and/or information obtained from the data. Very often, the selected VAR order is larger for a larger sample size but is still small relative to the sample size. So the empirical situations appear to be more compatible with the conventional asymptotics. Fortunately, we can show that the two types of asymptotics coincide as $b \to 0$. In other words, the fixed-smoothing asymptotics is asymptotically valid under the conventional thought experiment.
A further contribution of the paper is to provide an approximation to the nonstandard fixed-smoothing asymptotic distribution of the Wald statistic. We show that the fixed-smoothing asymptotic distribution of the VAR variance estimator is equal to a weighted sum of independent Wishart distributions. In addition, the VAR variance estimator is asymptotically independent of the model parameter estimator. Motivated by the early statistics literature on spectral density estimation, we approximate the weighted sum of Wishart distributions by a single Wishart distribution with equivalent degree of freedom. A direct implication is that the limiting distribution of the Wald statistic is approximately equal to a quadratic form in standard normal variates with an independent and Wishart-distributed weighting matrix. But such a quadratic form is exactly the same as Hotelling's (1931) $T^2$ distribution. Using the well-known relationship between the $T^2$ distribution and the F distribution, we show that, after some modification, the nonstandard fixed-smoothing limiting distribution can be approximated by a standard F distribution.
On the basis of the F-approximation, we propose a new F* test. The F* test statistic is equal to the Wald statistic multiplied by a finite sample correction factor $\exp(-2bq)$ or an asymptotically equivalent factor, where $q$ is the number of restrictions being jointly tested. The finite sample correction can be regarded as an example of the Bartlett or Bartlett-type correction; see Bartlett (1937, 1954). It corrects for the demeaning bias of the VAR variance estimator, which is due to the estimation uncertainty of model parameters, and the randomness of the scaling matrix in the Wald statistic. The critical value used in the F* test comes from the F distribution with degrees of freedom $q$ and $K = \max(5, \lceil 1/(2b) \rceil)$. Compared with the standard $\chi^2$ critical values, the F critical values capture the randomness of the LRV estimator. The F* test is as easy to use as the standard Wald test, as both the correction factor and the critical values are easy to obtain.
The connection between the autoregressive spectrum estimator and the kernel spectrum estimator with the rectangular kernel does not seem to be fully explored in the literature. First, the asymptotic equivalence of these two estimators can be used to prove the consistency and asymptotic normality of the autoregressive estimator, as the asymptotic properties of the kernel estimator have been well researched in the literature. Second, the connection sheds some light on the faster rate of convergence of the autoregressive spectrum estimator and the kernel spectrum estimator based on flat-top kernels. The general class of flat-top kernels, proposed by Politis (2001), includes the rectangular kernel as a special case. Under the conventional asymptotics, Politis (2010, Theorem 2.1) establishes the rate of convergence of flat-top kernel estimators, while den Haan and Levin (1998, Theorem 1) give the rate for the vector autoregressive estimator. See Lee and Phillips (1994) for related results on the ARMA prewhitened estimator. Careful inspection shows that the rates in Politis (2010) are the same as those in den Haan and Levin (1998), although the routes to them are completely different. In view of the asymptotic equivalence, the identical rates of convergence are not surprising at all. Finally, the present paper gives another example that takes advantage of this connection.
Compared with a finite-order kernel estimator, the VAR variance estimator enjoys the same bias-reducing property as any infinite-order flat-top kernel estimator. The small bias, coupled with the new asymptotic theory that captures the randomness of the VAR variance estimator, may give the proposed F* test some size advantage. This is confirmed in the Monte Carlo experiments. Simulation results indicate that the size of the VAR F* test with a new empirically determined VAR order is as accurate as, and often more accurate than, the kernel-based F* tests recently proposed by Sun (2010d). The VAR F* test is uniformly more accurate in size than the conventional chi-square test. The power of the VAR F* test is also very competitive relative to the kernel-based F* test and the $\chi^2$ test.
The rest of the paper is organized as follows. Section 2 presents the GMM model and the testing problem. It also provides an overview of the VAR variance estimator. The next two sections are devoted to the fixed-smoothing asymptotics of the VAR variance estimator and the associated test statistic. Section 5 details a new method for lag order determination, and Section 6 reports simulation evidence. The last section provides some concluding discussion. Proofs and some technical results are given in the appendix.

GMM Estimation and Autocorrelation Robust Testing
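We are interested in a $d \times 1$ vector of parameters $\theta \in \Theta \subseteq \mathbb{R}^d$. Let $v_t$ denote a vector of observations. Let $\theta_0$ be the true value and assume that $\theta_0$ is an interior point of the compact parameter space $\Theta$. The moment conditions
$$E f(v_t, \theta) = 0, \quad t = 1, 2, \ldots, T,$$
hold if and only if $\theta = \theta_0$, where $f(\cdot)$ is an $m \times 1$ vector of functions with $m \geq d$. Let $g_T(\theta) = T^{-1} \sum_{t=1}^{T} f(v_t, \theta)$. The GMM estimator (Hansen, 1982) of $\theta_0$ is then given by
$$\hat\theta_T = \arg\min_{\theta \in \Theta} g_T(\theta)' W_T g_T(\theta),$$
where $W_T$ is an $m \times m$ positive definite weighting matrix. Let $G_T(\theta) = \partial g_T(\theta) / \partial \theta'$, and let $G$ and $W$ denote the probability limits of $G_T(\theta_0)$ and $W_T$. Under some regularity conditions, $\hat\theta_T$ satisfies
$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, V), \quad V = (G'WG)^{-1} G'W \Omega W G (G'WG)^{-1}, \tag{1}$$
where $\Omega$ is the long run variance of $f(v_t, \theta_0)$.
The above asymptotic result provides the basis for inference on $\theta_0$. Consider the null hypothesis $H_0: r(\theta_0) = 0$ and the alternative hypothesis $H_1: r(\theta_0) \neq 0$, where $r(\theta)$ is a $q \times 1$ vector of continuously differentiable functions with first-order derivative matrix $R(\theta) = \partial r(\theta) / \partial \theta'$. Denote $R = R(\theta_0)$. The F-test version of the Wald statistic for testing $H_0$ against $H_1$ is
$$F_T = \left[\sqrt{T}\, r(\hat\theta_T)\right]' \hat V_R^{-1} \left[\sqrt{T}\, r(\hat\theta_T)\right] / q,$$
where $\hat V_R$ is an estimator of the asymptotic variance $V_R$ of $R \sqrt{T} (\hat\theta_T - \theta_0)$. When $r(\theta)$ is a scalar function, we can construct the t-statistic as
$$t_T = \sqrt{T}\, r(\hat\theta_T) / \sqrt{\hat V_R}.$$
It follows from (1) that $V_R = R V R'$. To make inference on $\theta_0$, we have to estimate the unknown quantities in $V$. $W$ and $G$ can be consistently estimated by their finite sample versions $W_T$ and $\hat G_T = G_T(\hat\theta_T)$, respectively. It remains to estimate $\Omega$. Let $\hat\Omega_T$ be an estimator of $\Omega$. Then $V_R$ can be estimated by
$$\hat V_R = \hat R_T (\hat G_T' W_T \hat G_T)^{-1} \hat G_T' W_T \hat\Omega_T W_T \hat G_T (\hat G_T' W_T \hat G_T)^{-1} \hat R_T', \quad \hat R_T = R(\hat\theta_T).$$
Many nonparametric estimators of $\Omega$ are available in the literature. The most popular ones are kernel estimators, which are based on the early statistical literature on spectral density estimation. See Priestley (1981). Andrews (1991) and Newey and West (1987, 1994) extend earlier results to econometric models where the LRV estimation is based on estimated processes. In this paper, we follow den Haan and Levin (1997, 1998) and consider estimating the LRV by vector autoregression. The autoregression approach can be traced back to Whittle (1954). Berk (1974) provides the first proof of the consistency of autoregressive spectral estimators.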
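Let
$$h_t = \hat R_T (\hat G_T' W_T \hat G_T)^{-1} \hat G_T' W_T f(v_t, \hat\theta_T) \tag{2}$$
be the transformed moment conditions based on the estimator $\hat\theta_T$. Note that $h_t$ is a vector process of dimension $q$. We outline the steps involved in the VAR variance estimation below.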
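1. Fit a VAR(p) model to the estimated process $h_t$ using the Yule-Walker method (see, for example, Lütkepohl (2007)):
$$h_t = \hat A_1 h_{t-1} + \ldots + \hat A_p h_{t-p} + \hat e_t,$$
where $\hat A_1, \ldots, \hat A_p$ are estimated autoregression coefficients and $\hat e_t$ is the fitted residual. More specifically, $\hat A = (\hat A_1, \ldots, \hat A_p) = [\hat\Gamma_h(1), \ldots, \hat\Gamma_h(p)]\, \hat\Gamma_H^{-1}(p)$, where $\hat\Gamma_h(j)$ is the lag-$j$ sample autocovariance of $h_t$ and $\hat\Gamma_H(p)$ is the $pq \times pq$ block Toeplitz matrix whose $(i, j)$ block is $\hat\Gamma_h(j - i)$.
2. Compute the error variance estimate implied by the Yule-Walker equations, $\hat\Sigma_e = \hat\Gamma_h(0) - \sum_{j=1}^{p} \hat A_j \hat\Gamma_h(j)'$.
3. Compute the VAR variance estimator $\hat V_R = (I_q - \sum_{j=1}^{p} \hat A_j)^{-1}\, \hat\Sigma_e\, (I_q - \sum_{j=1}^{p} \hat A_j')^{-1}$.
It is important to point out that we fit a VAR(p) model to the transformed moment condition $h_t$ instead of the original moment condition $f(v_t, \hat\theta_T)$. There are several advantages. First, the dimension of $h_t$ can be much smaller than the dimension of $f(v_t, \hat\theta_T)$, especially when there are many moment conditions. So the VAR(p) model for $h_t$ may have substantially fewer parameters than the VAR model for $f(v_t, \hat\theta_T)$. Second, by construction $\sum_{t=1}^{T} h_t = 0$, so an intercept vector is not needed in the VAR for $h_t$. On the other hand, when the model is overidentified, $\sum_{t=1}^{T} f(v_t, \hat\theta_T) \neq 0$ in general. Hence, a VAR model for $f(v_t, \hat\theta_T)$ should contain an intercept. Finally and more importantly, $h_t$ is tailored to the null hypothesis under consideration. The VAR order we select will reflect the null directly. In contrast, autoregression fitting on the basis of $f(v_t, \hat\theta_T)$ completely ignores the null hypothesis, and the resulting variance estimator $\hat V_R$ may be poor in finite samples.
Let $\hat{\mathbb{A}}$ and $\hat\Sigma_E$ denote the $pq \times pq$ companion-form matrices
$$\hat{\mathbb{A}} = \begin{pmatrix} \hat A_1 & \cdots & \hat A_{p-1} & \hat A_p \\ I_q & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & I_q & 0 \end{pmatrix}, \quad \hat\Sigma_E = \begin{pmatrix} \hat\Sigma_e & 0 \\ 0 & 0 \end{pmatrix};$$
then the Yule-Walker estimators $\hat{\mathbb{A}}$ and $\hat\Sigma_E$ satisfy
$$\hat\Gamma_H(p) = \hat{\mathbb{A}}\, \hat\Gamma_H(p)\, \hat{\mathbb{A}}' + \hat\Sigma_E.$$
It is well known that the estimated VAR model obtained via the Yule-Walker method is stationary almost surely. See Brockwell and Davis (1987, ch 8.1), Lütkepohl (2007, ch 3.3.4), and Reinsel (1993, ch 4.4). These books discuss either the scalar case or the multivariate case but without giving a proof. To the best of our knowledge, a rigorous proof for the multivariate case is currently lacking in the literature. We collect the stationarity result in the proposition below and provide a simple proof in the appendix. The proposition is stated in terms of $\lambda_{\hat{\mathbb{A}}}$, where $\lambda_{\hat{\mathbb{A}}}$ is any eigenvalue of $\hat{\mathbb{A}}$. Proposition 1 gives precise conditions under which the fitted VAR(p) process is stationary.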
For the OLS estimator, the $X'X$ matrix is not a Toeplitz matrix and the fitted VAR(p) model may not be stationary.

Fixed-Smoothing Asymptotics for the Variance Estimator
In this section, we derive the asymptotic distribution of $\hat V_R$. Depending on how the VAR order $p$ and the sample size $T$ go to infinity, there are several different types of asymptotics. When the VAR order is set equal to a fixed proportion of the sample size, i.e. $p = bT$ for a fixed constant $b \in (0, 1)$, we obtain the so-called fixed-smoothing asymptotics. In this case, $\hat V_R$ is asymptotically equivalent to a kernel smoothing estimator where the smoothing is over a fixed number of quantities, even in large samples. On the other hand, if $b \to 0$ at the rate given in den Haan and Levin (1998), we obtain the conventional asymptotics under which $\hat V_R$ is consistent and $F_T$ is asymptotically $\chi^2_q / q$ distributed. In this case, the smoothing is taken over increasingly many quantities and the asymptotics may be referred to as the increasing-smoothing asymptotics. Under this type of asymptotics, $b \to 0$ and $T \to \infty$ jointly. So the increasing-smoothing asymptotics is a type of joint asymptotics. An intermediate case is obtained when we let $T \to \infty$ for a fixed $b$ followed by letting $b \to 0$. Given the sequential nature of the limiting behavior of $b$ and $T$, we call the intermediate case the sequential asymptotics.
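An important property of the Yule-Walker estimator is that, conditional on $\hat A_1, \ldots, \hat A_p$ and $\hat\Sigma_e$, the fitted VAR(p) process has a theoretical autocovariance that is identical to the sample autocovariance up to lag $p$. To see this, consider a generic VAR(p) process $\tilde h_t$:
$$\tilde h_t = A_1 \tilde h_{t-1} + \ldots + A_p \tilde h_{t-p} + \tilde e_t,$$
where $\tilde e_t$ has variance $\Sigma_e$ and $\Gamma_{\tilde h}(j) = E \tilde h_t \tilde h_{t-j}'$. Let $\Gamma_H(p)$ be the $pq \times pq$ block Toeplitz matrix whose $(i, j)$ block is $\Gamma_{\tilde h}(j - i)$. Then the autocovariance sequence satisfies
$$\Gamma_H(p) = \mathbb{A}\, \Gamma_H(p)\, \mathbb{A}' + \Sigma_E, \tag{4}$$
where $\mathbb{A}$ and $\Sigma_E$ are defined similarly as $\hat{\mathbb{A}}$ and $\hat\Sigma_E$, the companion-form coefficient and error variance matrices. It follows that
$$\mathrm{vec}\left[\Gamma_H(p)\right] = \left[I_{p^2 q^2} - (\mathbb{A} \otimes \mathbb{A})\right]^{-1} \mathrm{vec}(\Sigma_E). \tag{5}$$
That is, when $I_{p^2 q^2} - (\mathbb{A} \otimes \mathbb{A})$ is invertible, we can represent the autocovariances of $\{\tilde h_t\}$ as a function of $A_1, \ldots, A_p$ and $\Sigma_e$:
$$\Gamma_{\tilde h}(j) = \Gamma_{h,j}(A_1, \ldots, A_p, \Sigma_e), \quad j = 0, 1, \ldots, p.$$
By the definition of the Yule-Walker estimator, $\hat{\mathbb{A}}$ and $\hat\Sigma_E$ satisfy $\hat\Gamma_H(p) = \hat{\mathbb{A}}\, \hat\Gamma_H(p)\, \hat{\mathbb{A}}' + \hat\Sigma_E$. Comparing this with the theoretical autocovariance sequence in (4) and in view of (5), we have
$$\hat\Gamma_h(j) = \Gamma_{h,j}(\hat A_1, \ldots, \hat A_p, \hat\Sigma_e), \quad j = 0, 1, \ldots, p,$$
provided that $I_{p^2 q^2} - (\hat{\mathbb{A}} \otimes \hat{\mathbb{A}})$ is invertible. In other words, conditional on $\hat A_1, \ldots, \hat A_p, \hat\Sigma_e$, the autocovariances of the fitted VAR(p) process match exactly with the empirical autocovariances used in constructing the Yule-Walker estimator.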
Using the 'reproducing' property of the Yule-Walker estimator, we can relate the VAR variance estimator to the kernel estimator of $V_R$ based on the rectangular kernel. Let $k_{rect}(r) = 1\{|r| \leq 1\}$ and $k_{rect,b}(r) = 1\{|r| \leq b\}$; then the rectangular kernel estimator of $V_R$ is
$$\tilde V_R = \frac{1}{T} \sum_{t=1}^{T} \sum_{\tau=1}^{T} h_t h_\tau' \, k_{rect}\!\left(\frac{t - \tau}{p}\right),$$
where $h_t$ is defined in (2) and $p$ is the bandwidth or truncation lag. By definition, $\hat V_R$ equals $\tilde V_R$ plus the sum of the autocovariances of the fitted VAR(p) process at lags beyond $p$. Intuitively, the fitted VAR process necessarily agrees exactly up to lag $p$ with the estimated autocovariances. The values of the autocovariances after lag $p$ are generated recursively in accordance with the VAR(p) model as in (6). The difference between the VAR variance estimator and the rectangular kernel LRV estimator is that for the former estimator the autocovariances of order greater than $p$ are based on the VAR(p) extrapolation, while for the latter estimator these autocovariances are assumed to be zero.
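The reproducing property and the relationship between the two LRV estimators can be checked numerically in the scalar case ($q = 1$). The sketch below is illustrative only and is not the procedure used in the paper: it solves the Yule-Walker equations by the Levinson-Durbin recursion, recovers the autocovariances implied by the fitted AR(p) from its moving-average weights, and compares them with the sample autocovariances. The simulated AR(1) series and all function names are hypothetical.

```python
import random

def sample_autocov(x, max_lag):
    """Sample autocovariances gamma(0..max_lag), normalized by n (not n - j)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t - j] for t in range(j, n)) / n
            for j in range(max_lag + 1)]

def yule_walker(gamma, p):
    """Solve the scalar Yule-Walker equations by the Levinson-Durbin recursion.
    Returns the AR coefficients a[0..p-1] and the innovation variance."""
    a, sigma2 = [], gamma[0]
    for k in range(1, p + 1):
        num = gamma[k] - sum(a[j] * gamma[k - 1 - j] for j in range(k - 1))
        refl = num / sigma2
        a = [a[j] - refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        sigma2 *= 1.0 - refl * refl
    return a, sigma2

def implied_autocov(a, sigma2, max_lag, n_terms=2000):
    """Autocovariances implied by a stationary AR(p), computed from its
    truncated MA(infinity) weights psi_i."""
    p = len(a)
    psi = [1.0]
    for i in range(1, n_terms + max_lag):
        psi.append(sum(a[k] * psi[i - 1 - k] for k in range(min(p, i))))
    return [sigma2 * sum(psi[i] * psi[i + j] for i in range(n_terms))
            for j in range(max_lag + 1)]

# Hypothetical data: an AR(1) series with coefficient 0.5.
random.seed(0)
x, prev = [], 0.0
for _ in range(400):
    prev = 0.5 * prev + random.gauss(0.0, 1.0)
    x.append(prev)

p = 3
gamma_hat = sample_autocov(x, p)
a_hat, sigma2_hat = yule_walker(gamma_hat, p)

# Reproducing property: implied and sample autocovariances agree up to lag p.
gamma_fit = implied_autocov(a_hat, sigma2_hat, p)
max_gap = max(abs(g - m) for g, m in zip(gamma_hat, gamma_fit))

# Rectangular-kernel LRV: sample autocovariances truncated at lag p.
lrv_rect = gamma_hat[0] + 2.0 * sum(gamma_hat[1:])
# AR-based LRV: all implied autocovariances, summed in closed form.
lrv_ar = sigma2_hat / (1.0 - sum(a_hat)) ** 2

print(max_gap, lrv_rect, lrv_ar)
```

The two LRV estimates differ only through the implied autocovariances beyond lag $p$, which the rectangular kernel sets to zero and the autoregression extrapolates.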
Using the relationship between the VAR variance estimator and the rectangular kernel LRV estimator, we can establish the asymptotic distribution of the VAR variance estimator under the fixed-smoothing asymptotics. We make the following assumptions.
Assumption 1 is made for convenience. It can be proved under more primitive assumptions and using standard arguments. Assumptions 2 and 3 are similar to those in Kiefer and Vogelsang (2005) and Sun (2010c,d). Assumption 2 requires $\{f(v_t, \theta_0)\}$ to obey a functional central limit theorem (FCLT), while Assumption 3 requires $\{\partial f(v_j, \theta_0) / \partial \theta'\}$ to satisfy a uniform weak law of large numbers (WLLN). Note that the FCLT and WLLN hold for serially correlated and heterogeneously distributed data that satisfy certain regularity conditions on moments and the dependence structure over time. These primitive regularity conditions are quite technical and can be found in White (2001), among others. Assumption 4 is a high-level condition. Using the same argument as in Hansen (1992) and de Jong and Davidson (2000), we can establish the required convergence under some moment and mixing conditions on the process $\{f(v_t, \theta_0)\}$, where we have assumed the stationarity of $\{f(v_t, \theta_0)\}$ and the absolute summability of its autocovariances. Hence Assumption 4 holds under some regularity conditions.
Lemma 1 Let Assumptions 1-5 hold. Then under the fixed-smoothing asymptotics, and is the standard Brownian bridge process.
It follows from the above lemma that $\hat V_R \Rightarrow V_{R,\infty}$. That is, under the fixed-smoothing asymptotics, $\hat V_R$ converges to a random matrix $V_{R,\infty}$. Note that $V_{R,\infty}$ is proportional to the true variance matrix $V_R$ through $R (G'WG)^{-1} G'W$ and its transpose. This contrasts with the increasing-smoothing asymptotic approximation, where $\tilde V_R$ is approximated by a constant $V_R$. The advantage of the fixed-smoothing asymptotic result is that the limit of $\hat V_R$ depends on the order of the autoregression through $b$ but is otherwise nuisance parameter free. Therefore, it is possible to obtain a first-order asymptotic distribution theory that explicitly captures the effect of the VAR order used in constructing the VAR variance estimator.
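The following lemma gives an alternative representation of $Q_m(b)$. Using this representation, we can compute the variance of $V_{R,\infty}$. The representation uses the following definition:
$$k_b^*(r, s) = k_{rect,b}(r - s) - \int_0^1 k_{rect,b}(t - s)\, dt - \int_0^1 k_{rect,b}(r - \tau)\, d\tau + \int_0^1 \int_0^1 k_{rect,b}(t - \tau)\, dt\, d\tau,$$
and $K_{m^2}$ is the $m^2 \times m^2$ commutation matrix.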
The transformed kernel function $k_b^*(r, s)$ is graphed in Figure 1 for the case $b = 0.2$. In the center of the $(r, s)$ domain, the function is flat and is equal to $-b^2 - 2b + 1$. The edge effects, as clearly seen from the figure, ensure that $\int_0^1 k_b^*(r, s)\, dr = \int_0^1 k_b^*(r, s)\, ds = 0$ for any $r, s$. That is, the function $k_b^*(r, s)$ integrates to zero along each coordinate. As $b$ becomes closer to zero, the function becomes more concentrated around the line $r = s$. In the limit, $V_{R,\infty}$ converges to $V_R$.
The convergence of $V_{R,\infty}$ to $V_R$ as $b \to 0$ also follows from the mean and variance calculations. The lemma shows that the mean of $V_{R,\infty}$ is proportional to the true variance $V_R$. When $b \to 0$, we have $(1 - b)^2 \to 1$ and $\mu_2(b) \to 0$. In other words, as $b \to 0$, the mean of $V_{R,\infty}$ converges to $V_R$ and the variance converges to zero. A direct implication is that $\mathrm{plim}_{b \to 0} V_{R,\infty} = V_R$. We can conclude that as $b$ goes to zero, the fixed-smoothing asymptotics coincides with the conventional increasing-smoothing asymptotics. More precisely, the probability limits of $\hat V_R$ are the same under the sequential asymptotics and the joint asymptotics.
Figure 2 presents the function $\mu_2(b) / b$, the multiplicative factor in $b^{-1} \mathrm{var}(\mathrm{vec}(V_{R,\infty}))$. It is clear that this function is decreasing in $b$. As $b \to 0$, $b^{-1} \mathrm{var}(\mathrm{vec}(V_{R,\infty}))$ converges to a finite limit, and this limit is exactly the asymptotic variance one would obtain under the joint asymptotic theory. Therefore, $\hat V_R$ has not only the same probability limit but also the same asymptotic variance under the sequential and joint asymptotics. Note that $\lim_{b \to 1} b^{-1} \mathrm{var}(\mathrm{vec}(V_{R,\infty})) = 0$ and $\lim_{b \to 1} E V_{R,\infty} = 0$. As a result, $\mathrm{plim}_{b \to 1} V_{R,\infty} = 0$. This is not surprising, as when $b = 1$ the rectangular kernel estimator reduces to $T^{-1} (\sum_{t=1}^{T} h_t)(\sum_{t=1}^{T} h_t)' = 0$ by the first-order condition for the GMM estimator.
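The stated properties of the transformed kernel can be verified numerically. The sketch below assumes that $k_b^*(r, s)$ is the rectangular kernel $1\{|r - s| \leq b\}$ demeaned along both coordinates, which is the standard transformation in the fixed-b literature; this form is our assumption for illustration, and the function names are hypothetical.

```python
def band_mass(r, b):
    """Integral over t in [0, 1] of the rectangular kernel 1{|r - t| <= b}."""
    return min(r + b, 1.0) - max(r - b, 0.0)

def k_star(r, s, b):
    """Rectangular kernel demeaned along both coordinates (assumed form)."""
    total = 2.0 * b - b * b  # double integral of 1{|t - tau| <= b} on [0, 1]^2
    k = 1.0 if abs(r - s) <= b else 0.0
    return k - band_mass(r, b) - band_mass(s, b) + total

def midpoint_integral(func, n=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    return sum(func((i + 0.5) / n) for i in range(n)) / n

b = 0.2
# Value on the flat part in the center of the domain: 1 - 2b - b^2.
center_value = k_star(0.5, 0.5, b)
# Integral along one coordinate for a fixed r: should be zero.
row_integral = midpoint_integral(lambda s: k_star(0.3, s, b))
# Integral along the diagonal: the mean factor (1 - b)^2.
diagonal_mass = midpoint_integral(lambda r: k_star(r, r, b))

print(center_value, row_integral, diagonal_mass)
```

Under this assumed form, the diagonal integral reproduces the factor $(1 - b)^2$ that scales the mean of $V_{R,\infty}$, which is how the demeaning bias enters the limit.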
The asymptotic bias arises from the estimation uncertainty of the model parameter $\theta$. It may be called the demeaning bias, as the stochastic integral in (7) depends on the Brownian bridge process rather than the Brownian motion process. One advantage of the fixed-smoothing asymptotics is its ability to capture the demeaning bias. In contrast, under the conventional increasing-smoothing asymptotics, the estimation uncertainty of $\theta$ is negligible. As a result, the first-order conventional asymptotics does not reflect the demeaning bias.

Fixed-Smoothing Asymptotics for the Wald Test
In this section, we first establish the asymptotic distribution of $F_T$ under the fixed-smoothing asymptotics. We then develop an F-approximation to the nonstandard limiting distribution. The following theorem can be proved using Lemmas 1 and 2.
Theorem 2 Let Assumptions 1-5 hold. Under the fixed-smoothing asymptotics where $b$ is held fixed, we have $F_T \Rightarrow F_\infty(q, b)$ and $t_T \Rightarrow t_\infty(b)$.
Theorem 2 is analogous to Theorem 3 in Kiefer and Vogelsang (2005). It shows that the limiting distribution of $F_T$ depends on $b$ but is otherwise nuisance parameter free. We have thus obtained asymptotically pivotal tests that reflect the choice of the VAR order. This is in contrast with the asymptotic results under the standard approach, where $F_T$ would have a limiting $\chi^2_q / q$ distribution and $t_T$ a limiting $N(0, 1)$ distribution regardless of the choice of $b$. As $b \to 0$, $V_{R,\infty}$ converges in probability to $V_R$; hence the fixed-smoothing asymptotics approaches the standard increasing-smoothing asymptotics. In other words, the sequential limit of $F_T$ is identical to the joint limit of $F_T$. Compared with the fixed-smoothing asymptotics, the sequential asymptotics invokes an additional approximation, which could lead to additional approximation error. In a sequence of papers, Sun (2010a,b,c,d) and Sun, Phillips and Jin (2008) show that critical values from the fixed-smoothing asymptotics are high-order correct under the conventional joint asymptotics.
The asymptotic distribution $F_\infty(q, b)$ or $t_\infty(b)$ is nonstandard. The critical values are not readily available from statistical tables or software packages. For this reason, we proceed to approximate the critical values in closed form. Since $k_b^*(r, s) \in L^2([0,1] \times [0,1])$, it has a Fourier series representation in terms of products $\Phi_\ell(r) \Phi_n(s)$ with coefficients $\lambda_{\ell n}$, where $\{\Phi_\ell(r) \Phi_n(s)\}$ is an orthonormal basis for $L^2([0,1] \times [0,1])$ and the convergence is in the $L^2$ space. For example, we may take $\Phi_\ell(r) = \sqrt{2} \cos 2\pi\ell r$ or $\sqrt{2} \sin 2\pi\ell r$, for which $\int_0^1 \Phi_\ell(r)\, dr = 0$ for $\ell = 1, 2, \ldots$. In addition, by the symmetry of $k_b^*(r, s)$, we can deduce that $\lambda_{\ell n} = \lambda_{n \ell}$. Using the Fourier series representation, the random weighting matrix in the limiting distribution, denoted $\Xi$ here, can be written in terms of a sequence $\xi_i \sim iid\, N(0, I_q)$. By definition, $\xi_i \xi_i'$ follows a Wishart distribution $W_q(I_q, 1)$, so $\Xi$ is an infinite weighted sum of independent Wishart distributions. Using the representation in (9), $F_\infty(q, b)$ can be expressed in terms of $\xi_i$ and $\eta$, where $\xi_i \sim iid\, N(0, I_q)$, $\eta \sim N(0, I_q)$ and $\xi_i$ is independent of $\eta$ for all $i$. That is, $F_\infty(q, b)$ is equal in distribution to a quadratic form in standard normal variates with an independent and random weighting matrix.
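We want to approximate the distribution of $\Xi$ by a scaled Wishart distribution $W_q(I_q, K)/K$ for some integer $K > 0$. We select $K$ to match their first two moments. Let $\Xi^* \sim W_q(I_q, K)/K$. By construction, $E\Xi = E\Xi^* = I_q$. The second moments are matched through functionals of an arbitrary symmetric matrix $D$; see Example 7.1 in Bilodeau and Brenner (1999) for the relevant Wishart moment formula. In view of (10) and (11), $K$ is set using the ceiling function, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. That is, $\Xi \overset{d}{\approx} W_q(I_q, K)/K$ for this value of $K$, where $\overset{d}{\approx}$ denotes "is approximately equal to in distribution." As a result, the weighting matrix in the quadratic form can be replaced by its Wishart approximation. But a quadratic form in a standard normal vector, weighted by the inverse of an independent scaled Wishart matrix, follows Hotelling's $T^2(q, K - q + 1)$ distribution (Hotelling, 1931). By the well-known relationship between the F distribution and the $T^2$ distribution, the limiting distribution can be approximated by an F distribution. The following theorem makes the above approximation more precise.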
Theorem 3 Let $\tilde F_\infty(q, b)$ be the corrected limiting variate defined by $\tilde F_\infty(q, b) = \mu_1 \frac{K - q + 1}{K} F_\infty(q, b)$.
The parameter $K$ can be called the "equivalent degree of freedom" (EDF) of the LRV estimator. The idea of approximating a weighted sum of independent Wishart distributions by a simple Wishart distribution with equivalent degree of freedom can be motivated from the early statistical literature on spectral density estimation. In the scalar case, the distribution of the spectral density estimator is often approximated by a chi-square distribution with equivalent degree of freedom; see Priestley (1981, p. 467). In addition, Hall (1983) suggests approximating the distribution of a sum of independent scalar random variables by a chi-square distribution instead of a normal distribution. The advantage of using a chi-square distribution is that it can pick up any positive skewness component in the true distribution. Kollo and von Rosen (1995) extend the idea to multivariate cases and use a Wishart distribution as the approximating distribution. This is exactly the approach we employ here.
When $b \to 0$, the EDF is equal to $1/(2b)$ up to smaller order terms, where the asymptotic variance of the LRV estimator is proportional to $2b$. Figure 3 graphs the function $J(b)$, which provides a high-order adjustment to the first-order EDF $1/(2b)$. It is clear that as $b$ decreases, i.e. as the degree of smoothing increases, the EDF increases and the variance decreases. In other words, the higher the EDF is, the larger the degree of smoothing is, and the smaller the variance is.
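The relationship between Hotelling's $T^2$ distribution and the F distribution that underlies the approximation can be checked by simulation. The sketch below is an illustration under our own setup, not the paper's simulation design: for $q = 2$ it draws $W \sim W_q(I_q, K)$ and an independent $\eta \sim N(0, I_q)$, forms $T^2 = K \eta' W^{-1} \eta$, and uses the standard fact that $(K - q + 1) T^2 / (Kq)$ follows an $F_{q, K-q+1}$ distribution. It compares the simulated mean with the known mean $d_2 / (d_2 - 2)$ of an F distribution with second degree of freedom $d_2$. The seed, $K$, and the number of draws are arbitrary choices.

```python
import random

def scaled_quadratic_form(K, rng):
    """One draw of (K - q + 1) / (K q) * T^2 for q = 2, where
    T^2 = K * eta' W^{-1} eta, W ~ Wishart_2(I, K), eta ~ N(0, I_2)."""
    q = 2
    w11 = w12 = w22 = 0.0
    for _ in range(K):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w11 += z1 * z1
        w12 += z1 * z2
        w22 += z2 * z2
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    det = w11 * w22 - w12 * w12
    # eta' W^{-1} eta via the closed-form inverse of a 2 x 2 matrix.
    quad = (w22 * e1 * e1 - 2.0 * w12 * e1 * e2 + w11 * e2 * e2) / det
    t2 = K * quad
    return (K - q + 1) / (K * q) * t2

rng = random.Random(12345)
K, n_draws = 10, 20000
draws = [scaled_quadratic_form(K, rng) for _ in range(n_draws)]
sim_mean = sum(draws) / n_draws
# Mean of F(2, K - 1): d2 / (d2 - 2) with d2 = K - q + 1 = K - 1.
f_mean = (K - 1) / (K - 3)

print(sim_mean, f_mean)
```

With $K = 10$ the two means should agree up to simulation error, whereas a $\chi^2_q / q$ variate has mean one. This gap is the randomness of the weighting matrix that the F critical values are meant to capture.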
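It is easy to see that Theorem 3 remains valid if the correction factor $\mu_1 (K - q + 1)/K$ is replaced by its asymptotically equivalent form. More specifically, as $b \to 0$, $\mu_1 = (1 - b)^2 = 1 - 2b + o(b)$. In addition, since $K$ equals $1/(2b)$ up to smaller order terms, $(K - q + 1)/K = 1 - 2(q - 1)b + o(b)$. The product is therefore $1 - 2qb + o(b)$, which is also the first-order expansion of $1/(1 + 2qb)$ and of $\exp(-2qb)$. Combining the above two equations and Theorem 3, we obtain the corollary below.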
Corollary 4 Let $F^*_\infty(q, b)$ be the corrected limiting variate defined by $F^*_\infty(q, b) = \kappa F_\infty(q, b)$, where $\kappa = 1/(1 + 2qb)$ or $\exp(-2qb)$. Then as $b \to 0$,
$$P(F^*_\infty(q, b) \leq z) = P(F_{q,K} \leq z) + o(b),$$
where $K = \max(\lceil (2b)^{-1} \rceil, 5)$.
In the above corollary, we modify the second degree of freedom in the F approximation. The reason is that the variance of an F distribution exists only if its second degree of freedom is larger than 4. Obviously, the modification does not have any impact on the asymptotic result, as $K = \lceil (2b)^{-1} \rceil$ when $b \to 0$. Let $F^\alpha_{q,K}$ and $F^\alpha_\infty(q, b)$ be the $1 - \alpha$ quantiles of the standard $F_{q,K}$ distribution and the nonstandard $F_\infty(q, b)$ distribution, respectively. Then $P(\kappa F_\infty(q, b) > F^\alpha_{q,K}) = \alpha + o(b)$. That is, $P(F_\infty(q, b) > \kappa^{-1} F^\alpha_{q,K}) = \alpha + o(b)$. In other words,
$$F^\alpha_\infty(q, b) = \kappa^{-1} F^\alpha_{q,K} + o(b). \tag{18}$$
So for the original F statistic, we can use $\mathcal{F}^\alpha_{q,b} := \kappa^{-1} F^\alpha_{q,K}$ as the critical value for the test with nominal size $\alpha$. As an approximation to the nonstandard critical value, the critical value $\mathcal{F}^\alpha_{q,b}$ is second-order correct, as the approximation error in (18) is of smaller order $o(b)$ rather than $O(b)$ as $b \to 0$. The second-order critical value is larger than the standard critical value from $\chi^2_q / q$ for two reasons. First, $F^\alpha_{q,K}$ is larger than the corresponding critical value from $\chi^2_q / q$ due to the presence of a random denominator in the F distribution. Second, the correction factor $\kappa^{-1}$ is larger than 1. As $b$ increases, both the correction factor and the F critical value $F^\alpha_{q,K}$ increase. As a result, the second-order critical value $\mathcal{F}^\alpha_{q,b}$ is an increasing function of $b$. The second-order critical value is also increasing in $q$, the number of restrictions being jointly tested. This result is especially interesting when $q$ is large. In this case, the size distortion of the usual Wald test is large. For example, Burnside and Eichenbaum (1996) show that the small sample size of the Wald test increases sharply with the number of hypotheses being jointly tested. The second-order critical value takes this into account and adjusts the critical value monotonically with $q$.
Our result provides a theoretical explanation of the finite sample results reported by Ray and Savin (2008) and Ray, Savin and Tiwari (2009), who find that the fixed-b type of asymptotic approximation can substantially reduce size distortion in testing joint hypotheses, especially when the number of hypotheses being tested is large.
The correction in (15) can be regarded as a Bartlett-type correction. Bartlett (1937, 1954) considers likelihood ratio tests, but the basic idea can be applied to Wald tests as well. See Cribari-Neto and Cordeiro (1999) for a more recent survey. The argument goes as follows. Suppose that $X \sim F_\infty(q, b)$ and $EX = 1 + bC + o(b)$ for some constant $C$; then as $b \to 0$, $F_\infty(q, b)/(1 + bC)$ is closer to the $\chi^2_q / q$ distribution than the original distribution. So the constant $C$ should be $C = 2q$. We have thus provided another way to motivate the correction in (13). Essentially, the correction makes the mean of the $F_\infty(q, b)$ distribution closer to that of the $\chi^2_q / q$ distribution. In addition to the Bartlett-type correction, Theorem 3 approximates the nonstandard distribution by an F-distribution rather than a chi-square distribution.
Define $F^*_T = \kappa F_T$. It then follows from Theorems 2 and 3 that under the sequential asymptotics, we have $\lim_{T \to \infty} P(F^*_T \leq z) = P(F_{q,K} \leq z) + o(b)$ as $b \to 0$. In the rest of the paper, we call the test based on the corrected Wald statistic $F^*_T$ and using the critical values from the $F_{q,K}$ distribution the F* test. To emphasize its reliance on the vector autoregression, we also refer to it as the VAR F* test.
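The ingredients of the F* test are simple to compute. The sketch below is a minimal illustration, assuming the Wald statistic $F_T$, the number of restrictions $q$, and the ratio $b = p/T$ are already available; the function name and the numerical inputs are hypothetical. It returns the corrected statistic $\kappa F_T$ together with the two degrees of freedom of the reference F distribution.

```python
import math

def f_star_statistic(wald_f, q, b, use_exponential=True):
    """Return (corrected statistic, first df, second df) for the F* test.
    wald_f is the F-form Wald statistic F_T; b = p / T."""
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie in (0, 1)")
    # Correction factor: exp(-2qb) or the asymptotically equivalent 1/(1+2qb).
    if use_exponential:
        kappa = math.exp(-2.0 * q * b)
    else:
        kappa = 1.0 / (1.0 + 2.0 * q * b)
    # Second degree of freedom, floored at 5 so the F variance exists.
    K = max(math.ceil(1.0 / (2.0 * b)), 5)
    return kappa * wald_f, q, K

# Hypothetical inputs: F_T = 3.2 from q = 2 restrictions, p = 8 lags, T = 200.
stat, df1, df2 = f_star_statistic(3.2, q=2, b=8 / 200)
print(stat, df1, df2)
```

The corrected statistic is then compared with the $1 - \alpha$ quantile of the $F_{q,K}$ distribution, which can be taken from tables or, if SciPy is available, from `scipy.stats.f.ppf(1 - alpha, df1, df2)`.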

VAR Lag Order Determination
For VAR models, it is standard practice to use model selection criteria such as AIC or BIC to choose the lag order. However, the AIC and BIC are not aimed at the testing problem we consider. In this section, we propose a new lag order selection rule that is based on the bandwidth choice for the rectangular kernel LRV estimator. We set the VAR lag order equal to the bandwidth for the rectangular kernel LRV estimator.
The question is how to select the bandwidth for the rectangular kernel LRV estimator that is directed at the testing problem at hand. Before addressing this question, we review the method proposed by Sun (2010d), who considers finite-order kernel LRV estimators and associated F* tests. He proposes to select the bandwidth to minimize an asymptotic measure of the type II error while controlling for the asymptotic type I error. More specifically, the testing-optimal bandwidth is given by
$$b^* = \arg\min_b e_{II}(b) \quad \text{subject to} \quad e_I(b) \leq \tau\alpha,$$
where $e_I(b)$ and $e_{II}(b)$ are approximate measures of type I and type II errors and $\tau > 1$ is the so-called tolerance parameter. Under some regularity conditions, for a kernel function $k(x)$ with Parzen exponent $\varrho$, Sun (2010d) derives an approximation to the type I error of the kernel F* test in terms of the following quantities: $\alpha$ is the nominal type I error, $\mathcal{X}^\alpha_q$ is the $\alpha$-level critical value from $G_q(\cdot)$, the CDF of the $\chi^2_q$ distribution, and $(bT)^{-\varrho} B$ is the asymptotic bias of the kernel LRV estimator. The average type II error under the local alternative $H_1(\delta_o^2)$ is approximated similarly, where $G_{\ell, \delta_o^2}(\cdot)$ is the CDF of the noncentral $\chi^2_\ell(\delta_o^2)$ distribution and $c_2 = \int_{-\infty}^{\infty} k^2(x)\, dx$. In that expression, higher-order terms and a term of order $1/\sqrt{T}$ that does not depend on $b$ have been omitted.
The testing-optimal bandwidth in Sun (2010d) depends on the sign of $B$. When $B < 0$, the constraint on $e_I(b)$ is binding and the optimal $b^*$ sets $e_I(b^*)$ equal to its upper bound. When $B > 0$, the constraint on $e_I(b)$ is not binding and the optimal $b^*$ minimizes $e_{II}(b)$.
In principle, we can follow Sun (2010d) and select the testing-optimal bandwidth for the rectangular kernel. However, his results do not directly apply, as the kernels considered in Sun (2010d) are finite-order kernels while the rectangular kernel is an infinite-order kernel. A similar problem is also present for optimal bandwidth choice under the MSE criterion, as the conventional squared bias and variance tradeoff does not apply to the rectangular kernel. To solve this problem, we employ a second-order kernel as the target kernel and use its testing-optimal bandwidth as a basis for bandwidth selection for the rectangular kernel.
Let $k_{tar}(\cdot)$ be the target kernel and $b_{tar}$ be the associated testing-optimal bandwidth parameter. For example, we may let $k_{tar}(\cdot)$ be the Parzen kernel, the QS kernel, or any other commonly used finite-order kernel. We set the bandwidth for the rectangular kernel to be
$$ b_{rect} = \begin{cases} b_{tar}, & \text{if } B < 0, \\ \dfrac{c_{2,tar}}{c_{2,rect}}\, b_{tar}, & \text{if } B > 0, \end{cases} \qquad (22) $$
where $c_{2,tar} = \int_{-\infty}^{\infty} k_{tar}^2(x)\,dx$ and $c_{2,rect} = \int_{-\infty}^{\infty} k_{rect}^2(x)\,dx = 2$. For example, when the Parzen kernel is used as the target kernel, $c_{2,tar} = 0.539285$, so that $b_{rect} = b_{tar}$ if $B < 0$ and $b_{rect} = 0.2696\, b_{tar}$ if $B > 0$. When the QS kernel is used as the target kernel, $c_{2,tar} = 1$, so that $b_{rect} = b_{tar}$ if $B < 0$ and $b_{rect} = 0.5\, b_{tar}$ if $B > 0$. Given $b_{rect}$, we set the VAR lag order to be $p = \lceil b_{rect} T \rceil$. For convenience, we refer to this bandwidth selection and lag order determination method as the method of target kernel (MTK).
When $B < 0$, the bandwidth based on the MTK is the same as the testing-optimal bandwidth for the target kernel. In this case, all F* tests are expected to be oversized because of the asymptotic bias of the associated LRV estimator. For a given bandwidth parameter and under some regularity conditions, the asymptotic bias of the rectangular kernel LRV estimator is of smaller order than that of any finite-order kernel (see Politis, 2010). As a consequence, the bandwidth selected by the MTK is expected to control the type I error at least as well as the testing-optimal bandwidth selection rule for the target kernel.
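The MTK rule described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it approximates $c_2$ for the Parzen and QS kernels by a midpoint rule and then maps a target-kernel bandwidth into a rectangular-kernel bandwidth and a VAR lag order. The function names, the truncation of the QS integral to a finite interval, and the reading of the rule as "same bandwidth when $B < 0$, bandwidth scaled by $c_{2,tar}/c_{2,rect}$ when $B > 0$" are assumptions made for this sketch.

```python
import math


def parzen(x):
    """Parzen kernel, supported on [-1, 1]."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a**2 + 6.0 * a**3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def quadratic_spectral(x):
    """Quadratic spectral (QS) kernel, supported on the whole real line."""
    if x == 0.0:
        return 1.0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi**2 * x**2) * (math.sin(z) / z - math.cos(z))


def c2(kernel, lower, upper, n=200_000):
    """Midpoint-rule approximation of the integral of kernel(x)^2 on [lower, upper]."""
    h = (upper - lower) / n
    return h * sum(kernel(lower + (i + 0.5) * h) ** 2 for i in range(n))


C2_RECT = 2.0  # integral of the squared rectangular kernel 1{|x| <= 1}


def mtk_bandwidth(b_tar, c2_tar, bias_is_negative):
    """Hypothetical helper: rectangular-kernel bandwidth from a target bandwidth.

    Assumed rule: keep b_tar when B < 0; otherwise rescale so that
    c2 * b is the same for the rectangular and target kernels.
    """
    if bias_is_negative:
        return b_tar
    return (c2_tar / C2_RECT) * b_tar


def var_lag_order(b_rect, T):
    """VAR lag order p = ceil(b_rect * T)."""
    return math.ceil(b_rect * T)
```

In this sketch the sign of $B$ and the target bandwidth $b_{tar}$ are taken as given; in practice both would come from a plug-in procedure.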
When $B > 0$, the type I error of the F* test is expected to be capped by the nominal type I error. This gives us the opportunity to select the bandwidth to minimize the type II error without worrying about over-rejection. With the bandwidth selected by the MTK, the third term of the form $\delta_o^2\, G'_{(q+2),\delta_o^2}(\mathcal{X}_q^{\alpha})\, \mathcal{X}_q^{\alpha}\, c_2 b/2$ in $e_{II}(b)$ is the same for the rectangular kernel and the target kernel, while the second term is expected to be smaller for the rectangular kernel. Therefore, the F* test based on the rectangular kernel and the MTK is expected to have smaller type II error than the F* test based on the target kernel with testing-optimal bandwidth choice.
To sum up, when the F* tests are expected to over-reject, the rectangular kernel with bandwidth selected by the MTK delivers an F* test with a smaller type I error than the corresponding target kernel. On the other hand, when the F* tests are expected to under-reject so that the asymptotic type I error is capped by the nominal type I error, the F* test based on the rectangular kernel and the MTK is expected to have smaller type II error than the F* test based on the finite-order target kernel.
Our bandwidth selection rule via the MTK bears some resemblance to a rule suggested by Andrews (1991, footnote on page 834). Andrews (1991) employs the MSE criterion and suggests setting the bandwidth for the rectangular kernel equal to half of the MSE-optimal bandwidth for the QS kernel. Essentially, Andrews (1991) uses the QS kernel as the target kernel. This is a natural choice as the QS kernel is the optimal kernel in the class of positive semidefinite kernels. Lin and Sakata (2009) make the same recommendation and show that the resulting rectangular kernel LRV estimator has smaller AMSE than the QS kernel LRV estimator. When $B > 0$, the MTK is analogous to the rule suggested by Andrews (1991) and Lin and Sakata (2009). However, when $B < 0$, so that the F* tests tend to over-reject, the MTK is different. It suggests using the same bandwidth as that for the target kernel, rather than a fraction of it, in order to control the size distortion.

Simulation Study
This section provides some simulation evidence on the finite sample performance of the VAR F* test. We compare the VAR F* test with the chi-square tests as well as the kernel-based F* tests recently proposed by Sun (2010d).

Location model
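In our first simulation experiment, we consider a multivariate location model of the form
$$ y_t = \theta + u_t, $$
where $y_t = (y_{1t}, y_{2t}, y_{3t})'$, $u_t = (u_{1t}, u_{2t}, u_{3t})'$ and $\theta = (\theta_1, \theta_2, \theta_3)'$. The error processes $\{u_{it}\}$ are independent of each other. We consider two cases. In the first case, all components $u_{it}$ follow the same AR(2) process:
$$ u_{it} = \rho_1 u_{it-1} + \rho_2 u_{it-2} + e_{it}, $$
where $e_{it} \sim iid\, N(0, \sigma_e^2)$ and $\sigma_e^2 = (1+\rho_2)\left[(1-\rho_2)^2 - \rho_1^2\right]/(1-\rho_2)$. In the second case, all components $u_{it}$ follow the same MA(2) process:
$$ u_{it} = e_{it} + \rho_1 e_{it-1} + \rho_2 e_{it-2}, $$
where $e_{it} \sim iid\, N(0, \sigma_e^2)$ and $\sigma_e^2 = (1 + \rho_1^2 + \rho_2^2)^{-1}$. In both cases, the value of $\sigma_e^2$ is chosen such that the variance of $u_{it}$ is one.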
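The unit-variance normalization of the two error processes can be checked with a short sketch. The code below is illustrative only: the coefficient names `rho1` and `rho2`, the helper names, and the burn-in length are assumptions for this example. It computes the innovation variance that gives a stationary AR(2) or MA(2) process variance one, simulates one component of the error vector, and includes a deterministic check based on the truncated MA(infinity) representation of the AR(2) process.

```python
import math
import random


def sigma2_e_ar2(rho1, rho2):
    """Innovation variance giving a stationary AR(2) process unit variance."""
    return (1.0 + rho2) * ((1.0 - rho2) ** 2 - rho1**2) / (1.0 - rho2)


def sigma2_e_ma2(rho1, rho2):
    """Innovation variance giving an MA(2) process unit variance."""
    return 1.0 / (1.0 + rho1**2 + rho2**2)


def simulate_ar2(T, rho1, rho2, rng, burn_in=500):
    """Simulate T observations of u_t = rho1*u_{t-1} + rho2*u_{t-2} + e_t."""
    sd = math.sqrt(sigma2_e_ar2(rho1, rho2))
    u1 = u2 = 0.0  # u_{t-1} and u_{t-2}
    path = []
    for t in range(T + burn_in):
        u = rho1 * u1 + rho2 * u2 + rng.gauss(0.0, sd)
        u2, u1 = u1, u
        if t >= burn_in:  # discard the burn-in draws
            path.append(u)
    return path


def simulate_ma2(T, rho1, rho2, rng):
    """Simulate T observations of u_t = e_t + rho1*e_{t-1} + rho2*e_{t-2}."""
    sd = math.sqrt(sigma2_e_ma2(rho1, rho2))
    e = [rng.gauss(0.0, sd) for _ in range(T + 2)]
    return [e[t + 2] + rho1 * e[t + 1] + rho2 * e[t] for t in range(T)]


def ar2_variance(rho1, rho2, sigma2_e, n_terms=5000):
    """Variance implied by the MA(infinity) form, truncated at n_terms."""
    psi_prev, psi = 0.0, 1.0  # psi_{-1} = 0, psi_0 = 1
    total = 1.0
    for _ in range(n_terms):
        psi_prev, psi = psi, rho1 * psi + rho2 * psi_prev
        total += psi * psi
    return sigma2_e * total
```

Calling `ar2_variance(rho1, rho2, sigma2_e_ar2(rho1, rho2))` for a stationary parameter pair should return a value very close to one, which is a quick sanity check on the normalization.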
We consider the following null hypotheses: $H_{0q}: \theta_1 = \cdots = \theta_q = 0$ for $q = 1, 2, 3$. The corresponding restriction matrix is $R_{0q} = I_d(1:q, :)$, i.e., the first $q$ rows of the identity matrix $I_3$. The local alternative hypothesis $H_{1q}(\delta^2)$ is specified in terms of $\Omega$, the long run variance matrix of $u_t$, and a vector $\tilde c$ that is uniformly distributed over the sphere $S_q(\delta^2)$; that is, $\tilde c = \delta \xi / \|\xi\|$ with $\xi \sim N(0, I_q)$. It is important to point out that $\delta^2$ is not the same as the $\delta_o^2$ used in the testing-oriented criterion and the MTK.
We consider several $(\rho_1, \rho_2)$ combinations. The last two combinations come from den Haan and Levin (1998). The combination with negative $\rho_2$ comes from Kiefer and Vogelsang (2002a,b). The remaining combinations consist of simple AR(1) or MA(1) models with different persistence.
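The direction of the local alternative is drawn uniformly on a sphere by normalizing a standard normal vector. A minimal sketch of that draw follows; the function name and the interpretation of the sphere radius as $\delta$ (so that the squared norm is $\delta^2$) are assumptions made for illustration.

```python
import math
import random


def draw_on_sphere(q, delta, rng):
    """Draw a q-vector uniformly on the sphere of radius delta.

    A standard normal vector xi has a rotation-invariant distribution,
    so delta * xi / ||xi|| is uniform on the sphere.
    """
    xi = [rng.gauss(0.0, 1.0) for _ in range(q)]
    norm = math.sqrt(sum(v * v for v in xi))
    return [delta * v / norm for v in xi]
```

Each draw has Euclidean norm `delta` by construction, so only the direction of the departure from the null is random.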
We consider two sets of testing procedures. The first set consists of the tests using the VAR variance estimator. For each restriction matrix $R_{0q}$, we fit a VAR($p$) model to $R_{0q}(u_t - \bar u)$ by OLS. We select the lag order of each VAR model by AIC or BIC. As these are standard model selection methods, the details on AIC and BIC can be found in many textbooks and papers; see, for example, Lütkepohl (2007, sec. 4.3) and den Haan and Levin (1998). We also consider selecting the VAR order by the MTK, that is, $p = \lceil b_{rect} T \rceil$, where $b_{rect}$ is defined in (22). We use the Parzen and QS kernels as the target kernels. We call the resulting two VAR order selection rules the VAR-Par rule and the VAR-QS rule.
For each of the VAR order determination methods, we construct the VAR variance estimator and compute the Wald statistic. We perform both the F* test proposed in this paper and the traditional $\chi^2$ test. The F* test employs the modified Wald statistic $\exp(-2qb) F_T$ and critical values from the $F_{q,\hat K}$ distribution, where $\hat K = \max(\lceil T/(2\hat p) \rceil, 5)$ and $\hat p$ is the selected VAR order. The traditional $\chi^2$ test employs the unmodified Wald statistic and critical values from the $\chi_q^2/q$ distribution. We refer to these tests as F*-VAR-AIC, $\chi^2$-VAR-AIC, F*-VAR-BIC, $\chi^2$-VAR-BIC, F*-VAR-Par, $\chi^2$-VAR-Par, F*-VAR-QS, and $\chi^2$-VAR-QS, respectively.
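The two ingredients of the F* test described above, the corrected statistic and the denominator degrees of freedom, can be written as a short sketch. This is an illustration under stated assumptions rather than the paper's code: the helper names are hypothetical, the correction factor is read as $\exp(-2qb)$ (a shrinkage of the Wald statistic), and $b$ is taken to be the ratio of the selected VAR order to the sample size.

```python
import math


def denominator_df(T, p_hat):
    """K = max(ceil(T / (2 * p_hat)), 5), the second F degrees of freedom."""
    return max(math.ceil(T / (2 * p_hat)), 5)


def corrected_wald(F_T, q, p_hat, T):
    """Finite sample corrected Wald statistic exp(-2*q*b) * F_T.

    Assumption for this sketch: b = p_hat / T.
    """
    b = p_hat / T
    return math.exp(-2.0 * q * b) * F_T
```

The test would then compare `corrected_wald(...)` with a quantile of the F distribution with `q` and `denominator_df(...)` degrees of freedom, for instance `scipy.stats.f.ppf(1 - alpha, q, K)` if SciPy is available.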
The second set of testing procedures consists of kernel-based Wald-type tests. We consider two commonly used second-order kernels: the Parzen and QS kernels. For each kernel, the bandwidth is determined via either a modified MSE criterion (Andrews, 1991) or the testing-oriented criterion (Sun, 2010d). In the former case, we consider the asymptotic MSE of the LRV estimator for the transformed moment process $h_t$. This is in contrast with the original MSE criterion in Andrews (1991), which is based on the moment process $f_t$. To some extent, the modified MSE is tailored to the null hypothesis. The modification makes the MSE-based method as competitive as possible. In the latter case, the bandwidth is selected to solve the constrained minimization problem in (21). We set the tolerance parameter to 1.2 in the simulation experiment. The conventional tests using the MSE-optimal bandwidth and the $\chi_q^2/q$ critical values are referred to as $\chi^2$-Parzen and $\chi^2$-QS, respectively. The tests proposed by Sun (2010d) are referred to as F*-Parzen and F*-QS, as they use critical values from F distributions. Both the MSE-optimal bandwidth and the testing-optimal bandwidth require a plug-in implementation. We use the VAR model selected by the BIC as the approximating parametric model.
To explore the finite sample size of the tests, we generate data under the null hypothesis. To compare the power of the tests, we generate data under the local alternative. For each test, we consider two significance levels, $\alpha = 5\%$ and $\alpha = 10\%$, and three different sample sizes, $T = 100, 200, 500$. The number of simulation replications is 10,000.
Table 1 gives the type I errors of the ten testing methods for the AR error with sample size $T = 100$. The significance level is 5%, which is also the nominal type I error. Several patterns emerge. First, as is clear from the table, the conventional chi-square tests can have a large size distortion. The size distortion increases with both the error dependence and the number of restrictions being jointly tested. The size distortion can be very severe. For example, when $(\rho_1, \rho_2) = (0.8, 0)$ and $q = 3$, the empirical type I errors of the conventional Wald tests are 0.475 and 0.452 for the Parzen and QS kernels, respectively. These empirical type I errors are far from 0.05, the nominal type I error.
Second, the size distortion of the VAR F* test is substantially smaller than that of the corresponding $\chi^2$ test. Note that the lag order underlying the VAR F* test is the same as that for the corresponding VAR $\chi^2$ test. The VAR F* test is more accurate in size because it employs an asymptotic approximation that captures the estimation uncertainty of the LRV estimator. Based on this observation, we can conclude that the proposed finite sample correction, coupled with the use of the F critical values, is very effective in reducing the size distortion of the $\chi^2$ test.
Third, the size distortion of the F*-Parzen and F*-QS tests is also much smaller than that of the corresponding $\chi^2$ tests. There are two reasons for this observation. For the kernel F* tests, the bandwidth is chosen to control the asymptotic type I error, which captures the empirical type I error to some extent. In addition, the kernel F* tests also employ more accurate asymptotic approximations. So it is not surprising that the kernel F* tests have more accurate size than the corresponding $\chi^2$ tests.
Fourth, among the F* tests based on the VAR variance estimator, the test based on the MTK has the smallest size distortion while the test based on the BIC has the largest size distortion. Unreported results show that, in an average sense, the VAR order selected by the MTK is the largest while that selected by the BIC is the smallest. In terms of size accuracy, the AIC and BIC appear to be too conservative in choosing the AR lag order. This is especially true for the BIC. den Haan and Levin (1998) obtain the same result. Hall (1994) and Ng and Perron (1995) obtain similar results in Monte Carlo studies on the choice of AR order for the augmented Dickey-Fuller unit root test.
Finally, when the error process is highly persistent, the VAR F* test with the VAR order selected by the MTK is more accurate in size than the corresponding kernel F* test. This observation confirms the advantage of using the VAR variance estimator as compared to the kernel LRV estimator using a finite-order kernel. On the other hand, when the error process is not persistent, all the F* tests have more or less the same size properties. So the VAR F* test with the VAR order selected by the MTK reduces the size distortion when it is needed most, and maintains the good size property when it is not needed.
Figures 4-6 present the finite sample power in the AR case for different values of $q$. We compute the power using the 5% empirical finite sample critical values obtained from the null distribution. So the finite sample power is size-adjusted and power comparisons are meaningful. It should be pointed out that the size adjustment is not feasible in practice. The parameter configurations are the same as those for Table 1 except that the data are generated under the local alternatives. The power curves are for the F* tests. We do not include chi-square tests, as Sun (2010d) has shown that the kernel-based F* tests are as powerful as the conventional chi-square tests.
Three observations can be drawn from these figures. First, the VAR F* tests based on the AIC or BIC are more powerful than the other F* tests. Among all F* tests, the VAR F* test based on the BIC is the most powerful. However, this F* test also has the largest size distortion. Second, the power differences among the F* tests are small in general. An exception is the F*-QS test, which incurs some power loss when the processes are highly persistent and the number of restrictions being jointly tested is relatively large. Third, compared with the kernel F* test with testing-optimal bandwidth, the VAR F* test based on the MTK has very competitive power; sometimes it is more powerful than the kernel F* test.
Therefore, the VAR F* test based on the MTK achieves more accurate size without sacrificing power. This is especially true for the F*-VAR-QS test.
Table 2 presents the simulated type I errors for the MA error. The qualitative observations on size comparison for the AR case remain valid. In fact, these qualitative observations hold for other parameter configurations such as different sample sizes and significance levels. Quantitatively, the empirical type I errors for the MA case are smaller than those for the AR case. We do not present the power figures for the MA case but note that the qualitative observations on power comparison for the AR case still hold.
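The size adjustment used in the power comparison can be sketched as follows. This is a minimal illustration with hypothetical helper names: the critical value is taken as an order statistic of the test statistics simulated under the null, and power is the rejection frequency of the statistics simulated under the local alternative against that critical value. The particular order-statistic convention is an assumption of this sketch.

```python
import math


def empirical_critical_value(null_stats, level=0.05):
    """Critical value such that floor(level * n) null statistics exceed it.

    Assumes 0 < level < 1 and no ties at the cutoff.
    """
    ordered = sorted(null_stats)
    n = len(ordered)
    n_reject = math.floor(level * n)
    return ordered[n - n_reject - 1]


def rejection_rate(stats, critical_value):
    """Fraction of statistics strictly above the critical value."""
    return sum(s > critical_value for s in stats) / len(stats)


def size_adjusted_power(null_stats, alt_stats, level=0.05):
    """Power under the alternative using the empirical null critical value."""
    cv = empirical_critical_value(null_stats, level)
    return rejection_rate(alt_stats, cv)
```

By construction, applying `rejection_rate` to the null statistics themselves returns the nominal level (up to rounding), which is what makes power comparisons across tests with different size distortions meaningful.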

Regression model
In our second simulation experiment, we consider a regression model with an intercept and a $3 \times 1$ regressor vector process $x_t$, in which $x_t$ and the error $\varepsilon_t$ follow either an AR(1) process or an MA(1) process. In the MA(1) case,
$$ x_{t,j} = \rho e_{t-1,j} + \sqrt{1-\rho^2}\, e_{t,j}, \qquad \varepsilon_t = \rho e_{t-1,0} + \sqrt{1-\rho^2}\, e_{t,0}. $$
The error term $e_{t,j} \sim iid\, N(0,1)$ across $t$ and $j$. For this DGP, we have $m = d = 4$. Throughout we are concerned with testing for the regression parameter $\beta$ and set the intercept to zero without loss of generality. Let $\theta$ collect the intercept and $\beta$. We estimate $\theta$ by the OLS estimator. Since the model is exactly identified, the weighting matrix $W_T$ becomes irrelevant. Let $\tilde x_t' = [1, x_t']$ and $\tilde X = [\tilde x_1, \ldots, \tilde x_T]'$; the asymptotic variance of the OLS estimator $\hat\theta_T$ then depends on $\Omega$, the LRV matrix of the process $\tilde x_t \varepsilon_t$. We consider the following null hypotheses: $H_{0q}: \beta_1 = \cdots = \beta_q = 0$ for $q = 1, 2, 3$. The local alternative hypothesis $H_{1q}(\delta^2)$ is again specified in terms of a vector $\tilde c$ that is uniformly distributed over the sphere $S_q(\delta^2)$.
Table 3 reports the empirical type I error of different tests for the AR(1) case. As before, it is clear that the F* test is more accurate in size than the corresponding $\chi^2$ test. Among the three VAR F* tests, the test based on the MTK has less size distortion than those based on the AIC and BIC. This is especially true when the error is highly persistent. Compared with the kernel F* test, the VAR F* test based on the MTK is more accurate in size.
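For the MA(1) design, the data generating process can be sketched in a few lines. The code is illustrative only: the persistence symbol `rho`, the function names, and the choice to set the intercept to zero in the simulated data are assumptions of this example. The weights `rho` and `sqrt(1 - rho**2)` give each series unit variance.

```python
import math
import random


def ma1_series(T, rho, rng):
    """Simulate z_t = rho * e_{t-1} + sqrt(1 - rho^2) * e_t with e_t ~ N(0, 1)."""
    e = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
    w = math.sqrt(1.0 - rho**2)
    return [rho * e[t] + w * e[t + 1] for t in range(T)]


def simulate_regression(T, rho, beta, rng):
    """Simulate y_t = x_t' beta + eps_t with three MA(1) regressors.

    The intercept is set to zero; regressors and error share the same rho
    but are built from independent innovations.
    """
    x = [ma1_series(T, rho, rng) for _ in range(3)]  # x[j][t] is regressor j
    eps = ma1_series(T, rho, rng)
    y = [sum(beta[j] * x[j][t] for j in range(3)) + eps[t] for t in range(T)]
    return y, x
```

Setting `beta` to a vector of zeros generates data under the null hypotheses, while a nonzero `beta` generates data under an alternative.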
To sum up, the F*-VAR-QS test has much smaller size distortion than the conventional $\chi^2$ test considered by den Haan and Levin (1998). It also has more accurate size than the kernel F* tests proposed by Sun (2010d). The size accuracy of the F*-VAR-QS test is achieved with no or small power loss. In fact, the F*-VAR-QS test is more powerful than the corresponding kernel test in some scenarios.

Conclusions
The paper has established a new asymptotic theory for long run variance estimators that are based on fitting a vector autoregressive model to the estimated moment process. The new asymptotic theory assumes that the VAR order is proportional to the sample size. Compared with the conventional asymptotics, the new asymptotic theory has two attractive properties: the limiting distribution reflects the VAR order used and the estimation uncertainty of moment conditions. On the basis of this new asymptotic theory, we propose a new and easy-to-use F* test. The test statistic is equal to a finite sample corrected Wald statistic and the critical values are from the standard F distribution. Simulations show that the F* test has much smaller size distortion than the conventional chi-square test.
The new asymptotic theory can be extended to the autoregressive estimator of spectral densities at other frequencies. The idea of the paper can be used to tackle other econometric problems that employ vector autoregression as an approximating parametric model to capture autoregressive or moving average components, for example, in the augmented Dickey-Fuller unit root test of Said and Dickey (1984) and the fully modified regression of Phillips and Hansen (1990).
In this paper, the VAR order is determined by the AIC, the BIC or the MTK that exploits the connection between autoregressive estimators and kernel estimators. Although the MTK works very well, it would be interesting to develop alternative model selection criteria for LRV estimation and the associated autocorrelation robust tests.

Let $\lambda$ be an eigenvalue of $\hat A'$ and $x = (x_1', \ldots, x_p')'$ be the corresponding eigenvector. From the eigenvalue equations, we know that $x \neq 0$ implies $x_1 \neq 0$. We consider the case $\lambda \neq 0$. In this case, $\hat B' x_1 \neq 0$. It follows from (25) and the Toeplitz structure of $\hat\Gamma_H(p+1)$ that $\|\lambda\|^2 < 1$ if $\hat\Gamma_H(p)$ and $\hat\Gamma_H(p+1)$ are positive definite.
Proof of Theorem 2. Sincê for a q q matrix D such that as desired.
Proof of Theorem 3. Let H be an orthonormal matrix such that H = ( = k k ; ) 0 where is a q (q 1) matrix, then where e 1 = (1; 0; 0; : : : ; 0; 0) 0 . Note that k k 2 is independent of H and that HW q (r) has the same distribution as W q (r); so we can write where G q is the CDF of the random variable 2 q =q: We therefore have shown that the …rst and second moments of 11 2 depend only on P 1 n=1 n and P 1 n=1 ( n ) 2 up to smaller order o(b): Furthermore, we can show that higherorder moments of 11 2 are of order o(b): But P 1 n=1 n and P 1 n=1 ( n ) 2 are the same for 11 2 and 11 2 : This result, combined with a Taylor expansion, yields P (F 1 (q; b) < z) = P (F q;K q+1 z) + o(b):