Spectral Density and Spectral Distribution Inference for Long Memory Time Series via Fixed-b Asymptotics

This paper studies taper-based estimates of the spectral density utilizing a fixed bandwidth ratio asymptotic framework, and makes several theoretical contributions: (i) we treat multiple frequencies jointly, (ii) we allow for long-range dependence or anti-persistence at differing frequencies, (iii) we allow for tapers that are only piecewise smooth or discontinuous, including flat-top and truncation tapers, (iv) we study higher-order accuracy through the limit distribution’s Laplace Transform, (v) we develop a taper-based estimation theory for the spectral distribution, and show how confidence bands can be constructed. Simulation results produce quantiles and document the finite-sample size properties of the estimators, and a few empirical applications demonstrate the utility of the new methods.


Introduction
Suppose that we have a sample Y 1 , Y 2 , · · · , Y N from a weakly stationary time series {Y t }, and consider a kernel-based estimator of the spectral density f (θ) defined via for any fixed θ ∈ [−π, π]. Here Λ is the kernel, or taper, and is a bounded even function of domain [−1, 1]. The sequence γ h consists of sample autocovariances, where the centering can be taken as either zero, the sample mean, or OLS estimates of a more complicated regression effect. The bandwidth M is taken to grow at the same rate as the sample size N , rather than the usual o(N ) growth rate, such that M = bN for some b ∈ (0, 1); we say that the bandwidth-ratio b is fixed, and use the terminology of fixed-b asymptotics. The following result is a consequence of Theorem 1 of Hashimzade and Vogelsang (2008) under assumptions consistent with a short memory time series: as N → ∞. The limiting random variable S θ (b) is a quadratic functional of Brownian Motion that depends on the bandwidth proportion b, but not on the short memory autocorrelation function of the data process, and thus can be simulated without any knowledge of nuisance parameters. The limit also depends on the taper Λ, and the distribution depends on θ as well, since results differ depending on whether θ = 0, θ = π, or θ ∈ (0, π). Furthermore, the distribution at frequency θ = 0 also depends on the type of centering used to define γ h .
As noted in Hashimzade and Vogelsang (2008) -henceforth HV -the asymptotic coverage provided by the so-called large-bandwidth approach is superior when b is greater than zero, and also has the advantage of guaranteeing a positive random limit (when the taper Λ is positive definite).
The potential application of a better inferential methodology for the spectral density function is quite large, as demonstrated by the ubiquity of spectral methods in the physical sciences as well as econometrics; see Grenander and Rosenblatt (1953), Parzen (1957), Blackman and Tukey (1959), Bohman (1960), and the discussion in Priestley (1981). Understanding the joint distribution of spectral estimates across multiple frequencies is useful for the identification of hidden periodicities in the time series. One application is the identification of residual seasonality in seasonally adjusted economic time series via examination of spectral estimates in the program X-12-ARIMA, as discussed in Findley, Monsell, Bell, Otto, and Chen (1998). Literally millions of time series are seasonally adjusted each month by the program X-12-ARIMA at statistical agencies around the world -with vast ramifications for public policy -and spectral peak estimation and assessment is featured as a diagnostic tool in every application.
The paper at hand seeks to make several extensions of the fundamental results of HV. Firstly, we extend their basic results to a joint theorem over a finite collection of frequencies. This is important for assessing the uncertainty in taper-smoothed estimates of the spectral density, where we may be interested in 30 to 60 ordinates at a time. As our results below demonstrate, S θ 1 (b 1 ) is asymptotically independent of S θ 2 (b 2 ) for θ 1 = θ 2 and any b 1 , b 2 ∈ (0, 1]. This technical result will allow us to construct simultaneous confidence intervals, allowing one to assess uncertainty in a nonparametric spectral analysis. Secondly, we study cyclical long-range dependence, where each frequency of the spectral den-sity may correspond to a long memory pole or a negative memory zero; see Boutahar (2008) for related asymptotic results for the case of a single frequency. Cyclical long memory is useful for capturing highly persistent seasonal or cyclical phenomena that evolve too rapidly to be considered nonstationary; see Holan and McElroy (2012) for examples and applications of the concept to the problem of seasonal adjustment. The presence of cyclical long memory implies that the rate of convergence of the spectral estimates depends on the corresponding memory parameter, and the limit distributions become quadratic functionals of Fractional Brownian Motion -this is an extension of the frequency zero results of McElroy and Politis (2012). The rate of growth of the spectral estimates is non-standard in this case, so that the resulting confidence intervals are much wider (for long memory) or shorter (for negative memory) than in the regular short memory scenario.
Thirdly, we extend the limit theorems to piecewise smooth tapers, such as flat-top tapers (see Politis and Romano (1995) and Politis (2001)), and also to tapers with jump discontinuities, Fourthly, we provide a discussion of higher-order accuracy of the limit theory arising from the fixed-bandwidth ratio methodology. In the recent literature on Heteroskedasticity-Autocorrelation Consistent (HAC) testing -see Kiefer, Vogelsang, and Bunzel (2000) and Kiefer and Vogelsang (2002) -this has meant an expansion of the fixed-bandwidth ratio limit distributions as b tends to zero, such that the first term in the expansion is the conventional limit distribution of the vanishing-bandwidth ratio theory (i.e., in the HAC case a standard normal). We're not aware that a higher-order accuracy limit theory has been published for fixed bandwidth ratio spectral density estimates, though Velasco and Robinson (2001) study the vanishing bandwidth ratio case.
Actually, the HAC literature shows that S 0 (b) tends to a point mass at unity as b tends to zero; correspondingly, the higher-order accuracy results in this paper demonstrate that the cumulative distribution function of S θ (b) can likewise be expanded as b → 0, with a leading term equal to an indicator function, followed by other expressions involving cumulants. To achieve this, we introduce a novel method of inverting the Laplace Transform of Gaussian quadratic forms.
It may be of some interest to provide a confidence band for the entire spectral density. This is not possible if long-range dependence is present, because each frequency would potentially be growing at different rates. Also, because the spectral density limit distributions across frequencies are independent in a fixed bandwidth ratio approach, the global behavior is better summarized through the spectral distribution function (Woodroofe and Van Ness (1967) consider the spectral density bands under a vanishing bandwidth fraction asymptotic approach). Although previous literature explores the estimation of the spectral distribution function (again, see Grenander and Rosenblatt (1953) and Parzen (1957), as well as Dahlhaus (1985)), here we provide a fixed-bandwidth ratio treatment. We discuss the estimation of the limit distribution, and how this can be utilized to construct spectral confidence bands.
The limit distributions S θ (b) do not differ tremendously from the frequency zero case, but there are a few alterations from the previous distribution theory (aside from the impact of kinks in the taper) given in McElroy and Politis (2012). For all frequencies except 0 and π, the estimates converge to the sum of two independent copies of the limit in the HAC case (frequency zero); in the case of a short memory process, this result can also be found in HV, but our results also cover long memory and negative memory processes. Moreover, we focus our treatment on spectral estimates that are centered by the sample mean (so we do not consider more complicated mean regression functions), which only affects the asymptotic distribution at frequency zero. Without the centering, the limit random variable at frequency zero is a quadratic functional of Fraction Brownian Motion and that the effect is more pronounced with small b. We repeat some of this material for the spectral case, discussing the critical values as a function of b for various memory parameters.
When memory is absent from all frequencies of interest, we can construct confidence intervals using the short memory critical values, but otherwise some estimate of the memory parameter must be supplied to the quantile function. In our applications we propose a simplistic nonparametric estimate of the memory parameter, as a function of frequency, and utilize a plug-in approach to inference. Our simulation studies illustrate how size is contingent on taper, bandwidth, and sample size, presuming that the memory parameter is known.
In practice one must select a bandwidth fraction b, and its choice has a substantial impact on the resulting appearance of spectral density estimates. Is there an optimal choice of b? In McElroy and Politis (2011) the idea was presented to select b that produces the smallest confidence interval possible, and that philosophy here will lead to b approximately zero in the case of short memory.
However, this will produce a very smooth estimate of the spectral density, and it may be desirable to have a degree of resolution over the frequencies. Another approach is to use a full bandwidth with b = 1, which leads to wider confidence intervals. We also present numerical results on the choice of b that yields the smallest confidence interval possible, as a function of memory parameter and taper. In our opinion the choice of b ultimately depends upon the practitioner's particular goals of spectral analysis. For example, if the analyst is interested in spectral peak detection, then the degree of smoothing implied by the choice of b corresponds to the broadness of the peak -large values of b will allow for visualization of narrow peaks, some of which may be spurious, whereas smaller values of b will smooth out the spectrum, allowing visualization of broader peaks. These points and the general methodology are demonstrated on one construction and one retail series, using the re-coloring approach (Grether and Nerlove, 1970) to handle evident non-stationarity.
The paper is organized as follows. In Section 2 we provide a discussion of cyclical long memory, which sets the general framework for most of the paper. Then Section 3 provides the asymptotic theory for fixed-bandwidth fraction estimation of the spectral density and the spectral distribution function. In Section 4 is a treatment of higher-order accuracy, with an application of the method of Laplace inversion. Section 5 contains a description of our methods of simulation for critical values, the performance on finite samples from simulation, and a description of the bandwidth selection procedure. The full methodology is demonstrated on two economic time series in Section 6, and Section 7 concludes. All proofs are in the Appendix.

Cyclical Long Memory and Data Assumptions
From now on, let {Y t } be a constant mean stationary time series with finite variance, such that {γ h } is the autocovariance function (acf). We define cyclical long memory in analogy with conventional long memory, such that the definition agrees with the implicit definition in seasonal fractionally intergrated processes (Gray, Zhang, and Woodward (1989)) and Gegenbauer processes (Woodward, Cheng, and Gray (1998)). When the acf is absolutely summable, the spectral density f (θ) = h γ h cos(θh) is well-defined, but here we consider the case where the spectral density has long memory poles. On the other hand, if the spectral density has a zero, this corresponds to cyclical negative memory (McElroy and Politis, 2011). We say that the time series has cyclical memory at where L θ is a slowly-varying function at infinity (let L denote the set of such functions), with a limit of C θ ∈ [0, ∞]. Also the memory parameter is β θ , a number in (−1, 1). The case that β 0 = −1 was explored in McElroy and Politis (2011), and it produces somewhat non-standard asymptotic results for the sample mean; we ignore this case in this paper.
Definition 1 A weakly stationary time series with spectral density f has cyclical memory at frequency θ ∈ [0, π] if (2) holds. This property is denoted by CM(β, θ).
Note that CM(0, θ) denotes short memory at frequency θ, i.e., 0 < f (θ) < ∞. More generally, the definition of cyclical memory indicates that f (θ) equals 0, ∞, or C θ depending on whether β θ is negative, positive, or equal to zero, and these cases correspond to negative cyclical memory, long cyclical memory, and short cyclical memory respectively (for short cyclical memory, we also impose that C θ is a nonzero finite constant). This is a time domain formulation of the basic concept. The following proposition relates it to a frequency domain formulation, which some readers may find more intuitive. When a zero or pole occurs at a nonzero frequency, it must be present at the negative of that frequency as well, because the spectral density is an even function on [−π, π]. When the zero or pole occurs at frequency zero, the spectral density might be written as f (λ) = |λ| α g(λ) L(|λ| −1 ) for α ∈ (−1, 1), g a positive, even, and bounded function, and L ∈ L. But if the zero/pole occurs at a nonzero frequency θ, we can generally write the spectral density as This form only treats one zero/pole frequency θ, but the following result can be easily generalized to spectra with multiple distinct zeroes and/or poles.
So the processes discussed in Proposition 1 have zeroes/poles of diverse orders at differing frequencies, and this in turn is connected to rates of convergence of the partial sums of autocovariances weighted by cosines. Consider the following class of spectral densities, where there are J zeroes/poles at nonzero frequencies θ j (not including the conjugate zeroes/poles −θ j ) of order α j , and accompanying slowly varying functions L j . A process with such a spectral density belongs to the class J j=1 CM(−α j , θ j ), noting that CM(−α, θ) = CM(−α, −θ). In order to formulate the asymptotic results of this paper, we must make some additional assumptions about the observed stochastic process. We will consider the same set of assumptions discussed in McElroy and Politis (2011), namely that the data process is either linear, or can be written as a function of a Gaussian process, or satisfies certain higher order cumulant conditions.
The kth order cumulant of {Y t } is defined by for any t and integers u 1 , · · · , u k−1 , where k ≥ 1 (cf. Taniguchi and Kakizawa (2000)). Letting u denote the k − 1 vector of indices, we will write c k (u) for short. Also let · denote the sup-norm of a vector, so that u <n c k (u) is a short-hand for summing the cumulant over all indices such that |u j | < n for each j. We also require the concept of Hermite rank (Taqqu, 1975): if g ∈ L 2 (R, e −x 2 /2 ), then it can be expanded in terms of the Hermite polynomials H k , with coefficients g, H k (the bracket denotes the inner product of the Hilbert Space) for k ≥ 0. The Hermite rank is the index of the first nonzero coefficient.
In addition to supposing that the process is CM(β θ j ,θ j ) for a collection of frequencies θ j ∈ [0, π], j = 1, · · · , J, we also consider the following assumptions: iid with finite variance.
• Process P2. Y t = g(X t ) for each t, where g is a function in L 2 (R, e −x 2 /2 ) of Hermite rank τ , and {X t } is a Gaussian process with autocovariance function r k . If β θ j > 0, also assume that • Process P3. {Y t } is a strictly stationary process whose kth order cumulants exist and are summable over its k indices, for all k ≥ 1. Moreover, when β θ j < 0 we also assume that See the discussion in McElroy and Politis (2011) for why a moment-plus-mixing condition is not viable. Each of the assumptions P1, P2, or P3 is sufficient to establish a limit theorem for the Discrete Fourier Transforms of the data, as shown below. These process assumptions are typically unverifiable from the observed data, and should be viewed as working assumptions.

Asymptotic Theory for Spectrum Estimation
The theory developed here is similar to that of HV, but is extended to processes with cyclical memory, similarly to how McElroy and Politis (2012) extended the HAC theory to long-range dependent processes. First we establish a joint convergence theorem for normalized Discrete Fourier Transforms (DFTs), which is a result of independent interest. Secondly, we apply this result to the analysis of taper-smoothed estimates of the spectral density. Thirdly, we address the estimation of the spectral distribution function in the case of a bounded positive spectral density.

Theory for DFTs
Let {Y t } be a mean µ stationary time series with acf {γ h }, as described in Section 2. We suppose that a sample of size N is available: Y 1 , Y 2 , · · · , Y N , and the sample autocovariances are computed via for h = 0, 1, 2, · · · , and Y = n −1 n t=1 Y t . Results can be modified easily if we do not demean and assume µ = 0 (as discussed in HV as well), but our main exposition assumes centering of estimates by the sample mean for simplicity of presentation. The DFT of the sample is N t=1 (Y t − Y )e −iθt , which has real and imaginary parts given by cosine and sine summations, respectively. These trigonometric partial sums are the key aspect in the asymptotic analysis of the spectral density estimates of this paper. We introduce the weighted-sum notation as follows: for a sequence {g t }. Then the DFT equals S N (c(θ))+iS N (s(θ)) for c(θ) = cos(θ·) and s(θ) = sin(θ·).
The rate of growth of S N (c(θ)) and S N (s(θ)) will depend upon θ, because if there is a zero or pole at frequency θ the growth rate is affected by long-range dependence. Ultimately, we wish to prove joint functional limit theorems for the processes r → {S [rN ] (c(θ)), S [rN ] (s(θ))}, jointly over a finite collection of frequencies θ. Here the square bracket refers to the greatest integer function.
The key quantities that determine the growth rates of the real and imaginary parts of the DFT are the respective variances: Then with W N (θ) = |h|≤N γ h cos(θh), we have the following identity: This follows by recognizing that and that the latter expression in (5) can be re-expressed, using summation by parts, into (4).
Noting that the definition of W N (θ) together with the CM(β θ ,θ) assumption yields an asymptotic growth rate of L θ (N ) N β θ , we can apply (4) In the case of short memory, where β θ = 0 and L θ tends to a nonzero constant C θ , (6) becomes V N (θ) ∼ N C(θ) and C(θ) equals one half the spectral density. In all cases of cyclical memory, the square root of V N (θ) will be the appropriate normalizing rate for the DFT sums, as shown below.
As discussed in McElroy and Politis (2011), it is more convenient for us to formulate the results in the space C[0, 1] of continuous functions, rather than the Skorohod space. Therefore we will consider a linearly-interpolated version ξ [·N ] . This affects the mean-centering slightly, though the asymptotic impact is negligible.

respectively.
Theorem 1 Let {Y t } be covariance stationary with mean µ and acf {γ h }, such that the process is CM(β θ j ,θ j ) for a collection of frequencies θ j ∈ [0, π], j = 1, · · · , J. Letting κ = max 1≤j≤J 2∧[2/(1+ β θ j )], suppose that E[|Y t | κ+δ ] < ∞ for some δ > 0, and also assume that or P3 holds, and that in the case of a P2 process with at least one β θ j > 0, the Hermite rank is unity. Then the following weak convergence holds in the space C([0, 1], R 2J ): Hence the centering for the sine partial sum is asymptotically irrelevant, as is the centering for the cosine partial sum unless θ = 0.
Theorem 1 provides the assumed conditions (4), (5), (6), and (7) of HV, and also provides a generalization of the short memory situation. We next discuss its application to spectral density estimation.

Asymptotic Theory for Spectral Density Estimation
Now in order to apply (7) to spectral estimation, it is necessary to extend the FBMs discussed above to Fractional Brownian Bridges (FBBs) as in HV, defined as follows: Here x is a deterministic vector process with each component x j ∈ C[0, 1], and corresponds to regression effects in the data process; see Phillips (1998) for a more detailed exposition. That is, when the mean of the process {Y t } is non-constant, and perhaps is parametrized by regression functions such that the demeaned {Y t } is mean zero and stationary, then our partial sums and DFT statistics should be constructed from variables Y t centered by estimates of these mean effects.
In this paper, we focus on the simple case that x(t) ≡ 1, corresponding to centering by the sample mean (the ordinary least squares estimate of a constant mean); see the Appendix for a partial elaboration of the more general case. Note that this centering has no impact except at frequency zero, which follows from Remark 1 above, which shows that only the real part of the DFT (i.e., the cosine partial sum) at frequency zero needs to be mean-centered. In the case that the true mean is zero and this assumption is utilized in our statistics, then x(t) ≡ 0 and B ±,θ = B ±,θ , the FBM.
We now suppose that an estimate of the spectrum is computed via (1) using autocovariance estimates centered by the sample mean (or without centering in the special case that the mean is known to be zero), as described above. The taper (or kernel) Λ comes from a wide family that encompasses flat-top tapers (Politis, 2001), the Bartlett taper, as well as other tapers considered in Kiefer and Vogelsang (2005) and HV: {Λ is even with support on [-1,1] such that Λ(x) is constant for |x| ≤ c, for some c ∈ [0, 1); also, Λ is twice continuously differentiable on (c, 1).} A derivative of Λ from the left (with respect to x) is denotedΛ − , whereas from the right isΛ + ; the second derivative isΛ. Note that we allow for Λ to have a jump discontinuity at c; for example, our results apply to the truncation taper given by the indicator on the interval [−c, c]. Our main result, which is stated next, follows from Theorem 1 and an analysis of the spectral estimator, expanding on the analysis of HV.
Theorem 2 Let {Y t } be covariance stationary with mean µ and acf {γ h }, such that the process is CM(β θ j ,θ j ) for a collection of frequencies θ j ∈ [0, π], j = 1, · · · , J. Letting κ = max 1≤j≤J 2∧[2/(1+ or P3 holds, and that in the case of a P2 process with at least one β θ j > 0, the Hermite rank is unity. Also suppose that either the sample autocovariances are centered by the sample mean, or the they are not centered and that µ = 0. For tapers defined via (8), as N → ∞ we have jointly in θ j for j = 1, 2, · · · , J. In the case that there is a jump discontinuity in Λ at c, we must replace the third summand in the limit distribution by This result describes the limit behavior of the spectral density estimate in the case that cyclical memory is present, considering a finite collection of frequencies. If these frequencies happen to correspond to short memory dynamics, then the spectral density is finite and nonzero. Letting so that the convergence of Theorem 2 in the case of short memory may be summarized as where we denote the limit random variable on the right hand side of the convergence in Theorem 2 via S θ (b). A numerical description of this distribution is given in HV. A technical description can be given through the moment generating function, or Laplace Transform (LT) of S θ (b), as in McElroy and Politis (2009); this is developed in Section 4 below. Tables of quantiles can be given over a grid of b values, depending on the three frequency cases (i.e., θ = 0, θ = π, or θ ∈ (0, π)) and the taper; see Tables 1 through 18 below.
In the case of cyclical long memory or negative memory the true spectrum f (θ) is either equal to ∞ or zero, and inference is problematic. For the purpose of constructing a confidence interval, we propose the quantity f N (θ) = V N (θ)/(N τ θ ) as the "parameter" of interest, although clearly this is a moving target; only in the case of short memory can we conceptually replace f N (θ) by f (θ), via (9). However, whatever the degree of cyclical memory, we can conduct inference for f N (θ) as follows. Denote the quantile function of S θ (b) by Q θ (·). If we wish to consider a single frequency, the confidence interval for f N (θ) with asymptotic coverage 1 − α is which follows from Alternatively, a simultaneous confidence interval can be constructed by considering the maximum and minimum of S θ (b) over the pertinent frequencies. Let S(b) = max 1≤j≤J S θ j (b)/τ θ j and S(b) = min 1≤j≤J S θ j (b)/τ θ j , which have distributions easily computable from the marginals due to independence (they are also identically distributed for θ j ∈ (0, π)). (Note that our notation assumes that the same bandwidth fraction b is used for all frequencies, although this need not be the case in practice.) The corresponding quantile functions will be denoted Q and Q for the maximum and minimum respectively. Let J denote a finite index set, and consider a set of frequencies θ j with 1 ≤ j ≤ J. For positive real numbers , u, The last equality follows from the observation that -when < u -the event {S ≤ } is mutually exclusive with the event {S ≥ u}. This probability is approximately 1 − α if , u correspond to the appropriate critical values; splitting the quantity α evenly amounts to This provides the construction of a simultaneous confidence interval.

Asymptotic Theory for Spectral Distribution Estimation
The estimation of spectral content can be extended to the spectral distribution function F (θ) = (2π) −1 θ −π f (λ) dλ, and because of the smoothing of the spectral density accomplished by integration, the behavior of statistical estimates is easier to describe. In this subsection we assume that the spectral density has short memory, and hence 0 < f (λ) < ∞ for all λ ∈ [−π, π]. We make this assumption so that the rate of convergence of spectral estimates are the same at all frequencies.
Indeed, the classical limit result of Dahlhaus (1985) cannot hold for processes with long memory poles such that β > 1/2, because the limiting variance (see below) depends on the integral of the squared spectral density.
Because the spectral density is even, it suffices to study G(θ) = (2π) −1 θ 0 f (λ) dλ, and its corresponding estimator G(θ) = (2π) −1 θ 0 f (λ) dλ. Very general results for functionals of the periodogram, under general data process conditions, were obtained by Dahlhaus (1985); also see the literature cited in that paper for a history of efforts. Whereas Dahlhaus (1985) utilizes a data taper, here we utilize a covariance taper -in keeping with the previous subsection on spectral density estimation -as other literature has also done (e.g., Priestley (1981)). The novelty of this subsection lies chiefly in adopting a fixed bandwidth ratio framework, and somewhat unsurprisingly the same limit distribution and functional limit theorem is obtained as in Dahlhaus (1985); in particular, neither bandwidth fraction b nor taper play any role in the asymptotic distribution.
Utilizing the definition of the spectral density estimator, we at once obtain where we interpret sin[θh]/h to be the value θ whenever h = 0. Here I(λ) is the periodogram, defined to be N −1 times the magnitude squared of the DFT: 2πh e iλh , which is the pointwise limit of Because of symmetry, g θ is always real, and so the complex exponential can be replaced by a cosine in its definition. We claim that this pointwise limit can be taken in the definition of G(θ). Note that g θ (λ) = 2 −1 1 [−θ,θ] (λ), the sinc function. Let G(·) denote the spectral distribution function's estimate, and the limiting process Z(·) is defined as a mean zero Gaussian process with covariance This kernel is simpler than the one found in Dahlhaus (1985), because we will assume that fourth order cumulants are zero (this could be relaxed, but then a different approach to the estimation of limit quantiles in Theorem 3 would be needed). The kernel actually corresponds to the covariance kernel of a heteroscedastic Brownian Motion (see below).
We focus on G(θ) rather than F (θ), because if we are interested in F (θ) for θ < 0, this is equal to G(π) − G(−θ) by symmetry. So the following functional limit theorem can be stated; in the space C([0, π], R), where the process Z is mean zero Gaussian with covariance kernel (12).
It is interesting that the taper is irrelevant to the asymptotic distribution -this is essentially because the integration involved in the definition of the spectral distribution makes the tapering in the spectral density estimation obsolete. However, the taper and the bandwidth have a substantial impact on the qualitative features of the estimate (see Section 5). The degree of correlation between differing values of the spectral distribution estimator depends chiefly on the smaller frequency, as indicated by (12); variance is increasing in frequency, unto the maximum value G(π) = γ 0 /2.
As an application of Theorem 3, we can construct uniform confidence bands about the spectral distribution function. This is in contrast to the application discussed in Section 3.2, where simul- as N → ∞. The random variables Z = inf θ∈[0,π] Z(θ) and Z = sup θ∈[0,π] Z(θ) determine the spread of the confidence band, and can be calculated via simulation when the covariance kernel is known, or is estimable. Another possibility is to estimate the limit distribution via subsampling (this might be preferable if the assumption on the fourth cumulant is not tenable), as in Politis, Romano, and You (1993).
Let the corresponding quantile functions be denoted by R and R respectively. Then the confidence band probability is approximately 1 − α if , u correspond to the appropriate critical values; splitting α evenly yields This construction differs somewhat from (11), because in that case the limit theorem was formulated as a ratio (for spectral density estimation), whereas here the limit theorem is formulated as a difference (for spectral distribution estimation). Although the limit Z(θ) does not depend on the taper, it does require a knowledge of f . In practice, one must construct an estimate of the covariance kernel (12); we next describe our procedure.
Let M denote a mesh of frequencies, providing a discretization of the Riemann integral defining which is the variance of a heteroscedastic random walk. That is, suppose that { t } is an independent Gaussian sequence, with each random variable having variance f 2 (tπ/M )/(2M ) for M fixed. Then U = t=1 t is a heteroscedastic random walk with variance approximately K(θ, θ), where = M θ/π . We can easily simulate this Gaussian sequence by multiplying f (tπ/M )/ √ 2M times iid normals. Moreover, the covariance function of the process {U } is approximately that of the kernel K, because of the random walk structure.
If f is known (as in the case of hypothesis testing) then we can simulate the process {U } and obtain an approximation to {Z(θ)}, with the association = M θ/π . However, in many applications f is unknown and must be estimated. One could use the tapered spectral density estimates discussed above, or the periodogram (integration over frequencies smooths it out sufficiently to provide consistency). Thus, we construct t via multiplying f (tπ/M )/ √ 2M by a standard normal, independently for each t, and construct the corresponding heteroscedastic random walk { U }. Here f could be the periodogram or the same tapered spectral estimate upon which our original G is based. Then with Z(θ) = U M θ/π , we approximate Z and Z by the minimum and maximum, respectively, over the M values U 1 , · · · , U M . Repeated samples for { t } then yield an estimate for the distribution of Z and Z. Consistency of this implicit estimator K follows from the same assumptions as used in Theorem 3. The upper quantile of R and lower quantile of R yield estimates of u and . This procedure has been implemented and tested in simulation (see Section 5 below).
Alternatively, one may be interested in testing some null hypothesis that naturally supplies f to us. For example, we may be studying the time series residuals arising from a fitted model, and seek to test whether these residuals behave as white noise. Ignoring issues of parameter estimation error, we wish to test whether f (λ) ≡ γ 0 , and hence we can estimate the covariance kernel via This is the kernel of a Brownian Motion process on [0, π], scaled by γ 0 . There exist published quantile functions for the supremum and infimum of BMs, and so the construction of , u is relatively straightforward. In this problem, the null hypothesis also dictates the form of G, i.e., G(θ) = γ 0 θ/(2π), so that if this particular function G fails to lie completely within the confidence bands, we have evidence to reject the null hypothesis. However, such an approach ultimately presumes a parametric specification for the original spectrum, and there are other techniques available for testing model goodness-of-fit in such a scenario. In our applications below, we focus upon nonparametric approaches to spectral estimation.

Higher Order Accuracy of the Fixed Bandwidth Fraction
In this paper we have adopted the asymptotic perspective that bandwidth in spectral estimates is to be viewed as a fixed fraction b of the sample size. Conventional asymptotics stipulate that the bandwidth is vanishing relative to sample size, and the spectral estimates become consistent. As in the HAC literature -which examines the distribution of the self-normalized mean as b → 0, and makes comparison to the conventional asymptotic normality results -we intend to examine the behavior of our limits S θ (b) in Theorem 2 as b → 0. The point of this is to show that that S θ (b) can be viewed as the classical limit distribution S θ (0) plus other stochastic terms that are order b, b 2 , and so forth. This will demonstrate a higher-order accuracy for the fixed bandwidth fraction asymptotics.
Unlike in the HAC case of a standardized sample mean statistic, where the b = 0 case corresponds to a Gaussian random variable, for spectral estimation the b = 0 case corresponds to point mass at the spectral density, i.e., S θ (0) = f (θ) with probability one. Therefore, expansions of the distribution of S θ (b) as b → 0 will use slightly different techniques then those employed in Sun, Phillips, and Jin (2008). We pursue an analysis of the Laplace Transform of S θ (b), providing a small b expansion, and relate this transform to the cumulative distribution function of S θ (b). We utilize an expansion of the Laplace Transform in terms of functions that have known Laplace inverses; we believe this to be a novel method, potentially generalizable to other types of distribution problems.
This method will result in an expansion of the right tailed cumulative distribution function (cdf) in terms of polynomials and exponential functions, with coefficients given by polynomial functions of the cumulants. We show how to compute these cumulants directly from the tapers -although similar types of cumulant calculations have previously appeared in the HAC literature (Sun, Phillips, and Jin (2008)). However, we do not view this expansion as the most practical method for calculating the cdf; in practice, one wants the quantiles of the limit distribution, and these can be obtained via simulation (Section 5).
Fixing θ so that we can drop the subscript, the distribution of S(b) is characterized by its Laplace Transform (LT). From Tziritas (1987), the LT of a Gaussian quadratic form Z, Z T -for a Gaussian process Z with covariance kernel K, and a quadratic form ·, · T with operator T -is given by where κ j is the jth cumulant of S(b), and has the formula Also see the discussion in McElroy and Politis (2012). Briefly, the Gaussian process Z is defined on the space of real-valued function of domain [0, 1], such that the action of an operator A on any element x of this space is given by (Ax)(s) = 1 0 A(s, t)x(t)dt. In equation (14), both K and T are operators, and their composition has action on an element x given by Also, tr denote the trace of an operator, i.e., tr(A) = 1 0 A(s, s) ds. The limit distribution S(b) in Theorem 2 is the sum of two such independent and identically distributed Gaussian quadratic forms (just one copy if θ = 0, π), because it can be written as the sum of two random variables of the type Because the Gaussian processes B are FBBs, the covariance kernel K is that of FBB (Samorodnitsky and Taqqu (1996)). Trivially, the LT of the sum of two iid random variables is the square of their common LT, which amounts to a doubling of each cumulant. In the following treatment, we provide an expansion for the cdf in terms of cumulants; these are given by doubling the formula for κ j in (14) when θ = 0, π, but at frequency zero or π we just take the formula (14) directly. Since the trace of powers of KT is not convenient to calculate, we provide a feasible approximation to the κ j after our presentation of the expansion.
The right-tailed cdf of Z, Z T will be denoted by F , and its pdf by p. The LT of a function φ Then L F (s) = s −1 (1 − L p (s)) using integration by parts, and L p (s) = E exp{−s Z, Z T }. Next, letting 0 j=1 (an empty sum) be equal to zero for convenience, consider the infinite expansion and denote the kth term by the function G k (s). Each such function is actually of order b k , and by carefully expanding them in an appropriate fashion, is the infinite sum of functions with known LT inverse. The initial term in the expansion is i.e., it is the LT of the indicator function on [0, κ 1 ]. This makes sense, because the right-tailed cdf should tend, as b → 0, to an indicator function with boundary marked by its point mass, namely κ 1 = Λ(0) (shown below). The higher order terms are more complicated, but contribute additional perturbations to this indicator function.
The key to the following theorem are the following class of polynomials: let φ n be supported These polynomials have the remarkable property that as shown in Gradshteyn and Rhyzik (1994). Now we can state the main expansion result, which applies more generally than to just the spectral density estimation problem.
Theorem 4 Suppose that a Gaussian quadratic form Z, Z T with covariance kernel K has cumulants given by (14). Then there exist coefficient sequences {α and G 0 (s) = s −1 (1 − e −κ 1 s ), where k≥0 G k is the Laplace Transform of the right-tailed cdf of Z, Z T . The right-tailed cdf has the expansion The coefficient sequences {α n } are derived in the proof, and are fairly complicated expressions in terms of the cumulants. Next, we apply Theorem 4 to the case where b → 0, noting that each subsequent term in the expansion is of higher order. As discussed in Sun, Phillips, and Jin (2008) in the case of a regular taper and a short memory covariance kernel K, the cumulants satisfy ; assuming this, we have the following corollary.
Corollary 1 Suppose that a Gaussian quadratic form Z, Z T with covariance kernel K has cumulants given by (14), and also suppose that We note that the cumulants need not have the behavior κ j = O(b j−1 ) when long memory or negative memory is present, as demonstrated in McElroy and Politis (2012) for the θ = 0 case. In that paper it was shown that the small b behavior of S 0 (b) has a distribution that either explodes to infinity (the case of long memory) or shrinks to zero (the case of negative memory).

Remark 2
As an example, consider the case that κ j = 0 for j > 2, which corresponds to treating all higher order terms in b as zero. Then the LT of the pdf is just which corresponds to a (positive) random variable with mean κ 1 and variance κ 2 , and all higher order cumulants exactly zero. If the random variable were not enforced to be positive, it would correspond to the Gaussian distribution by its cumulant characterization. However, the actual limit is positive and non-Gaussian. Pretending -for the sake of making a comparison with the vanishing bandwidth fraction scenario -that this distribution is really Gaussian would yield the The classic small-b results (Anderson, 1971) state that for λ ∈ (0, π) and taper Λ (satisfying Λ(0) = 1) of bandwidth M , such that M/N + 1/M → 0.
Taking M = bN in this result indicates that our results provide a higher order extension of the classical results, so long as κ 1 = Λ(0) and κ 2 /b ∼ Λ 2 (x) dx; this is shown below.
This completes the higher order analysis. Now we discuss the cumulants κ j further, focusing on the case of short memory. Let us here assume that θ = 0, so that the limit random variable of Theorem 2 is a Gaussian quadratic form. We know that this limit variable is the limit of a statistic is the sample written as a row vector, and A is an N × N dimensional matrix. We proceed to derive this matrix A; the same statistic is equal to Here ι is the column vector of N ones, 1 N is the N × N identity matrix, and I accomplishes mean-centering of the data, i.e., IY is the column vector of sample-mean centered data; if we are not mean-centering, then I can be omitted. Observe that I is idempotent.
We propose to examine the distribution of N −1 Y AY for a Gaussian process {Y t } as an approximation to the general limit Z, Z T . Since our purpose is to gain insight into the cumulants, we will take {Y t } to be white noise (this is appropriate for the short memory case -for cyclical long memory, we must take fractional Gaussian noise). In any event, N −1 Y AY is a Gaussian quadratic form, so that the same description in terms of LT and cumulants applies, and we know its LT converges to that of Z, Z T . The cumulants of N −1 Y AY , applying (14), involve the trace of A j divided by N j . We here develop expressions for these cumulants, take the limit as N → ∞, and obtain formulas for the cumulants of Z, Z T .
The trace of A j depends upon two important quantities, which are tr[Σ j ] and N −1 ι Σ j ι, each of which quantities grows at order N j . We introduce the abbreviation Λ b = Λ(·/b). By the definition of the Riemann integral and a change of variable, and letting x ∈ R j−1 , we obtain when j ≥ 2, where the cumulator function C takes the value 1 − [|x 1 | + · · · + |x j−1 |] wherever the domain produces a non-negative value. This means that Let the limit in (17) be denoted by µ j (b), and note that the bounds on the integrals could also be written as (−b, b) instead of (−1, 1), due to the support of the tapers. Then we can rewrite the limit quantity (for j ≥ 2) as and we denote this limit by η j (b). Again by change of variable, it can be rewritten as Then the first four such trace quantities N −j tr[A j ], for j = 1, 2, 3, 4, have limits given by In the case that no centering is utilized, then I is replaced by the identity matrix, and all the η j terms are replaced by zeroes in the above formulas. Then trivially But when centering is utilized, the general limit for j ≥ 5 is somewhat complicated: one examines the number of partitions of the set {1, 2, · · · , j}. A m-fold partition consists of m disjoint sets whose union is the full set {1, 2, · · · , j}; combinatorial formulas exist to count the number of m-fold partitions of j, as they are called (Stanley, 1997 If we have an m-fold partition, the resulting m sets have various cardinalities k 1 , k 2 , · · · , k m , and of course k 1 + k 2 + · · · + k m = j. Let λ m,j (k 1 , k 2 , · · · , k m ) denote the number of such partitions, so that we have λ 1,j (k 1 ) = j. A partition of j into sets of such cardinalities will be denoted In this manner we can compute asymptotic cumulants, where the leading terms correspond to the case where no centering is used -terms that are zero unless θ = 0 and centering is used are prefaced with a * : , the small b behavior of the cumulants makes mean centering irrelevant, in the sense that as b → 0 the cumulants are the same whether or not mean centering is utilized. The Bartlett case of these formulas is explored in HV, as well as Neave (1970). To summarize, if θ = π then the above formulas apply to the cumulants of S π (b) regardless of whether centering is utilized or not (all * 'd terms are zero); if θ = 0 then the above formulas apply to the cumulants of S 0 (b), with * 'd terms set to zero unless centering is used; if θ = 0, π, then the above formulas apply to the cumulants of S θ (b) once they are doubled (and * 'd terms are zero), although the limit distribution is actually S θ (b)/2 in this case.

Numerical Studies of Size and Bandwidth Selection
This section now discusses some more practical aspects of spectral density estimation. We first discuss a method for calculating the limiting distribution, apart from direct simulation of the limit variable in Theorem 2. Then we provide quantiles for this distribution for three tapers, and investigate coverage in finite sample simulations as a function of bandwidth fraction b. Finally, we provide a discussion of optimal bandwidth selection.

Computing the Spectral Distribution
In the case that θ = 0, π, the distribution S θ (b) has a particularly elegant representation in terms of its Laplace Transform, by which its right-tailed cumulative distribution function can be computed exactly from a knowledge of the taper. From Theorem 2, we know that the limit distribution is the sum of two iid copies of Z, Z T , whose LT can be written as det [id + 2sKT ] −1/2 , cf., Tziritas (1987). Here id denotes the identity operator. Therefore the LT for the sum of two such iid variables -denoted by Z, Z T ⊕ Z, Z T as a shorthand -will be the square of each variable's LT, namely det [id + 2sKT ] −1 , or the product of (1 + 2sλ j (KT )) −1 for the eigenvalues λ j (KT) of the operator KT . As discussed in the previous section, the limit distribution in Theorem 2 can be estimated by studying a finite-sample Gaussian quadratic form with matrix A = IΣI; in particular, we can calculate the N eigenvalues of A using linear algebra (if N < 1000 this is not particularly burdensome). Then these should be estimates of the limiting eigenvalues in an aggregate sense; but the infinite product j≥1 (1 + 2sλ j (KT )) −1 can be expanded using partial fractions. We provide details below.
Since θ = 0, π, the spectral estimate has the form N −1 Y AY with I as defined in Section 4, and ). Let λ j (A) be the jth largest eigenvalue of A, with 1 ≤ j ≤ N , computed using linear algebra on a computer. The LT of N −1 Y AY , which converges for all s pointwise to the function j≥1 (1 + 2sλ j (KT )) −1 , can be expressed as N j=1 (1 + 2sλ j (A)) −1/2 . While in finite sample the eigenvalues of A are distinct, asymptotically they have a paired structure, such that each eigenvalue appears with multiplicity two, resulting in the squaring of the square root symbol. First we show that a knowledge of the limiting eigenvalues λ j (KT ) provide the cumulative distribution function, and then we propose estimating these eigenvalues by the λ j (A).
Here the coefficients α j can be obtained via linear algebra, described below, when the eigenvalues are eventually zero, or if we approximate the infinite product by a truncation to j ≤ J for suitably large J. Moreover, is another partial fraction decomposition of interest, and the structure actually implies that α 0 = 1 must hold. As discussed in Section 4, the LT of the right cdf for spectral limit is equal to s −1 times one minus the LT of pdf of the spectral limit distribution, and hence by (19). Now using the linearity of the LT, we obtain by inversion This gives an exact formula for the right-tailed cdf of the limit distribution S θ (b) in terms of the eigenvalues of the operator KT . Unfortunately, this technique does not work for θ = 0, π. We propose to estimate the limiting LT via which essentially assumes that consecutive eigenvalues of A are so close as to be virtually identical.
Then in the partial fraction decomposition, we substitute the known eigenvalues λ 2j−1 (A), and compute the corresponding α j .
Here we discuss how to calculate the partial fraction decomposition a bit more generally. Sup- and τ (j) (s) for 0 ≤ j ≤ J is computed using polynomial multiplication (easily encoded on the computer). Let the coefficients of each polynomial τ (j) (s) be denoted τ (j) k for 0 ≤ k ≤ J, and note that τ (j) 0 = 1 for each j by construction. Whereas τ (0) is degree J, the other polynomials have degree J − 1, though they are multiplied by s in the expansion (21). Then taking the expansion and gathering powers of s produces, after simplification, which provided the matrix is invertible, can be solved for the α j coefficients. Although this technique provides the right-tailed cdf of S θ (b), we still need to compute quantiles, and it is unclear how do this using (20). As in McElroy and Politis (2012), we have simulated the distribution of S θ (b) for some tapers, when θ = 0, π (the case of θ = 0, π produces a distribution for S θ (b) identical to the HAC case, and its quantiles can be found in published literature such as Kiefer and Vogelsang (2005)), and reported a summary in the following table. We focus on three tapers: the Bartlett and two trapezoidal tapers.
First consider the limit distribution S θ (b) of Theorem 2 in the case that θ = 0, π. In this case, recall that mean centering is irrelevant, so that the limit is a quadratic functional of FBM rather than FBB; moreover, there is a doubling effect, where S θ (b) is really the sum of two iid random variables. In the case that θ = 0 or θ = π, the limit S θ (b) is given by just one of these random variables. Furthermore, when θ = 0 and we construct our spectral estimates by mean-centering, where q is the quantile. In the case that some of the lower quantiles take on negative values (there is no guarantee that spectral density estimates and limit distributions be strictly positive unless a positive definite taper is utilized), the regression function is just a quintic. All regression coefficients are reported in the Tables 1 through 18, along with the R 2 for the regression, with an asterisk marking those cases where regression is onto a quintic rather than an exponential quintic.
In some cases the coefficients exhibit a non-monotonic pattern in increasing α, which is attributable to the regression error. For purposes of inference, the simulated quantiles arising from the tables are adequate. Note that Tables 1 through 9 give the quantiles for the case that frequencies are between 0 and π, for β = −.8, −.6, −.4, −.2, 0, .2, .4, .6, .8, while Tables 10 through 18 provide the same for the case of frequencies equal to 0 or π (assuming no mean-centering is used in the frequency 0 case).
Because the case of a frequency between 0 and π involves the sum of two iid variables (Theorem 2), versus just one such variable in the frequency 0 or π case, the quantiles are a bit larger and have more positive mass. When using non-positive definite tapers, such as the Trapezoidal tapers, the limit distribution has some mass on the negative half-line, and there is more such mass in the frequency 0 and π cases. No such negative mass occurs with the Bartlett taper, because it is positive definite. Another feature that can be gleaned is the small b behavior of the quantiles as a function of β, namely that the quantiles shrink towards zero as β increases, when b is small (examine the first coefficient c 0 in the tables). However, for negative memory (β < 0) the quantiles

Simulation Study of Finite-Sample Coverage
The large bandwidth asymptotic theory provides a superior approximation to the finite-sample distribution of spectral estimators, as discussed in HV and Sun, Phillips, and Jin (2008). Hence, this should provide superior coverage for confidence intervals and confidence bands; the work of HV illustrates this superior coverage, as compared to the classical normal approximation (utilizing small b methods). We seek here to extend those numerical results to an investigation of long memory, and also to spectral bands. Therefore, we first consider a seasonal long memory process CM(β,π/6), adopting the pattern of study discussed in HV. Secondly, we consider an AR(2) process that generates a spectral peak, and compute the spectral distribution estimators, generating the corresponding confidence band. We are interested in determining the proportion of simulations for which the estimated spectral bands contain the true spectral distribution.
The long memory study begins by simulating 5,000 Gaussian time series of length N = 50, 100, 200 from a process with spectral density which satisfies (3). Here we take θ = π/6, which is a frequency of interest in monthly economic time series exhibiting seasonality (see Holan Table 19. For each of 5,000 simulations, we compute the spectral estimate f (θ) at the frequency θ = π/6 of interest, construct the interval using (10) to replace negative values with half of the Bartlett estimate.) Essentially, our spectral estimate is computed using the maximum with zero, and the limit distribution should be modified accordingly.
In cases where a lower quantile, obtained from Tables 1 through 18, we replaced the lower boundary of the interval by zero (a more rigorous approach is to simulate the distribution max{S b (θ), 0}); even using such an approximate technique, we obtained quite favorable results for the Trapezoidal tapers, across all values of β.
Now these coverage results are idealized, because we presume to know the true β when utilizing limit quantiles. In practice, an estimate of β would be obtained, and then appropriate quantiles could be simulated. If we instead always utilize β = 0 quantiles, even when mis-specified, the coverage deteriorates significantly (we have not systematically investigated this) because the quantile functions are quite sensitive to β. From the standpoint of coverage, an argument for using larger values of b when β < 0 can be made, although this in tension with statistical power; the next subsection shows that when negative memory is present, small values of b decrease the width of the interval, and hence increase the statistical power of detecting departures from a null hypothesis.
This discussion is continued further below.
For the second simulation study, we wish to investigate the coverage for the spectral distribution band method described in Section 3. We consider a cyclical process {Y t } given by the AR (2) equation

Optimal Bandwidth Selection
Although the LT can be used to compute some moments of S θ , allowing one to study the mean and variance as a function of b, it is difficult to deduce the overall impact of b on the width of confidence intervals for f (θ). Given a choice of taper and coverage α, it is natural to seek a bandwidth that yields the minimal possible interval width -such a bandwidth might be considered to be optimal.
Our asymptotic expansion results in Section 4 indicate that as b → 0, the distribution of S θ (b) tends to a point mass at unity in the case of short memory, so that optimality always corresponds to b = 0. In McElroy and Politis (2012), it was proposed to examine optimal bandwidth b as a function of underlying memory parameter β θ , seeking b such that the quantile of S θ (b) was as small as possible. Taking the same approach here, we numerically determine the optimal b for the Bartlett and Trapezoidal tapers, now focusing on the frequencies θ = 0, π. By keeping the quantiles as small as possible, we make the confidence interval as small as possible while maintaining its asymptotic coverage.
When negative memory is present, both upper and lower tails of the asymptotic distribution increase as b → 0, with the overall effect that the confidence interval becomes more narrow; therefore a small bandwidth fraction of b = .02 is always preferred. When long memory is present, this behavior can be reversed, such that narrower intervals occur for mid to large values of b. This is summarized in the Tables 25 and 26, which present optimal choice of bandwidth fraction as a function of memory parameter β, α size, and taper. The first table considers the case of frequencies in (0, π), while the second considers frequencies 0 or π. The key difference between these cases, is that the former (Table 25) contains larger optimal b values, while the latter (Table 26)

Empirical Applications
Spectral analysis has a diverse range of applications. Here we suggest only a few of a myriad of applications.

Identifying Business Cycle in Retail Series
First, suppose one is analyzing a monthly or quarterly economic time series, such as total retail sales, and is interested in identifying periodicities by estimating spectral peaks. However, such time series are typically nonstationary, exhibiting strong trend growth and seasonal behavior. The recoloring approach of Grether and Nerlove (1970) is a well-respected technique for estimating spectra in such a case: one differences the time series to remove nonstationarity, estimates the spectrum of the result, and then divides again by the magnitude squared of the frequency response function of the differencing operator. Such a spectral estimate is called a pseudo-spectral density estimator; we are interested in both the pseudo-spectrum and the spectrum for the seasonally differenced series.
For example, suppose that we have monthly seasonal data, which exhibits strong trend and seasonal effects, and are interested in estimating the spectral density in order to examine the potential for a business cycle (identified as a spectral peak between frequencies 2π/24 and 2π/60 for monthly data). If the data requires one seasonal difference to produce a stationary series, then re-coloring dictates that our spectral density estimate computed from the differenced series be divided by |1 − e −i12λ | 2 , which of course is not well-defined at frequencies that are multiples of π/6. The result is the pseudo-spectral density estimator.
We apply the methods of this paper to the monthly series of total retail sales for the major industry classifications 441 (Motor Vehicles and Parts Dealers), available from the U.S. Census Bureau 1 . We consider a variety of tapers and bandwidth fractions, using the re-coloring approach with regression-adjusted data, covering the years 1992 through 2012. Our objective is to create a graph of the spectral density estimate with sufficient resolution to examine business cycle effects, and also provide measures of uncertainty at each frequency. Because the business cycle has a period of two to ten years in general, the minimum number of frequencies needed is 60 (a ten year cycle for monthly data corresponds to frequency π/60). Thus we will take ω j = πj/60 for 0 ≤ j ≤ 60; note that ω j for 1 ≤ j ≤ 5 are the business cycle frequencies.
Also ω 0 = 0 corresponds to the trend frequency, which will be an infinity due to re-coloring. Similarly, ω 10k for k = 1, 2, 3, 4, 5, 6 corresponds to seasonal frequencies, which will also be poles in the pseudo-spectrum.
If we focus attention on the nonseasonal frequencies, we can apply the methods of Section 3 to construct confidence intervals. First, we observe that the assumption of asymptotic independence of spectral estimates seems reasonable here, because we are not considering Fourier frequencies (the sample sizes are 252); moreover, if the sample sizes were increased, we would still consider the same 61 frequencies, because they are ultimately determined by the sampling frequency (12 times a year) and the business cycle periodicities. Therefore it makes sense to view these 61 frequencies as being fixed as sample size increases, and thus Theorem 2 is applicable, producing independent asymptotic distributions. In some other types of applications, the frequencies of interest might depend upon sample size, and a different type of analysis would be required.
Once the spectral estimate for the differenced series has been determined, we divide by the magnitude squared of the differencing operator, in order to provide an estimate of the pseudospectrum. For better visibility, we plot in a log scale, restricting to the Bartlett taper for this exercise only. Then the confidence interval for log[f (λ) |1 − e −i12λ | −2 ] at the 54 non-trend and non-seasonal frequencies is which follows from (10). Such series typically have quickly decaying autocovariances, so we use the β = 0 quantiles to form the intervals.
Construction of the confidence intervals focuses on α = .05 (the case that α = .10 was also considered, but is not visually much different), using the Bartlett taper with b = .04, .1, .2, .5 for bandwidth fractions, with sample size of n = 252. Recall that the quantiles, which come from our simulations of the previous section, assume that centering by the sample mean has not been used (this is only pertinent at the case of frequency zero), and the slightly wider coverage at frequencies 0 and π result from using Table 14 instead of Table 5; however, this has no relevancy due to the re-coloring. The results are plotted in Figure 1. The structure of spectral peaks is salient, due to re-coloring, but in-between the peaks the impact of bandwidth becomes evident in the smoothness 1 Monthly Retail Trade and Food Services survey.
of the function. It is difficult to discern any business cycle phenomena in these plots, which would be flagged as a bump in between cycles 0 and 1 in the spectral density functions.
Another application is concerned with the detection of residual seasonality in seasonally adjusted time series. Supposing that the above construction series has been processed by the program X-12-ARIMA, we then compute their (trend-differenced) adjustments' spectral distribution functions, calculating the spectral bands to quantify uncertainty. Classic references on seasonal adjustment, and the X-12-ARIMA program, include Bell and Hillmer (1984)   and 1 (i.e., for frequencies up to π/6) indicates near constant spectral mass, and behavior similar to white noise; there is no sharp increase in the vicinity of any of the key seasonal cycles. One overall conclusion, from each of the plots, is that no significant seasonality remains. The impact of bandwidth fraction is much less apparent than in the spectral density estimates, which we expect from our asymptotic theory. One interesting feature can be discerned when comparing tapers; the trapezoidal tapers produce, in some cases, spectral distribution estimates that decrease at some frequencies, violating the fact that spectral distribution functions are monotonically increasing.
This occurs because the trapezoidal tapers are not positive definite; in contrast the Bartlett taper, being positive definite, does not have this problem -though we can expect the width of the spectral bands about the estimator to be too small, especially for small b, as discussed in Section 5.

Long Memory Spectral Analysis of Housing Starts
Here we consider regional housing starts, for the South region, measured at a monthly frequency from 1964 through 2012, available from the U.S. Census Bureau. We analyze the data here with a nonparametric approach, attempting to plot the spectral estimates for a variety of bandwidths, taking any seasonal long memory into account when quantifying uncertainty. We consider the same grid of frequencies as in the retail series, but are principally interested in the seasonal frequencies.
The South starts has been cleaned of outliers and level shifts, and we utilize a log transformation to stabilize variability. Analysis of sample autocorrelation plots for the first differences (to eliminate trend growth) reveals the presence of highly persistent correlation at seasonal lags (multiples of twelve), which indicates either nonstationarity or seasonal long memory. A common approach with such series is to utilize seasonal differencing -under the assumption of seasonal unit roots being present -but here we proceed with a hypothesis of stationarity, instead proceeding to estimate the seasonal long memory. This seems to be a plausible investigation, given the long sample size.
In order to obtain the right quantiles in each case, it is necessary to know the cyclical memory.
It is reasonable to suppose, based on the form of nonstationarities in such series and the discussion above, that cyclical memory may be present at frequencies ω 10k for 0 ≤ k ≤ 6, and at no others.
To estimate the cyclical memory β θ for these seven frequencies, one can adopt the crude estimation The impact of bandwidth fraction is quite evident in the plots; smaller values of b enforce more smoothing. As was noted in Section 5, when long memory is present the confidence interval can lie completely above the point estimate, and this is evident in the figures with b = .04. Apart from the two long memory seasonal peaks, the other frequencies don't have this property, as they have short memory dynamics. We also highlight that at frequencies 0 and π the confidence intervals are slightly wider to reflect the heightened uncertainty. Figure 6 gives the Bartlett estimators in log scale, which allows easier viewing of some of the features. This transform is not possible for the trapezoidal tapers, because the spectral density estimates take on negative values. Visually speaking, the impact of the trapezoidal taper, in contrast to the Bartlett, is to shift the estimate downwards -this improves bias and coverage, but at the cost of losing positivity. Otherwise, there is little to discriminate between the tapers, given the same choice of bandwidth.

Conclusion
This paper provides a new study of taper-based spectral estimation from the perspective of fixed bandwidth ratio asymptotics. Classical spectral estimation theory assumes that the bandwidth is negligible with respect to sample size, asymptotically, while the so-called "fixed-b asymptotics" allows for a constant ratio of bandwidth to sample size. Previous work on fixed-b asymptotics for spectral density estimation (HZ) has focused on short memory dynamics and a single frequency, but we make extensions in several directions: (i) we study joint convergence over a finite collection of fixed frequencies; (ii) we allow for cyclical long memory at any of these frequencies; (iii) we provide results for flat-top tapers and tapers with kinks, extending the cases studied by HZ (Bartlett and smooth tapers); (iv) we provide a discussion of higher-order accuracy in the short memory case, by an expansion of the cumulative distribution function of the spectral density estimate's limit; (v) we study spectral distribution estimation in the context of fixed-b asymptotics, and develop the application of simultaneous confidence bands; (vi) we tabulate the spectral density estimate's limit quantiles, as a function of taper, memory parameter, and bandwidth fraction; (vii) we empirically examine coverage of the spectral density and spectral distribution estimates.
Regarding the joint convergence result, this produces the unsurprising conclusion that density estimates are asymptotically independent; however, this requires the assumption that frequencies are treated as fixed, in the sense that they do not depend upon sample size. This precludes an application with Fourier frequencies, which would require a separate analysis (and is the subject of current work). In our applications to the topic of seasonal peak detection, we illustrate why Fourier frequencies may not be the most suitable grid of frequencies for a given application. We also emphasize that the limit distribution under fixed-b asymptotics depends chiefly on the bandwidth fraction b, the underlying memory at the particular frequency, whether or not the data was centered by some estimated mean function (such as the sample mean), and finally whether the frequency λ is internal (i.e., λ ∈ (0, π)) or on the boundary, where λ = 0, π. In fact, the issue of mean centering is only pertinent for the limit distribution when λ = 0.
Regarding the second point, we have developed new results for sample means and DFT statistics for processes with long memory poles or zeroes in their spectrum, and our formulation of cyclical memory can be connected with more familiar processes, such as Gegenbauer processes and seasonal ARFIMA, etc. This is a growing topic in economic time series, to investigate models where each frequency can have its own memory parameter associated; the limit distribution, as well as the rate of convergence of the spectral estimator, depend upon this memory parameter. This treatment represents a novel generalization of frequency zero results discussed in the application of HAC estimation, as in McElroy and Politis (2011).
The third point has regard to the types of tapers that one is utilizing. Some popular tapers have kinks (i.e., places of non-differentiability) or even jump discontinuities -the latter arises with the truncation taper. The flat-top tapers, including the trapezoid functions, are known to have improved bias properties in the short memory case, but a point of concern is that they are not positive definite. The trade-off is that the resulting spectral density estimates need not be positive, precluding their estimation and viewing in log scale; ad hoc solutions, such as truncation at zero, may of course be utilized. Our numerical results demonstrate that the improved size properties of the flat-top tapers carries over to the long memory scenario as well in the case of spectral estimation, the improvement over the Bartlett being more dramatic for small b. While results of this type for long memory HAC estimation have also been shown, as in McElroy and Politis (2011), the case of spectral density estimation is novel.
Higher-order accuracy for studentized statistics, such as in sample means normalized by HAC estimates of variability, can generate an expansion about b = 0 in the limit distribution, using the intuition that the small b case corresponds to a standard normal distribution. However, in the case of spectral density estimation, the small b case essentially corresponds to point mass at unity, because the limit theorem involves the ratio of estimator to estimand. We therefore have developed some novel tools for the investigation of higher-order accuracy, proceeding via studying the Laplace Transform of the cumulative distribution function of the limit random variable S(b).
Focusing on the short memory case, we demonstrate that the first term in the expansion is an indicator function, which is the cumulative distribution function of the point mass, with location that differs from unity by a term of order b. Higher order terms can be understood through a convenient basis of functions, with coefficients that explicitly depend on the cumulants of S(b).
Then one can explicitly see that taking b > 0 essentially provides a more nuanced description than is possible with a classical description.
Spectral distribution estimation also has an extensive history, and tapering is not necessary to produce consistent estimation. However, if a practitioner utilizes a taper-based estimate of the spectral density, and then also wishes to examine the spectral distribution, the latter should be estimated in such a way that its derivative equals the density estimate. With this motivation, we analyze taper-based estimates of the spectral distribution function, and obtain, unsurprisingly, the same theorems as the classical case explored by Dahlhaus (1985). We then develop a technique for constructing spectral bands, and discuss how the limiting covariance kernel -associated with the functional limit theorem -can be estimated. We are not aware of literature treating the formation of bands, apart from the simple case of white noise; we discuss the empirical coverage, and the impact of taper and bandwidth in finite sample performance.
In order to compute the distribution of the limit S(b), we propose an exact method involving the Laplace Transform of a Gaussian quadratic form, so long as the eigenvalues corresponding to a taper can be calculated. We also provide the quantiles for several tapers by simulations, and illustrate spectral density estimation with intervals constructed via cyclical long memory estimation, as well as a re-coloring approach to spectral estimation for nonstationary time series. Finally, we show the construction of spectral distribution estimates and their confidence bands on a retail series.
Although this paper attempts to study several questions, many more are raised in the process.
What is the statistical behavior, from a fixed-b perspective, when frequencies are becoming asymptotically closer to one another? Can a higher-order expansion be developed when there is long memory or negative memory present? What is a sensible criterion for optimal bandwidth selection, that takes into account the smoothness across multiple frequencies? (Thus, optimality should be discussed in different terms from the HAC literature, which only has a single frequency to consider.) Some of these queries we plan to study in future research.

A.1 Regressions and Bridges
For more background on this topic, see Phillips (1998). Suppose that our process {Y t } satisfies Y t = X t + µ t with {X t } mean zero and stationary, but µ t is deterministic, and is fully described via a collection of p regressors expressed in a column vector x t , whose components are written as x j t for 1 ≤ j ≤ p. Supposing a sample of size N is available, it is convenient to write in terms of column vectors: Y = [Y 1 , Y 2 , · · · , Y N ] , and similarly for X and µ, so that Y = X + µ = X + Xβ, where the regression design matrix X is N × p, the column vector β contains p regression parameters, and the tjth entry of X is x j t . Then the ordinary least squares estimate of µ is In order to find a convenient asymptotic representation of Y − µ, and the partial sums thereof, we assume that there exist a collection of rates a j N for 1 ≤ j ≤ p such that x j k = a j N x j (k/N ), where the functions x j ∈ C[0, 1]. For example, the regressor x j k = k j , which is used to express the mean as a polynomial in time, satisfies this condition with the choice a j N = N j . However, the regressor x j k = cos(2πjk) does not satisfy this condition, so care is needed in applying these results.
Collecting the rates into a diagonal matrix A N = diag[a 1 N , a 2 N , · · · , a p N ], we write x k = A N x(k/N ). Then which provides a simplification in the formula for µ. Now suppose we are interested in the limit behavior of t=1 (Y t − µ t )g t , which looks like S [rN ] (g) in Section 3.1, except that we have centered by an estimate of the mean. Here r ∈ (0, 1]. Then linear algebra yields Now from Section 3.1, the functions of interest are g t given by cosines or sines at various frequencies. But by Remark 1, mean centerings are irrelevant except when g t = cos(θt) and θ = 0, i.e., g t ≡ 1.
In this case, and utilizing the convergence of Riemann sums to the Riemann integral, we obtain where V N is the variance of N t=1 X t . Therefore, given a FCLT such as Theorem 1 for the partial sums, such that V The process on the right hand, denoted by B, is called a Brownian Bridge when B is a Brownian  The results of this paper can be extended to the more general class of Bridge processes under the assumption that the mean functions are adequately described by fixed regressors and that the scaling assumption is valid, and furthermore that we use ordinary least squares to provide a mean estimate. This only has ramifications at frequency zero -all other DFT and spectral results involve the FBM and not the generalized FBB. In practice, spectral analysis on a time series proceeds only after certain transformations (Box-Cox transforms and/or differencing) have been applied to the data to remove non-stationarity. Residual mean effects are likely to involve a constant mean function, or at worst a linear function of time, plus other types of fixed effects corresponding to interventions (e.g., additive outliers, level shifts, calendrical effects, and so forth). These latter types of regressors are dummies of various types whose asymptotic impact are hopefully negligible.

A.2 Proofs
Proof of Proposition 1. We provide the proof for the θ > 0 first. Observe that |h|≤n γ h cos(ωh) = |h|≤n γ h e −iωh by symmetry, and hence n h=1 If ω = ±θ, this quantity is asymptotic to If ω = θ the partial sum is asymptotic to The limiting behavior of |h|≤n γ h e −iωh is obtained by summing with the complex conjugate of the above derivations, and adding γ 0 . Thus as n → ∞. By 3.761.4 of Gradshteyn and Rhyzik (1994), ∞ 0 sin(x) x α−1 dx = π sec(πα/2)/(2Γ(1 − α)), which happens to equal π/2 when α = 0; hence the short memory spectral density is the limit, as expected. But for nonzero α, the sum at ω = ±θ is asymptotic to n −α L(n)g(θ)|2θ| α L(|2θ| −1 ) sec(πα/2)/Γ(1− α), which agrees with (2), where β θ = −α and L θ is defined as the slowly-varying L times the con- Finally, in the case of θ = 0 similar calculations yield which shows that the process is C(−α,0), as desired. Noting the following generic trigonometric identities cos(ωk) cos(λk) = 1 2 (cos(ω + λ)k + cos(ω − λ)k) the above expansion can be rewritten as Next, apply Remark 1 so that -because θ i = θ j -the summations over m above are bounded in n; replacing these summations with one is then valid asymptotically in the statement of the lemma, since V which tends to zero because β θ i − β θ j < 2. Now for the case that cs = sin, we have Applying the identities produces n k, =1 Since the angles are distinct, Remark 1 shows that the inner summations can be asymptotically ignored, and the rest of the argument follows the cosine case. Finally, suppose that the first cs = cos and the second cs = sin. Then Applying the identities produces and the same arguments handle this case as well. Also, even if θ i = θ j , the normalized sum will still tend to zero, because the non-negligible inner sum in this case accompanies a sine summation, which by symmetry will be zero. This completes the proof. First, define S n to be a length 2J vector with components for j = 1, 2, · · · , J. Because of mean-centering and the equivalency to the linearly interpolated version, it suffices to study the finite-dimensional distributions of S n . Consider m times r 1 < r 2 < · < r m ∈ [0, 1], and set r 0 = 0. Take any real numbers ν 1 , ν 2 , · · · , ν m , and any collection of real numbers η 1 , η 2 , · · · , η 2J , written as a 2J component column vector η. The convergence of the finite- (η j cos(θ j t) + η j+J sin(θ j t)) V we have η S [r k N ] = S [r k N ] (η(θ)), in the notation of Section 3.1. Then m k=1 ν k S [r k N ] (η(θ)) is asymptotically standard normal -when normalized by the square root of its variance -under the P1 assumption, using the argument of Theorem 5.2.3 in Taniguchi and Kakizawa (2000). In this case, it then suffices to show that the variance of m k=1 ν k S [r k N ] (η(θ)) has as limit the variance of m k=1 ν k J j=1 (η j W +,θ j (r k ) + η j+J W −,θ j (r k )). Similarly, the cases of P2 or P3 can be handled as in the proof of Theorem 3 of McElroy and Politis (2011); when P2 holds, we need the unit Hermite rank assumption to ensure that the limit variables are Gaussian.
So we now study the variance of the partial sum, obtaining the expansion

The above identity generalizes (A.5) of McElroy and Politis (2011), and
Now by Lemma 1 the only non-negligible terms asymptotically (here n → ∞ as N → ∞, in any case being some fixed proportion of N ) occur when j 1 = j 2 (and note that mixed terms involving cosine and sine are always negligible). Thus the above variance simplifies asymptotically to J j=1 n t 1 ,t 2 =1 γ t 1 −t 2 η 2 j cos(θ j t 1 ) cos(θ j t 2 ) + η 2 j+J sin(θ j t 1 ) sin(θ j t 2 ) V −1 N (θ j ).
Utilizing (A.1) from the proof of Lemma 1, but applied to the case where the two angles are not distinct, the above quantity is shown to be asymptotic to because the double sine term is identically zero if θ j is 0 or π (but the cosine term gets doubled in this case). This calculation uses from (5). As a result, the variance of m k=1 ν k S [r k N ] (η(θ)) is asymptotic to Now the variance of m k=1 ν k B(r k ), where B is a FBM of parameter β θ j , is equal to the expression in parentheses in (A.2). Because the processes W +,θ j and W −,θ j are independent, (A.2) is equal to the variance of m k=1 ν k J j=1 (η j W +,θ j (r k ) + η j+J W −,θ j (r k )). This completes the proof that the finite dimensional distributions converge.
To prove tightness, let γ = (κ + δ)/2; we will apply Theorem 2 of Gihman and Skorohod (1980, p. 410) with the metric ρ(x, y) = , we obtain using summation by parts Next, define approximate first and second derivatives of the taper via for any r ∈ [0, 1]. This allows the following integral representation: Now Theorem 1 provides convergence results for the DFTs, once suitably normalized by V N (θ j ) for each θ j ; these results can be extended at once to S [rN ] (c(θ)) and S [rN ] (s(θ)), with limiting Fractional Brownian Bridges B ±,θ (r), as defined in Section 3.2. We also need to determine the limit of the approximate derivatives of the taper. For values of r such that Λ b is twice continuously differentiable, i.e., for |r| ∈ (c, 1), we have ∂ 2 These results also holds for |r| ∈ [0, c), but here the limit of either derivatives is identically zero, because of the flat-top structure. In considering the limit of the quadratic term, we restrict to the region bc < |r − s| < b in the double integral, but also must account for the boundary terms where |r − s| = bc and |r − s| = b, which result in terms asymptotic to respectively, for the cosine terms, and similarly for sines. Now dividing f (θ j ) by V N (θ j ), we obtain a joint convergence for 1 ≤ j ≤ J, and apply the functional limit theorem to each partial sum in turn, and obtain the stated limit distribution. In the case that a jump discontinuity exists at c, we instead obtain that the terms in the expansion of N f (θ) involving a double summation cancel out -for indices to either side of c -while This provides the stated limit in the case of a jump discontinuity. 2 Proof of Theorem 3. First note that the functions G, G ∈ C[0, π], which follows from the assumptions on f and the Riemann integral. We first establish convergence of finite-dimensional distributions. For any θ ∈ [0, π], we have the decomposition When θ = 0, both G(θ) and G(θ) are zero, so the result is trivial; hence assume θ > 0. Using a Taylor series expansion of Λ about zero, the first term above is decomposed into The spectral density h (Λ(h/bN ) − Λ(0)) sin(θh)/(2πh)e iλh exists for all λ, since the real part of sin(θh)e iλh equals one half of sin((λ + θ)h) − sin((λ − θ)h), and the taper is bounded (note that sin((λ±θ)h)/h is an alternating sequence). Thus sup λ | |h|<N (Λ(h/bN ) − Λ(0)) sin(θh)/(2πh)e iλh | < ∞ for all N , and sup λ | |h|≥N sin(θh)/(2πh)e iλh | < ∞ for all N as well. Then by the Hölder inequality, and the fact that (2π) −1 π −π I(λ) dλ = γ 0 is bounded in probability, both summands in (A.3) are bounded in probability.
For the second summand, we apply the Dominated Convergence Theorem to take the limit as where the weak convergence follows from Lemma 3.1.1 of Taniguchi and Kakizawa (2000). Note that this lemma is proved under either condition P3 or P1, but separate results in Taniguchi and Kakizawa (2000) treat the P2 case in detail as well. The central limit theorem can also be stated jointly over any finite collection of θ frequencies.
Letting β k (x) = exp k (x), we proceed to the MacLaurin series expansion, which yields and hence Noting that k (0) = 0 for each k, we have β k (0) = 1 for each k, and hence the n = 0 coefficient in the expansion for G k (s) is zero. Removing this term, changing the index, and canceling s −1 yields Hence the coefficients stated in Theorem 4 are By (16), the representation of G k in terms of sums of Laplace Transforms of the φ n+1 immediately follows. Calculation of the β (n) k (0) coefficients proceeds as follows. Let ∞ (x) = ∞ j=1 (−1) j κ j j x j (1 − x) −j , and note that k is obtained from ∞ where all the cumulants κ j = 0 in the latter when j > k.
Then with β ∞ (x) = exp ∞ (x) and calculation of derivatives, we obtain

Additional calculations show that
(r) Higher order coefficients, in terms of cumulants, are calculated in a similar fashion. We obtain β (n) k (0) from β (n) ∞ (0) by setting all κ j to zero for j > k; this produces the following sequences, for n = 0, 1, 2, · · · : α (1) It is hard to deduce a general pattern for the coefficients in terms of cumulants, but any particular sequence can be calculated in this manner. 2 Proof of Corollary 1. It follows from the proof of Theorem 4 that the coefficients α (k) n+1 for fixed k involve no cumulants κ j with j > k + 1, and each coefficient is a product of κ k+1 times other cumulants. This is because any terms in β (n+1) ∞ (0) that feature only κ j for j ≤ k will be common to both β  n+1 (for any n ≥ 0). Also, no terms that only involve κ j with j > k + 1 exist, all these quantities being set to zero; thus, only terms that involve κ k+1 contribute to the α (k) n+1 sequences. Next, because |κ j | ≤ C j 2 j (j − 1)!b j−1 for constants C j -by results in Sun, Phillips, and Jin (2008) Table 25: Optimal bandwidth fraction, determined for each taper (Bartlett or Trapezoidal), long memory parameter β, and α-level, for frequencies between 0 and π. Optimality means that the confidence interval is the shortest possible among all bandwidth fractions b ∈ (0, 1]. The two-sided intervals are based on α-levels .20, .10, .05, .005 (with half of this α assigned to the upper and to the lower quantile in the confidence interval).
Optimal Bandwidth Fraction, frequency is 0 or π  Table 26: Optimal bandwidth fraction, determined for each taper (Bartlett or Trapezoidal), long memory parameter β, and α-level, for frequencies 0 and π. Optimality means that the confidence interval is the shortest possible among all bandwidth fractions b ∈ (0, 1]. The two-sided intervals are based on α-levels .20, .10, .05, .005 (with half of this α assigned to the upper and to the lower quantile in the confidence interval).