## Optimal Variance Estimation for a Multivariate Markov Chain Central Limit Theorem

- Author(s): Liu, Ying
- Advisor(s): Flegal, James M
- et al.

## Abstract

Markov chain Monte Carlo (MCMC) methods are often used in Bayesian analysis to approximate expectations with respect to a given distribution. The Monte Carlo standard error (MCSE) can be used to determine the required number of dependent samples, as well as to construct confidence intervals for MCMC estimates. The MCSE is usually unknown, and various techniques have been suggested to estimate it. A fundamental problem for these techniques is choosing an appropriate bandwidth. Previous research shows that a bandwidth proportional to $n^{1/3}$, where $n$ is the number of samples, is optimal for certain estimators. The proportionality constant, however, is unknown. As a result, a bandwidth of $n^{1/3}$ is often suggested, although it is sub-optimal because the proportionality constant is missing. A bandwidth of $n^{1/2}$ has also been considered to account for the constant, but its asymptotic performance is worrisome.
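To make the role of the bandwidth concrete, the following is a minimal sketch of a batch means MCSE estimate computed with batch sizes $\lfloor n^{1/3} \rfloor$ and $\lfloor n^{1/2} \rfloor$. The choice of the batch means estimator, the AR(1) test chain, and the names `batch_means_mcse` and `ar1_chain` are assumptions made for illustration only; they are not taken from the dissertation.

```python
import numpy as np


def batch_means_mcse(chain, batch_size):
    """Batch means estimate of the MCSE of the sample mean (illustrative sketch)."""
    chain = np.asarray(chain, dtype=float)
    b = int(batch_size)
    if b < 1:
        raise ValueError("batch_size must be at least 1")
    a = chain.size // b  # number of full batches
    if a < 2:
        raise ValueError("need at least two full batches")
    # Drop the incomplete final batch, then average within each batch.
    batch_means = chain[: a * b].reshape(a, b).mean(axis=1)
    # b times the sample variance of the batch means estimates the
    # asymptotic variance in the Markov chain central limit theorem.
    sigma2_hat = b * batch_means.var(ddof=1)
    return float(np.sqrt(sigma2_hat / (a * b)))


def ar1_chain(n, rho, seed=0):
    """Hypothetical test chain: stationary AR(1) with standard normal innovations."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - rho**2)  # draw from the stationary distribution
    for t in range(1, n):
        x[t] = rho * x[t - 1] + eps[t]
    return x


n = 100_000
chain = ar1_chain(n, rho=0.5)
mcse_cube_root = batch_means_mcse(chain, np.floor(n ** (1 / 3)))
mcse_square_root = batch_means_mcse(chain, np.floor(n ** 0.5))
print(mcse_cube_root, mcse_square_root)
```

For this AR(1) chain the asymptotic variance is $1/(1-\rho)^2 = 4$, so both estimates should land near $\sqrt{4/n} \approx 0.0063$. The two batch sizes trade bias against variance differently, which is the trade-off that bandwidth selection tries to balance.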

Existing literature has mostly considered the above issues in the univariate setting, but Bayesian analysis normally involves multiple parameters. Computation time is a major challenge in estimating the multivariate MCSE, where a large number of dependent samples is involved. Therefore, multivariate estimators of the MCSE that deliver fast and accurate estimates are desirable.

This dissertation addresses the above two problems. I consider a family of estimators and establish conditions under which their mean squared consistency holds. The results apply directly to bandwidth selection and also suggest a bandwidth proportional to $n^{1/3}$. The proportionality constant can be obtained from the proof of mean squared consistency. I further suggest approximating the proportionality constant with a pilot estimate. The suggested bandwidth shows superior performance compared with $n^{1/3}$ or $n^{1/2}$. These results are established in the multivariate setting, which not only covers the long-standing univariate bandwidth selection problem but also raises the multivariate question and provides a solution.
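As a rough illustration of why the $n^{1/3}$ rate arises, consider a univariate estimator with bandwidth $b$ whose bias is of order $1/b$ and whose variance is of order $b/n$, as is typical for batch means and Bartlett-type spectral variance estimators. The constants $C_1$ and $C_2$ below are placeholders assumed for this sketch; they are not the dissertation's expressions.

$$
\mathrm{MSE}(b) \approx \frac{C_1^2}{b^2} + \frac{C_2\, b}{n},
\qquad
\frac{d}{db}\,\mathrm{MSE}(b) = -\frac{2 C_1^2}{b^3} + \frac{C_2}{n} = 0
\quad\Longrightarrow\quad
b_{\mathrm{opt}} = \left(\frac{2 C_1^2}{C_2}\right)^{1/3} n^{1/3}.
$$

Under these assumptions the proportionality constant depends on unknown features of the chain through $C_1$ and $C_2$, which is why a pilot estimate is needed before it can be used in practice.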

To tackle the computational problem in the multivariate setting, I propose a family of new estimators and prove their strong consistency. The new estimators are fast to compute and perform comparably to spectral variance estimators, with slightly inflated variance.