This paper studies distributed Q-learning for the Linear Quadratic Regulator
(LQR) problem in a multi-agent network. Existing results often assume that agents
can observe the global system state, which may be infeasible in large-scale
systems due to privacy concerns or communication constraints. In this work, we
consider a setting with unknown system models and no centralized coordinator.
We devise a state tracking (ST) based Q-learning algorithm to design optimal
controllers for the agents. Specifically, each agent maintains a local
estimate of the global state based on its local information and
communication with its neighbors. At each step, every agent updates its local
estimate of the global state and uses it to solve an approximate Q-factor
locally through policy iteration. Assuming decaying injected excitation noise
during policy evaluation, we prove that the local estimates converge to
the true global state and establish the convergence of the proposed
distributed ST-based Q-learning algorithm. Experimental studies corroborate
our theoretical results by showing that the proposed method achieves
performance comparable to the centralized case.
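
As a rough illustration of the loop described above, the following Python sketch shows one possible state-tracking update combined with a control input that carries decaying injected excitation noise. The ring communication graph, the consensus weight matrix W, the noise schedule, and all variable names are illustrative assumptions rather than the paper's exact construction; the Q-factor policy-evaluation and policy-improvement steps are only indicated by a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, local_dim = 4, 2
n = n_agents * local_dim                       # dimension of the global state

# Stable open-loop dynamics and per-agent scalar inputs (illustrative only; the
# agents never use A or B directly, which stand in for the unknown system model).
A = rng.standard_normal((n, n))
A = 0.95 * A / np.max(np.abs(np.linalg.eigvals(A)))
B = 0.1 * rng.standard_normal((n, n_agents))

K = np.zeros((n_agents, n))                    # current policy gain of each agent

# Doubly stochastic consensus weights for a ring communication graph (assumption).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

x = rng.standard_normal(n)                     # true global state
x_hat = np.zeros((n_agents, n))                # row i: agent i's estimate of the global state

for t in range(200):
    # State tracking: average neighbors' estimates, then overwrite the block
    # that agent i measures directly with its exact local observation.
    x_hat = W @ x_hat
    for i in range(n_agents):
        block = slice(i * local_dim, (i + 1) * local_dim)
        x_hat[i, block] = x[block]

    # Control from the local estimate plus decaying injected excitation noise.
    sigma_t = 1.0 / np.sqrt(t + 1)
    u = np.array([-K[i] @ x_hat[i] + sigma_t * rng.standard_normal()
                  for i in range(n_agents)])

    # A policy-evaluation step (e.g. least squares on local (x_hat_i, u_i) data
    # to fit the local Q-factor, followed by policy improvement) would go here.

    x = A @ x + B @ u                          # true global dynamics advance

print("per-agent estimation error:", np.linalg.norm(x_hat - x, axis=1))
```

With the excitation decaying and the closed-loop system settling, the local estimates in this toy loop approach the true global state, which is the qualitative behavior the convergence analysis addresses.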