Despite significant progress in control-theoretic techniques over the past decade, these methods still struggle to bridge the widening gap between theory and practice, a gap exacerbated by increasing complexity, uncertainty, and safety requirements. Consequently, online control algorithms for safety-critical applications in non-stationary environments could open a new chapter in modern control theory, substantially enhancing the reliability of intelligent systems operating under dynamic, uncertain, and potentially adversarial conditions subject to physical and computational limitations. Safe non-stationary decision-making not only encompasses the core challenges of traditional decision-making but also presents new hurdles, such as (i) fast adaptation in non-stationary environments, (ii) convergence to global optimality in non-convex optimization, and (iii) continual balancing of objectives and constraints. These challenges go beyond current computational and theoretical capabilities and manifest in issues of both practical and theoretical interest, from sample complexity and non-convergence to computational tractability and the enforcement of safety constraints in real-time control. This thesis aims to pioneer system operation at the nexus of reinforcement learning, online learning, statistical learning, and nonlinear optimization. The design of provably efficient and safe online decision-making algorithms that exploit prediction and prior knowledge while grappling with the effects of dynamic feedback and non-stationary environments will push the frontiers of computational verification and synthesis of control policies for safety-critical systems.
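For concreteness, one simplified way to formalize this setting (the notation $f_t$, $g_t$, and $\mathcal{X}$ below is illustrative, not a fixed convention of this thesis) is as a sequence of constrained problems revealed online:

```latex
% At each round t, the learner commits to a decision x_t before the
% (possibly non-convex) loss f_t and safety constraint g_t are revealed.
\[
  \min_{x \in \mathcal{X}} \; f_t(x)
  \quad \text{subject to} \quad g_t(x) \le 0,
  \qquad t = 1, \dots, T.
\]
% Challenges (i)-(iii) then correspond to tracking the drifting solutions
% of this sequence, escaping its spurious local minima, and keeping the
% constraint violation small while the objective is being minimized.
```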
To overcome these challenges and realize the full potential of online decision-making for adaptability and performance gains, this thesis aims to extend the foundational knowledge in systems and control and to broaden our understanding of performance limits and engineering trade-offs when a system must operate outside the assumptions of known models and adapt to its environment in real time. In particular, we develop a new mathematical foundation and a set of computational tools for the design of safe online decision-making algorithms that can be deployed in changing environments. Along this line, we address the following objectives: (i) escaping spurious local-minimum trajectories in online time-varying non-convex optimization, (ii) provably efficient primal-dual reinforcement learning for constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, and (iii) non-stationary risk-sensitive reinforcement learning with near-optimal dynamic regret, adaptive detection, and separation design; a minimal sketch of the primal-dual mechanism underlying objective (ii) follows below.
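As a minimal illustration of that primal-dual mechanism, the following sketch runs gradient descent on the primal variable and projected gradient ascent on the Lagrange multiplier for a toy time-varying constrained problem. The quadratic loss, linear constraint, and step size `eta` are assumptions of this example, not the algorithms developed in the thesis.

```python
# Illustrative online primal-dual sketch (not the thesis algorithm): at each
# round t the learner faces a drifting quadratic loss f_t and a linear safety
# constraint g_t, and takes one gradient step on the Lagrangian
# L_t(x, lam) = f_t(x) + lam * g_t(x): descent in x, projected ascent in lam.
import numpy as np

T, eta, d = 200, 0.1, 2
x = np.zeros(d)          # primal iterate (decision variable)
lam = 0.0                # dual iterate (constraint multiplier)

for t in range(T):
    theta = np.array([np.sin(0.05 * t), np.cos(0.05 * t)])  # drifting target
    grad_f = x - theta                 # gradient of f_t(x) = 0.5 * ||x - theta||^2
    a, b = np.ones(d), 1.0             # constraint g_t(x) = a @ x - b <= 0
    grad_g = a

    # Primal descent on the Lagrangian, then dual ascent on the constraint value.
    x = x - eta * (grad_f + lam * grad_g)
    lam = max(0.0, lam + eta * (a @ x - b))  # project multiplier onto [0, inf)

print(f"final x = {x}, multiplier = {lam:.3f}, "
      f"constraint value = {np.ones(d) @ x - 1.0:.3f}")
```

Dynamic regret, the performance measure in objective (iii), would evaluate such iterates against the per-round constrained optima rather than a single fixed comparator, which is what makes the non-stationary setting strictly harder than its static counterpart.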