eScholarship
Open Access Publications from the University of California

UC Berkeley Electronic Theses and Dissertations

Information Asymmetries in Data-Driven and Sustainable Operations: Stochastic Models and Adaptive Algorithms for Strategic Agents

Abstract

The modern landscape of operations management (OM) has undergone a profound paradigm shift driven by two surging forces: 1) the integration of expansive streams of real-time data, and 2) the recognition of deep ambiguity in navigating operational disruptions caused by the climate crisis. In this transition to data-driven and sustainable operations, a fundamental challenge lies in addressing the limited transparency of strategic agents' (i.e., stakeholders') willingness to collaborate and their misaligned economic motives in socio-technical systems.

Motivated by this challenge, this dissertation establishes a foundational theory that leverages data-driven decision-making to proactively mitigate the intricate uncertainties, arising from imperfect model knowledge and information asymmetries, that hinder sustainable OM. The dissertation begins by exploring nonlinear and non-stationary control systems under imperfect knowledge of the reward function and the system dynamics, a nontrivial setting common in applications such as balancing occupant comfort and energy efficiency in buildings. Building on this rigorous control-theoretic learning analysis, the majority of the dissertation is devoted to devising novel, data-driven, and adaptive incentive frameworks that tackle unexplored information asymmetries in repeated principal-agent games.

Inspired by several real-world applications, such as forest conservation incentives in Payment for Ecosystem Services and renewable energy aggregator contracts for utility grids, this dissertation introduces the “hidden agent rewards” model within a multi-armed bandit framework, in which a principal learns to proactively steer an agent's choices by sequentially offering menus of incentives that add to the agent's hidden rewards over a finite set of arms. Designing policies in this setting is challenging because it entails analyzing the dynamic externalities imposed by two separate learning algorithms trained in parallel by strategic parties. To the best of our knowledge, this dissertation presents i) the first generic stochastic sequential model for this widely applicable setting of information imbalance, and ii) the first methodological framework that contends with the principal's trade-off between consistently learning the agent's rewards and maximizing its own rewards through adaptive incentives.

We examine two scenarios: one in which the agent has perfect knowledge of its own reward model, and another in which the agent learns that model over time, potentially making choices that mislead the principal. In both cases, statistical consistency and regret guarantees are proven to hold without restricting the agent's algorithm or reward distributions. Throughout the dissertation, these theoretical results, along with practical insights, outline a promising research landscape for enhancing a variety of incentive practices in OM that confront the hidden objectives of incentivized agents.
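
To make the interaction protocol above concrete, the following is a minimal, hypothetical Python sketch of the repeated principal-agent bandit loop for the scenario in which the agent knows its own reward model. The number of arms, the reward distributions, the epsilon-greedy incentive rule, and the assumption that the principal observes a noisy signal of the agent's realized reward are all illustrative simplifications, not the dissertation's actual model or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative setup (assumed values, not taken from the dissertation) ---
K = 4                                             # finite set of arms
agent_means = rng.uniform(0.0, 1.0, size=K)       # agent's mean rewards, hidden from the principal
principal_means = rng.uniform(0.0, 1.0, size=K)   # principal's own mean reward per arm
T = 5000                                          # horizon of the repeated game

counts = np.zeros(K)            # times the agent has chosen each arm
agent_estimates = np.zeros(K)   # principal's running estimates of the agent's rewards
principal_total = 0.0           # principal's cumulative net utility

for t in range(1, T + 1):
    # Principal offers a menu of incentives, one per arm. With probability
    # epsilon it explores: it steers the agent to a random arm with a large
    # incentive so it can keep learning the agent's hidden rewards. Otherwise
    # it targets the arm it currently believes maximizes its own net utility,
    # paying roughly the estimated reward gap the agent would give up.
    epsilon = min(1.0, 10.0 / t)
    steering_cost = agent_estimates.max() - agent_estimates
    incentives = np.zeros(K)
    if rng.random() < epsilon:
        target = int(rng.integers(K))
        incentives[target] = 1.0    # rewards lie in [0, 1), so this guarantees compliance
    else:
        target = int(np.argmax(principal_means - steering_cost))
        incentives[target] = steering_cost[target] + 0.05  # estimated gap plus a small margin

    # Agent with perfect knowledge of its own rewards best-responds to the menu.
    chosen = int(np.argmax(agent_means + incentives))

    # Assumed feedback model: the principal observes the chosen arm and a noisy
    # signal of the agent's realized reward, and updates a running average.
    signal = agent_means[chosen] + 0.1 * rng.standard_normal()
    counts[chosen] += 1
    agent_estimates[chosen] += (signal - agent_estimates[chosen]) / counts[chosen]

    # Principal collects its own reward net of the incentive paid on the chosen arm.
    principal_total += principal_means[chosen] - incentives[chosen]

print(f"average principal utility per round: {principal_total / T:.3f}")
```

The exploration step in this sketch loosely mirrors the trade-off described above: the principal occasionally spends incentives purely to learn the agent's hidden rewards, and otherwise exploits its current estimates to maximize its own net utility.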