The complexity of modern computer systems makes performance modeling an invaluable resource for guiding crucial decisions such as workload management, configuration management, and resource provisioning. Because these systems evolve continually, it is difficult to obtain ground truth about system behavior. Moreover, system management policies must adapt to changes in workload and configuration to continue making efficient decisions. Thus, we require data-driven modeling techniques that automatically extract relationships between a system's input workload, its configuration parameters, and the resulting performance.
This dissertation argues that statistical machine learning (SML) techniques are a powerful asset for system performance modeling. We present an SML-based methodology that extracts correlations between a workload's pre-execution characteristics or configuration parameters and its post-execution performance observations. We leverage these correlations for performance prediction and optimization.
We present three success stories that validate the usefulness of our methodology on storage- and compute-based parallel systems. In all three scenarios, we outperform state-of-the-art alternatives. Our results strongly suggest that SML-based performance modeling can improve the quality of system management decisions.