Improvements in performance and energy efficiency often require deep understanding of the complex interactions between hardware and software components. Hardware performance events provide low-level insights about the behavior of a program in real hardware while imposing negligible overhead. Such low overhead allows real-time profile in production environments and makes them useful tools for Feedback Directed Optimization, Software and Hardware Validation, Security, and Performance Analysis among many other applications.
When performance counters are sampled, the perturbation imposed on the monitored task increases with the sampling frequency; restricting the maximum number of sample points that can be acquired in a single run to a few hundreds of sample points per second of execution. This limited sampling frequency, together with the non-deterministic nature of many performance events and the systematic and random errors in the number of measured events caused by bugs and design limitations, had restricted the utilization of performance events in applications that require fine granularity and precision. The error and non-determinism of performance events makes imperative to develop solid methodologies to analyze the data collected from performance counters.
This dissertation presents what we believe is the first formal treatment of methodologies aimed to increase the precision of estimates of performance events in two scenarios: (1) The first scenario is estimation of a trend performance trace; the problem of estimating the cumulative number of performance events that have occurred at each moment during the execution of a task. We propose the utilization of a regression model that combines sample points from different executions and enforces monotonicity constraints in order to increase precision and granularity of the estimator. Additionally, we compare multiple approaches to interpolation of unobserved points in the trace. We present experimental verification of the superiority of the proposed methodology. (2) The second scenario is the attribution of performance events to dynamic instructions. We propose a methodology to build a regression model that associates performance events with dynamic instructions, rather than static ones, allowing to separately estimate performance events for different instances of the same static instructions, ie. first iteration of a loop versus others. We demonstrate that the presented methodology provide more accurate estimates than the traditional approach, while requiring much less sample points.
Finally, we study the structure of the regression problems created in each scenario and show two particular cases that can be solved in linear time and space, either by mapping the problem into a total order Isotonic Regression problem, solvable in linear time with the well-known Pool of Adjacent Violators algorithm, or using a new dynamic algorithm, linear time algorithm proposed in this dissertation. In both cases, the linear time complexity is a significant improvement over the traditional Non-negative Least Squares, Linear or Quadratic Programming utilized to solve the general case.