Reducing time and space costs of memory tracing
- Author(s): Gao, Xiaofeng
- et al.
Event tracing of applications under dynamic execution is crucial for performance modeling, optimization and trace-driven simulations. However, collecting and processing events, especially memory addresses, is extremely expensive in terms of time and space requirements. It is also challenging to find the right platform and the right tools to perform tracing. Such challenges greatly hinder the feasibility of tracing memory of large, long-running, parallel applications. In this thesis, the challenges in tracing memory are explored and several solutions are presented to address each challenge. The philosophy of these solutions, schemes and workarounds is to balance time and space costs on available platforms with available tools. Specifically, the time required to acquire memory traces can be greatly reduced by carefully identifying all causes of slowdown and addressing them in the design of built-for-the-purpose tracers. Techniques, including buffering, chaining and delayed instrumentation, are introduced and have been shown to reduce the time cost of memory tracing by more than 80% when used with traditional instrumentation tools. In addition, a lightweight instrumentation tool, ALITER, which causes only a two-fold slowdown in collecting full memory traces, is introduced to demonstrate the benefits of asynchronous tracing schemes. Path-grammar-guided trace compression and trace approximation are explored in this thesis to reduce space costs of memory tracing. The efficacy of low-level general-purpose compression schemes is greatly enhanced when they are organized around information about program structure and phases; combined with trace recording designed to capture the locality properties of random events, not the exact random events themselves, the space required to store traces can be reduced by many orders of magnitude.
These techniques enable one to generate reusable intermediate representations of memory traces, which are small enough to be stored on disk, accurate enough for trace-driven simulations and fast enough to collect and process. Together with working through practical issues of availability, these advances make trace-driven simulation of full-scale HPC applications feasible.
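As a rough illustration of the buffering idea mentioned in the abstract, the sketch below accumulates addresses in memory and hands them to a consumer in batches, so the per-event cost is an append rather than a call into the trace processor. It is not the thesis's implementation: the class `BufferedTracer`, its `sink` callback and the `capacity` parameter are hypothetical names chosen for this example, and a real tracer would be driven by instrumentation of the traced binary rather than by a Python loop.

```python
from typing import Callable, List


class BufferedTracer:
    """Hypothetical tracer: collects addresses and passes them to a sink in batches."""

    def __init__(self, sink: Callable[[List[int]], None], capacity: int = 4096) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._sink = sink          # consumer of full batches (assumed interface)
        self._capacity = capacity  # number of addresses held before a flush
        self._buffer: List[int] = []

    def record(self, address: int) -> None:
        # Cheap path: store the address; only call the sink when the buffer is full.
        self._buffer.append(address)
        if len(self._buffer) >= self._capacity:
            self.flush()

    def flush(self) -> None:
        # Hand over whatever has accumulated and start a fresh buffer.
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []


# Usage: ten synthetic 8-byte-strided addresses, buffer capacity of four.
batches: List[List[int]] = []
tracer = BufferedTracer(sink=batches.append, capacity=4)
addresses = list(range(0x1000, 0x1000 + 10 * 8, 8))
for addr in addresses:
    tracer.record(addr)
tracer.flush()  # drain the partially filled final buffer
```

With these assumed parameters the sink is invoked three times instead of ten; the final explicit `flush()` is needed so that a partially filled buffer is not lost at the end of the run.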