Article
Peer Reviewed

As the complexity of processors increases, it becomes harder for designers to understand the non-trivial and often non-intuitive interactions among the internal structures of the micro-architecture. Understanding these interactions is important because it helps pinpoint bottlenecks, enabling designers to reason about sources of performance loss and improve their next generation of processors. To understand these interactions in current and, more importantly, future-generation designs, designers make heavy use of detailed computer architecture simulation. These simulators model the behavior of the processor on a per-cycle basis, allowing designers to examine very detailed trade-offs. Building and maintaining these simulators is a large and complicated task. In addition, the recent trend toward micro-architectures with multiple cores on the same chip brings new challenges that affect the way simulation results should be compared. This dissertation focuses on techniques to help build and maintain simulators, as well as techniques to improve the way architects evaluate design choices using simulation.

Existing user-level simulators require hand coding the emulation of every possible system effect (e.g., system calls, interrupts, DMA transfers) that can impact the application's execution. Developing such an emulator for a given operating system is a tedious exercise, and it can also be costly to maintain it to support newer versions of that operating system. Furthermore, porting the emulator to a completely different operating system might involve building it altogether from scratch. The first contribution of this dissertation is a technique to automatically capture the system effects on an application. The system effects are captured in logs and then used to guide architecture simulation. The proposed technique greatly reduces the complexity of implementing and maintaining user-level simulators. In addition, the technique guarantees deterministic simulation on uni-processor systems.

As multi-core processors become mainstream, techniques for efficient simulation of multi-threaded workloads are needed. Simulation of multi-threaded workloads on multi-core systems suffers from non-determinism across runs in different architecture configurations. If the execution paths of two simulation runs of the same benchmark, with the same input, are too different, the simulation results cannot be used to compare the configurations. The other contributions of this dissertation focus on techniques to efficiently collect simulation checkpoints for multi-threaded workloads. They extend the earlier technique for efficiently collecting logs for uni-processor simulation. Using these checkpoints, multi-threaded simulation on multi-core systems becomes deterministic. Deterministic simulation, however, results in stalls that would not naturally occur in execution. This dissertation proposes techniques that allow one to accurately compare performance across architecture configurations in the presence of these stalls.
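To illustrate why forcing a recorded interleaving can introduce stalls, here is a minimal sketch in Python. It is not the dissertation's checkpointing mechanism: the function name `replay_with_order`, the one-cycle cost per shared access, and the input format are all assumptions made for the example. A thread that reaches its next shared access early must wait until every earlier access in the recorded order has completed, and that waiting time is counted as stall cycles.

```python
from collections import defaultdict

def replay_with_order(order, work):
    """Replay a recorded global order of shared accesses (hypothetical model).

    order: list of thread ids, one entry per shared access, in recorded order.
    work:  dict thread id -> list of private-work cycles spent before each of
           that thread's shared accesses under the simulated configuration.
    Returns (finish_cycle, stall_cycles_per_thread).
    """
    next_idx = defaultdict(int)   # index of each thread's next shared access
    ready_at = defaultdict(int)   # cycle at which each thread finished its last access
    stalls = defaultdict(int)     # stall cycles forced by the recorded order
    last_done = 0                 # cycle at which the previous logged access completed

    for tid in order:
        k = next_idx[tid]
        arrive = ready_at[tid] + work[tid][k]  # when the thread would naturally arrive
        start = max(arrive, last_done)         # it may not overtake earlier logged accesses
        stalls[tid] += start - arrive
        last_done = start + 1                  # assumption: a shared access takes 1 cycle
        ready_at[tid] = last_done
        next_idx[tid] = k + 1
    return last_done, dict(stalls)

if __name__ == "__main__":
    # Thread 1 is much faster than thread 0 in this configuration, so the
    # recorded alternation 0, 1, 0, 1 makes thread 1 wait.
    print(replay_with_order([0, 1, 0, 1], {0: [5, 5], 1: [1, 1]}))
```

In this toy model the stall cycles are reported separately from the finish time, which is one simple way to keep the artificial waiting visible when two configurations are compared.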

Pre-2018 CSE ID: CS2007-0907

Thesis
Peer Reviewed

Reproducible user-level simulation of multi-threaded workloads

Pereira, Cristiano

UC San Diego Electronic Theses and Dissertations (2007)


Article
Peer Reviewed

Using Program Phases as Meta-Data for Runtime Energy Optimization

Technical Reports (2004)

Power consumption is a major concern in embedded systems design due to the portability and battery-driven operation of such systems. Runtime optimization of embedded software applications for system-level power/performance tradeoffs requires that the runtime system be able to probe system and application status and use procedures that make these tradeoffs effective. To keep decision making efficient, such decisions must be made with the least overhead to system power. One way to achieve this capability is through the systematic definition and update of meta-data that can be probed by the runtime system and given as input to dynamic power management algorithms. In this paper, we use the concept of application reflection, a technique in which a program represents its own structure and behavior through meta-data. This lets the runtime system inspect the program representation and make power management decisions. We present a profiling scheme to build a reflexive data structure in which a program represents its own execution behavior, and use this information at run time to guide operating system power management decisions. Our scheme is inspired by SimPoint, a tool for automatic program phase classification and simulation point selection. We use main memory bank shutdown as an example of how our technique can be used, and we show that it achieves energy/delay savings comparable to the best known hardware-based technique. We believe that our approach can also be used for efficient energy management of other resources such as the processor and system peripherals.
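As a rough illustration of phase meta-data driving memory bank shutdown, the Python sketch below assumes a hypothetical table that maps each program phase to the set of memory banks it touches. The table layout and the names `banks_to_power_down` and `plan_for_trace` are invented for this example and are not taken from the report.

```python
def banks_to_power_down(phase_banks, phase_id, num_banks):
    """Return the banks that can be switched off while phase_id runs.

    phase_banks: dict phase id -> set of bank indices used in that phase
                 (hypothetical meta-data produced by offline profiling).
    """
    used = phase_banks.get(phase_id)
    if used is None:
        # Unknown phase: be conservative and keep every bank powered.
        return set()
    return set(range(num_banks)) - used

def plan_for_trace(phase_banks, phase_trace, num_banks):
    """For each phase in an observed phase sequence, list the banks to turn off."""
    return [banks_to_power_down(phase_banks, p, num_banks) for p in phase_trace]

if __name__ == "__main__":
    meta = {0: {0, 1}, 1: {2}}              # made-up profile for a 4-bank memory
    print(plan_for_trace(meta, [0, 1, 7], 4))
```

The conservative default for an unprofiled phase reflects a design choice: a wrong shutdown costs a wake-up delay, so missing meta-data should never cause a bank to be turned off.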

Pre-2018 CSE ID: CS2004-0797


Article
Peer Reviewed

Efficient Hardware Support for Deterministic Replay Debugging of Memory Races, Interrupts and Self Modifying Code

Technical Reports (2005)

Companies spend significant time trying to reproduce and fix bugs. BugNet is a recent architecture proposal that provides architecture support for debugging. It focuses on continuously recording information about the program execution, which can be communicated back to the developer when the program terminates abruptly. Using that information, the developer can deterministically replay the program execution to reproduce and fix the bugs. To enable deterministic replay for multi-threaded programs, BugNet assumed hardware support to record the ordering between the memory operations executed across all the threads. In this paper, we significantly reduce the hardware support required for logging multi-threaded programs by exploiting an important property of the BugNet checkpointing scheme that allows one to independently replay each thread without explicitly logging shared memory dependences. During offline debugging, we independently replay each thread to collect memory traces for its execution. Given these traces, we show how one can infer the memory ordering across the threads. In addition, we present a set of optimizations that reduce the complexity and improve the functionality of the BugNet architecture. Those optimizations include supporting self-modifying code and reducing the log size in the presence of frequent interrupts.
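The idea of inferring cross-thread ordering from independently collected traces can be sketched as follows. This is a deliberately simplified illustration, not the paper's algorithm: it assumes every store writes a value that is unique for its address, so a load can be matched to the store that produced its value. The trace format and the function name `infer_cross_thread_deps` are hypothetical.

```python
def infer_cross_thread_deps(traces):
    """Infer store -> load ordering edges between threads by value matching.

    traces: dict thread id -> list of (op, addr, value), op is "st" or "ld".
    Assumption (for this sketch only): each (addr, value) pair is written by
    at most one store, so the writer of a loaded value is unambiguous.
    Returns a list of ((writer_tid, writer_idx), (reader_tid, reader_idx)).
    """
    writers = {}
    for tid, trace in traces.items():
        for i, (op, addr, value) in enumerate(trace):
            if op == "st":
                writers[(addr, value)] = (tid, i)

    deps = []
    for tid, trace in traces.items():
        for i, (op, addr, value) in enumerate(trace):
            if op != "ld":
                continue
            src = writers.get((addr, value))
            # Only record edges that cross threads; same-thread order is
            # already given by program order within the trace.
            if src is not None and src[0] != tid:
                deps.append((src, (tid, i)))
    return deps

if __name__ == "__main__":
    traces = {
        0: [("st", "x", 1), ("ld", "y", 7)],
        1: [("ld", "x", 1), ("st", "y", 7)],
    }
    for edge in infer_cross_thread_deps(traces):
        print(edge)
```

Real traces do not guarantee unique values, so a practical approach would need additional reasoning to disambiguate candidate writers; the sketch only shows the shape of the inference.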

Pre-2018 CSE ID: CS2005-0843


Article
Peer Reviewed

A software architecture for building power aware real time operating systems

ICS Technical Reports (2002)

As computing moves to battery-operated portable systems, functionality is increasingly implemented in software with an embedded/real-time operating system (RTOS). For such systems, there is a need for power-aware applications and system software. In this paper, we present a layered software architecture that enables application and OS programmers to design energy-efficient applications and RTOS services. The software architecture consists of a power-aware RTOS kernel and a set of standard software interfaces that enable easy exchange of timing and power information among the underlying hardware platform, the RTOS, and the applications. To demonstrate the utility of our approach, we focus on making the task scheduling process in an RTOS power-aware, and incorporate an OS-directed dynamic power management technique that enables adaptive power-fidelity tradeoffs during task scheduling. We have implemented it using the Red Hat eCos operating system running on a complete variable-voltage system based on the Intel XScale micro-architecture. We ran four different algorithms, ranging from a simple shutdown-based scheme to a dynamic predictive and adaptive DVS algorithm. The results show an energy gain of up to 66% compared to execution without any power management, and 17% compared to the simple shutdown scheme.
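A basic building block of DVS-aware scheduling is choosing the slowest voltage/frequency setting that still meets a task's deadline. The Python sketch below shows that step only. The operating points are made-up numbers rather than XScale specifications, the function names are hypothetical, and the energy estimate uses the common approximation that dynamic energy per cycle scales with the square of the supply voltage.

```python
def pick_operating_point(points, cycles, deadline_s):
    """Pick the lowest-frequency (freq_hz, volts) point that meets the deadline.

    Falls back to the fastest point when no setting is fast enough.
    """
    feasible = [p for p in points if cycles / p[0] <= deadline_s]
    if not feasible:
        return max(points, key=lambda p: p[0])
    return min(feasible, key=lambda p: p[0])

def relative_energy(point, cycles):
    """Dynamic energy up to a constant factor: cycles * V^2 (capacitance dropped)."""
    _freq_hz, volts = point
    return cycles * volts ** 2

if __name__ == "__main__":
    # Hypothetical operating points, for illustration only.
    points = [(200e6, 1.0), (400e6, 1.3), (600e6, 1.6)]
    cycles, deadline = 3e6, 0.010
    chosen = pick_operating_point(points, cycles, deadline)
    fastest = max(points, key=lambda p: p[0])
    print(chosen, relative_energy(chosen, cycles) / relative_energy(fastest, cycles))
```

A predictive or adaptive scheme would replace the fixed `cycles` argument with an estimate updated from recent task behavior; the selection step itself stays the same.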


Article
Peer Reviewed

Automatic Logging of Operating System Effects to Simplify Application-Level Architecture Simulation

Technical Reports (2005)

Modern architecture research relies heavily on detailed pipeline simulation. A time-consuming part of building a simulator is correctly emulating operating system effects when performing application-level simulation. This requires hand coding the emulation of any system effect (e.g., system calls, interrupts, DMA transfers) used by the workload. This can be tedious, especially when the simulator must run on completely different operating systems or on different versions of the same operating system. Even once a simulator correctly emulates the system effects, maintaining this emulation can be costly, since it can break with new versions of the operating system the simulator is running on. In addition, system effect emulation can cause some variance in simulation results. In this paper we describe an approach that automatically logs operating system effects to guide architecture simulation of user code. The benefits of this approach are that (a) we do not have to build or support any infrastructure for emulating operating system effects, and (b) the system effect logs provide deterministic simulation in the presence of these system effects.
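The log-and-replay idea can be pictured with a small toy model in Python. It is only a sketch under strong assumptions: machine state is a flat dictionary of registers and memory words, each system effect is a Python callable, and the names `diff_state`, `record`, and `replay` are hypothetical. During logging, the state is compared before and after each system effect and the changed locations are saved; during simulation, the logged changes are applied directly, so the simulator never emulates the operating system.

```python
def diff_state(before, after):
    """Return the locations whose values changed (deletions are not modeled)."""
    return {loc: val for loc, val in after.items() if before.get(loc) != val}

def record(system_effects, state):
    """Run each system effect natively and log the state changes it causes."""
    log = []
    for effect in system_effects:
        before = dict(state)
        effect(state)
        log.append(diff_state(before, state))
    return log

def replay(log, state):
    """Apply the logged changes instead of emulating the system effects."""
    for changes in log:
        state.update(changes)
    return state

def fake_read(state):
    # Stand-in for a read() system call: fills a buffer and sets a return register.
    state[("mem", 0x1000)] = 0x41
    state[("reg", "ret")] = 1

if __name__ == "__main__":
    initial = {("reg", "ret"): 0, ("mem", 0x1000): 0}
    native = dict(initial)
    log = record([fake_read], native)
    simulated = replay(log, dict(initial))
    print(log, simulated == native)
```

Because the replayed changes are identical on every run, the simulated user code sees the same inputs each time, which is what makes the simulation deterministic in this toy setting.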

Pre-2018 CSE ID: CS2005-0840


Scholarly Works (6 results)

Reproducible User-Level Simulation of Multi-Threaded Workloads

Reproducible user-level simulation of multi-threaded workloads

Using Program Phases as Meta-Data for Runtime Energy Optimization

Efficient Hardware Support for Deterministic Replay Debugging of Memory Races, Interrupts and Self Modifying Code

A software architecture for building power aware real time operating systems

Automatic Logging of Operating System Effects to Simplify Application-Level Architecture Simulation