Performance analysis is a critical aspect of CPU design, but it has become more difficult over the past decade as physical constraints limit improvements in single-threaded performance. This dissertation addresses three interrelated problems that hinder effective performance analysis. First, high-level microarchitecture simulation is orders of magnitude slower than native execution. I propose a novel statistical sampling technique called LiveSim that dramatically reduces simulation time. Second, multithreaded benchmarks may use input sets that produce misleading results. I demonstrate, for the first time, the true scalability of the PARSEC benchmark suite on real multiprocessor systems, and show how to accurately evaluate the performance of simulated multiprocessor systems. Third, modern software is often written in dynamic languages, and the interaction between a JIT compiler and different CPU microarchitectures can be difficult to analyze using simulation. I modified the V8 JavaScript engine and ran benchmarks on real systems to obtain novel, statistically sound insights that would be difficult to obtain using simulation alone. Together, these three solutions demonstrate ways to perform effective and statistically valid microarchitectural performance analysis.