Abstract. The performance of the Weather Research and Forecasting regional model with chemistry (WRF-Chem) in simulating the spatial and temporal variations in aerosol mass, composition, and size over California is quantified using measurements collected during the California Nexus of Air Quality and Climate Experiment (CalNex) and the Carbonaceous Aerosol and Radiative Effects Study (CARES). Both campaigns were conducted during May and June of 2010. The extensive meteorological, trace gas, and aerosol measurements collected at surface sites and along aircraft and ship transects during CalNex and CARES were combined with operational monitoring network measurements to create a single dataset that was used to evaluate one configuration of the model. Simulations were performed to examine the sensitivity of regional variations in aerosol concentrations to anthropogenic emissions and to the long-range transport of aerosols into the domain, which was obtained from a global model. The configuration of WRF-Chem used in this study is shown to reproduce the overall synoptic conditions, thermally driven circulations, and boundary layer structure observed in the region that control the transport and mixing of trace gases and aerosols. However, sub-grid scale variability in the meteorology and emissions, as well as uncertainties in the treatment of secondary organic aerosol chemistry, likely contribute to errors at a primary surface sampling site located at the edge of the Los Angeles Basin. Differences among the sensitivity simulations demonstrate that the aerosol layers over the Central Valley detected by lidar measurements likely resulted from the lofting and recirculation of local anthropogenic emissions along the Sierra Nevada. 
Reducing the default emissions inventory by 50% led to an overall improvement in many simulated trace gases and in black carbon aerosol at most sites and along most aircraft flight paths. However, simulated organic aerosol was closer to the observations when no adjustments were made to the primary organic aerosol emissions. Model performance for some aerosol species was not uniform over the region: we found that sulfate was better simulated over northern California, whereas nitrate was better simulated over southern California. While the overall spatial and temporal variability of aerosols and their precursors was simulated reasonably well, we show cases where the local transport of some aerosol plumes was either too slow or too fast, which adversely affects the statistics describing the differences between observed and simulated quantities. Comparisons with lidar and in situ measurements indicate that long-range transport of aerosols from the global model was likely too high in the free troposphere, even though those concentrations were relatively low. This bias led to an over-prediction of aerosol optical depth by as much as a factor of two, which offset the under-prediction of boundary-layer extinction resulting primarily from local emissions. Lowering the aerosol concentrations in the boundary conditions by 50% greatly reduced the bias in simulated aerosol optical depth for all regions of California. This study shows that quantifying regional-scale variations in aerosol radiative forcing and determining the relative roles of emissions from local and distant sources is challenging during "clean" conditions. It also shows that a wide array of measurements is needed to ensure that model predictions are correct for the right reasons. In this regard, the combined CalNex and CARES datasets are an ideal testbed for evaluating aerosol models in great detail and for developing improved treatments of aerosol processes.