Formaldehyde column density measurements as a suitable pathway to estimate near‐surface ozone tendencies from space

In support of future satellite missions that aim to address the current shortcomings in measuring air quality from space, NASA's Deriving Information on Surface Conditions from Column and Vertically Resolved Observations Relevant to Air Quality (DISCOVER‐AQ) field campaign was designed to enable exploration of relationships between column measurements of trace species relevant to air quality at high spatial and temporal resolution. In the DISCOVER‐AQ data set, a modest correlation (r2 = 0.45) between ozone (O3) and formaldehyde (CH2O) column densities was observed. Further analysis revealed regional variability in the O3‐CH2O relationship, with Maryland having a strong relationship when data were viewed temporally and Houston having a strong relationship when data were viewed spatially. These differences in regional behavior are attributed to differences in volatile organic compound (VOC) emissions. In Maryland, biogenic VOCs were responsible for ~28% of CH2O formation within the boundary layer column, causing CH2O to, in general, increase monotonically throughout the day. In Houston, persistent anthropogenic emissions dominated the local hydrocarbon environment, and no discernable diurnal trend in CH2O was observed. Box model simulations suggested that ambient CH2O mixing ratios have a weak diurnal trend (±20% throughout the day) due to photochemical effects, and that larger diurnal trends are associated with changes in hydrocarbon precursors. Finally, mathematical relationships were developed from first principles and were able to replicate the different behaviors seen in Maryland and Houston. While studies would be necessary to validate these results and determine the regional applicability of the O3‐CH2O relationship, the results presented here provide compelling insight into the ability of future satellite missions to aid in monitoring near‐surface air quality.


Introduction
Great improvements in air quality have been made throughout the U.S. over the past few decades, but progress has begun to plateau, leaving many regions with criteria pollutant levels that regularly exceed the national ambient air quality standards (NAAQS). In addition, the U.S. Environmental Protection Agency has recently lowered the exceedance level for ozone (O3) from 75 parts per billion by volume (ppbv) to 70 ppbv over an 8 h period [U.S. Environmental Protection Agency, 2015]. As a result, some regions that were previously in compliance with the NAAQS may now be considered nonattainment areas, and some previous nonattainment areas are expected to be reclassified with a worse ranking [Downey et al., 2015]. Some of the difficulty in making further improvements in air quality can be attributed to increases in background levels of criteria pollutants and their chemical precursors [Hudman et al., 2004; Oltmans et al., 2006; Lin et al., 2012a; Simpson et al., 2012; Cooper et al., 2014]. That is, long- and medium-range transport (intercontinental and interregional transport) of air masses may complicate our understanding of local air quality issues in nonattainment areas, thereby making it increasingly challenging for scattered networks of surface monitoring stations to accurately depict regional air quality [Paoletti et al., 2014; Zhang et al., 2014; Zoogman et al., 2014]. While satellite-based measurements of criteria pollutants could theoretically help fill in the gaps between surface monitoring sites, extracting near-surface concentrations of pollutants from column density measurements (such as those provided by satellites) has proven difficult [Reed et al., 2013; Flynn et al., 2014].
In this work, data collected during NASA's Deriving Information on Surface Conditions from Column and Vertically Resolved Observations Relevant to Air Quality (DISCOVER-AQ) field campaign are used to explore the possibility of using space-based measurements of formaldehyde (CH2O) as a means to estimate near-surface photochemical conditions and to understand the relationship between CH2O and O3. Given the difficulty in detecting near-surface O3 directly from space, any corroborating information on O3 behavior and related exposure to poor air quality would raise the confidence in satellite information and its impact on decision making.
There are many challenges to using space-based platforms to probe surface air quality. Most notably, the vertical distributions of trace species provide inherent challenges in extracting near-surface measurements (and thus, relevant measurements for human exposure) from the total atmospheric columns measured by satellites [Martin, 2008; Ichoku et al., 2012; National Science and Technology Council, 2013; Streets et al., 2013; Duncan et al., 2014]. Because the majority (~90%) of column O3 is contained in the stratosphere, much research has focused on developing satellite retrieval algorithms that separate the tropospheric O3 column from the stratospheric O3 column [Fishman et al., 2008]. However, under typical midlatitude conditions, O3 in the planetary boundary layer (PBL, the mixed layer that extends 1-3 km above Earth's surface) can contribute anywhere from 10 to 50% to the total tropospheric column, and substantial variability in upper tropospheric O3 can make it difficult to attribute changes in the total tropospheric column to changes in PBL O3 [Thompson et al., 2014]. Model simulations have suggested that multispectral retrieval techniques have the potential to derive a lower tropospheric or boundary layer O3 column density, but these techniques cannot be implemented by using the current constellation of air quality-relevant satellites [Zoogman et al., 2011; Hache et al., 2014].
Nitrogen dioxide (NO2) plays an important role in tropospheric O3 formation and is also routinely measured by satellite-based platforms. Column NO2 also has a strong stratospheric contribution; however, its vertical profile is minimized in the upper troposphere and is heavily weighted toward the surface. Over polluted areas, near-surface NO2 dominates the spatial variability in total column abundance. NO2 also has substantial horizontal variability on subgrid scales (i.e., smaller than satellite horizontal resolution), meaning that satellite measurements cannot fully resolve valuable details of the true NO2 distribution. This also complicates validation efforts, as measurements from a discrete ground site are often inadequate to represent the average NO2 abundance over a satellite footprint. In contrast to O3 and NO2, CH2O has negligible presence in the stratosphere and a tropospheric vertical profile that is heavily weighted toward Earth's surface. In comparison to NO2, it tends to have relatively low horizontal variability on subgrid scales owing to its secondary source from hydrocarbon oxidation, which tends to be spatially broader than highly localized point sources (i.e., NO2 emissions) [Fried et al., 2008; Junkermann, 2009; Apel et al., 2012; Baidar et al., 2013]. For this reason, satellite-based measurements of CH2O may better represent actual conditions at the surface and be more easily interpreted to diagnose air quality conditions from space.
Previous work using satellite-based measurements of CH2O has made inferences about near-surface hydrocarbon conditions and has been useful in deriving emission estimates for biogenic volatile organic compounds (VOCs) over forested regions [Millet et al., 2008; Kefauver et al., 2014] and highly reactive VOCs over urban areas [Zhu et al., 2014]. In other work, co-located satellite measurements of CH2O and NO2 have been used to infer the regional sensitivity of tropospheric O3 formation to changes in NOx (NOx ≡ NO + NO2) or VOC reactivity (that is, whether regional O3 production is NOx-limited or VOC-limited) [Martin et al., 2004; Duncan et al., 2010] and have proven useful for investigating near-surface photochemical environments in understudied regions of the world [Jin and Holloway, 2015; Mahajan et al., 2015]. While the CH2O/NO2 ratios described in these studies may be useful for observing near-surface photochemical environments, they do not provide any information about the distribution of near-surface O3 and therefore cannot be used to determine which regions would receive the greatest benefit from further regulation. Previous studies have noted a correlation between in situ measurements of O3 and CH2O but did not explore these behaviors in the vertical dimension and did not explore the possibility of using this correlation as a means to estimate near-surface O3 tendencies from space [Wert, 2003; Parrish et al., 2012].
Additional shortcomings of current satellite-based techniques arise from the nature of the satellite platforms themselves. Observations from polar-orbiting satellites are limited in their temporal resolution. For example, the Ozone Monitoring Instrument (OMI), housed aboard NASA's Aura satellite, only collects measurements over a given region of Earth once per day (usually around solar noon). While this is useful for looking at long-term averages of trace species, it is not useful for looking at the diurnal variations that may be important for understanding short-lived (i.e., a few days or less) air pollution events. Combining information from two satellites with different local overpass times (e.g., OMI and Scanning Imaging Absorption Spectrometer for Atmospheric Chartography or OMI and Global Ozone Monitoring Experiment-2) has been used to assess diurnal variability of select trace gases, but the limited information from these two views still misses the late afternoon when air quality impacts are often most severe [Boersma et al., 2009; De Smedt et al., 2015].
A new collection of air quality satellites is on the horizon that will address the problem of temporal resolution. Making observations from geostationary orbit, these satellites will also deliver improved spatial resolution over key areas of the northern hemisphere. These include the Tropospheric Emissions: Monitoring of Pollution (TEMPO) mission over North America (http://science.nasa.gov/missions/tempo/), Sentinel-4 over Europe and North Africa (http://www.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Sentinels_-4_-5_and_-5P), and the Geostationary Environment Monitoring Spectrometer over East Asia (http://www.ballaerospace.com/page.jsp?page=319). The geostationary orbit of TEMPO, for example, allows for a temporal resolution of ~1 h for O3 and NO2 and ~3 h for CH2O. Additionally, new multispectral retrieval algorithms are being optimized for these satellites to obtain information on boundary layer O3 that would greatly improve our ability to monitor air pollution from space [Zoogman et al., 2016]. While this multispectral technique is promising, additional indicators of near-surface O3 behavior would be invaluable for validating the accuracy and robustness of techniques that directly retrieve near-surface O3 from satellites.
In anticipation of these geostationary air quality observations, NASA conducted a series of field studies to examine the relationship between surface air quality and the vertical distribution of pollutants as they would be observed from space. The overall project, called DISCOVER-AQ, conducted four field studies in different parts of the U.S. (Maryland, California, Texas, and Colorado) at four different times (July 2011, January 2013, September 2013, and August 2014). The DISCOVER-AQ data set enables the exploration of relationships between column amounts of O3, NO2, and CH2O with very high spatial (both vertical and horizontal) and temporal (typically three measurements per day at each location) resolution, which effectively affords us a preview of the potential utility and applicability of future geostationary satellite missions. In the work presented below, aircraft data are used to investigate the relationship between O3 and CH2O, with an underlying goal of understanding whether column measurements of CH2O might be useful for diagnosing near-surface O3 behavior.

The Chemical Link Between O3 and CH2O
The impetus for examining the relationship between O3 and CH2O comes from their intertwined chemical origins. Both are secondary photochemical products influenced by human emissions, with some intriguing examples of correlative behavior in previous literature examining their in situ behavior in Houston [Wert, 2003; Parrish et al., 2012]. While CH2O does have a small primary source from combustion, ambient CH2O mixing ratios are dominated by secondary production, which occurs via the photochemical oxidation of hydrocarbons. The reaction between hydrocarbons (represented as RH) and the hydroxyl radical (OH) forms organic peroxy radicals (RO2), shown in reaction (R1). In the presence of oxides of nitrogen (NOx = NO + NO2), RO2 radicals are converted to alkoxy radicals (RO; reaction (R2)). In the absence of NOx, conversion of RO2 to RO occurs via alternate pathways (reaction (R3)).
Journal of Geophysical Research: Atmospheres 10.1002/2016JD025419
RO radicals can then go on to form CH2O and an additional radical (in this case, the peroxide radical, as shown in reaction (R4)). Because k4 is typically about 3 orders of magnitude faster than k2 and k3, chemical formation of CH2O can effectively be written as a single step, reaction (R5), in which α represents a branching ratio, that is, the number of CH2O molecules produced per RO2 radical. The branching ratio α is different for each VOC and depends on the conversion pathway (i.e., by reaction with NO, HO2, etc.) and is often used to represent the bulk branching ratio for air masses. Loss of CH2O occurs by photolysis and by reaction with OH.
In addition, NOx facilitates cycling between HO2 and OH (reaction (R6), HO2 + NO → OH + NO2), effectively speeding up the processes described above. The formation of NO2 is also a crucial step in the catalytic formation of O3. O3 has a relatively long lifetime (around 1 week in the troposphere) and tends to accumulate.
Wolfe et al. [2016] found that over a NOx range of 0.1-2 ppbv, the bulk branching ratio, α, ranged from 0.43 in low NOx environments to 0.62 in high NOx environments. They also found that over this range of NOx conditions, CH2O formation increased by a factor of 3 and RO2 formation increased by a factor of 2. Therefore, they concluded that the effect that NOx imparts on the conversion of HO2 to OH (i.e., reaction (R6)) was the primary driver of the observed relationship between CH2O and NOx, and the dependence of α on NOx plays a lesser (but still important) role.
Because reaction (R1) happens at a much slower rate (orders of magnitude) than reactions (R2)-(R5), the chemical formation of CH2O is limited by VOC reactivity (VOCR) and the availability of OH. VOCR is calculated as the sum, over all VOCs, of the rate constant for the reaction of each VOC with OH (k_i,OH) multiplied by its concentration [VOC]_i:
$$\mathrm{VOCR} = \sum_i k_{i,\mathrm{OH}}\,[\mathrm{VOC}]_i$$
Valin et al. [2015] found that because the lifetime of CH2O is so short at midday (~1-3 h), ambient CH2O concentrations reflect the rate of hydrocarbon oxidation in an air mass. That is, ambient CH2O concentrations are in rapid equilibrium with the local oxidative environment, and CH2O concentrations will rapidly rise and fall with changes in the rate of hydrocarbon oxidation (i.e., the rate of reaction (R1); this could be due to changes in the hydrocarbon composition or changes in the oxidative capacity, or both).
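The argument above can be condensed into a short steady-state sketch. The reactions are written schematically from their verbal descriptions in the text; the exact stoichiometry, the photolysis frequency J, the rate constant k_{CH2O+OH}, and the lifetime τ are notation assumed here for illustration and are not taken from the original displayed equations.

```latex
\begin{align*}
\mathrm{RH + OH} &\xrightarrow{k_1} \mathrm{RO_2} + \dots && \text{(R1, rate limiting)}\\
\mathrm{RO_2 + NO} &\xrightarrow{k_2} \mathrm{RO + NO_2} && \text{(R2)}\\
\mathrm{RO} &\xrightarrow{k_4} \mathrm{CH_2O} + \text{radical} && \text{(R4)}\\
\mathrm{RO_2} &\longrightarrow \alpha\,\mathrm{CH_2O} && \text{(R5, effective)}\\
\mathrm{HO_2 + NO} &\longrightarrow \mathrm{OH + NO_2} && \text{(R6)}
\end{align*}

% Formation is limited by (R1); loss is by photolysis and reaction with OH.
\begin{align*}
F(\mathrm{CH_2O}) &\approx \alpha\,\mathrm{VOCR}\,[\mathrm{OH}]\\
D(\mathrm{CH_2O}) &= \left(J + k_{\mathrm{CH_2O+OH}}[\mathrm{OH}]\right)[\mathrm{CH_2O}]
                   = \frac{[\mathrm{CH_2O}]}{\tau}\\
F = D \;\Rightarrow\; [\mathrm{CH_2O}]_{ss} &\approx \alpha\,\mathrm{VOCR}\,[\mathrm{OH}]\,\tau
\end{align*}
```

Under these assumptions, a lifetime τ of ~1-3 h means that CH2O relaxes toward this steady-state value within a few hours, which is why its abundance is expected to track changes in VOCR and OH rather than accumulate through the day.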
Because NO2 formation (and ensuing O3 formation) proceeds via the same initial reaction as CH2O formation, there is a natural link between CH2O and O3. That is, environments with high rates of O3 formation will tend to have relatively high ambient concentrations of CH2O, and vice versa. However, because of the large disparity in the lifetimes of these two species, they behave differently in the atmosphere and have different time dependencies. In effect, changes in ambient CH2O are prompted by changes in the local oxidative environment, while O3 tends to accumulate due to photochemical production throughout the day.
[...] "moderate," and much of California's San Joaquin Valley is listed as "extreme" at the time of this publication). During each deployment, surface measurements were augmented at 6-8 ground sites in the local air quality monitoring network administered by state and local environmental agencies. These sites were situated in the heart of urban areas, in nearby suburban areas, and in outlying rural areas. Coordinated aircraft sampling was undertaken to establish vertical distributions of gases above each ground site. Colorado had the highest number of research flights flown (16) and Houston the fewest (9), with California (10) and Maryland (14) falling in between. The NASA P-3B flew spirals over each ground site, with altitudes ranging from ~300 to 4000 m above ground level, and occasional missed approaches where the P-3B descended to ~30 m. On a typical research flight the P-3B would fly three sorties, spiraling over each site 3 times (typically in the morning, around midday, and in the afternoon). Typical flight tracks and spiral locations for each phase of DISCOVER-AQ are shown in Figure 1. Because the primary objective of DISCOVER-AQ was to relate column measurements to surface conditions on air quality relevant days, flights were typically conducted under clear-sky or partly cloudy conditions.
Airborne measurements of species that are related to satellite-based platforms and are directly related to this work include O3, NO2, and CH2O. Measurements of other species (water vapor, carbon monoxide (CO), methane (CH4), carbon dioxide (CO2), nitric oxide (NO), and volatile organic compounds (VOCs)) were used to further aid in analysis. All P-3B measurements used in this work are summarized in Table 1. The only surface-based measurements used in this work were O3 mixing ratios, which were typically provided by local air quality management districts (i.e., the Maryland Department of the Environment, the Texas Commission on Environmental Quality, the Colorado Department of Public Health and Environment, the California Air Resources Board, and the San Joaquin Valley Air Pollution Control District) and were supplemented with measurements from collaborators when needed. Column densities of O3 and CH2O were calculated by integrating measurements collected on the P-3B over the altitude range of each spiral. Because the maximum altitude reached during each spiral varied, all spirals were integrated over a standard altitude range (0.3 to 3.2 km above ground level). For spirals where the maximum altitude terminated below the standard range, the remaining portion was calculated by extrapolating the measurement from the top of the spiral up to 3.2 km. Excluding the Beltsville spirals (which had an altitude restriction of 2 km, and therefore are not used in this work), 22% of the remaining DISCOVER-AQ spirals required extrapolation up to the standard range, and the average range of extrapolation was 30 m (or about 1% of the total spiral range).
All spirals were extrapolated downward by using mixing ratios measured at the bottom of each spiral (usually ~300 m above ground level) and assuming a constant mixing ratio down to the surface. Flynn et al. [2014] referred to this as the "column_air" approximation and found that it better represented the true O3 column than other methods. On average, the extrapolated amounts of CH2O and O3 made up 13% and 10% of their respective total column densities. On occasion there were missed approaches where the P-3B descended to ~30 m above ground level. Fried (private communication) compared CH2O column densities extrapolated from 300 m above the surface to actual results using data from the missed approaches and found that column densities typically only differed by ±3%.
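The column calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing code used for DISCOVER-AQ: the function names, the ideal-gas conversion from pressure and temperature to air number density, the trapezoidal integration, and the sample profile are all assumptions made here. Only the downward, constant-mixing-ratio extrapolation is shown; the upward extrapolation to 3.2 km is omitted.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
DU = 2.69e16         # molecules/cm^2 per Dobson unit (value quoted in the text)


def air_number_density(p_pa, t_k):
    """Ideal-gas air number density in molecules/cm^3."""
    return p_pa / (K_B * t_k) * 1e-6


def column_du(alt_m, ppbv, p_pa, t_k, surface=None):
    """Integrate a spiral profile to a column density in Dobson units.

    alt_m, ppbv, p_pa, t_k: equal-length lists ordered by increasing altitude
    (m above ground, mixing ratio in ppbv, pressure in Pa, temperature in K).
    surface: optional (pressure_pa, temperature_k) at ground level. When given,
    the lowest measured mixing ratio is held constant down to 0 m.
    """
    levels = list(zip(alt_m, ppbv, p_pa, t_k))
    if surface is not None:
        p_sfc, t_sfc = surface
        levels.insert(0, (0.0, ppbv[0], p_sfc, t_sfc))
    # Trace-gas number density (molecules/cm^3) at every level.
    dens = [x * 1e-9 * air_number_density(p, t) for _, x, p, t in levels]
    z_cm = [z * 100.0 for z, _, _, _ in levels]
    # Trapezoidal integration over altitude.
    total = sum(0.5 * (dens[i] + dens[i + 1]) * (z_cm[i + 1] - z_cm[i])
                for i in range(len(levels) - 1))
    return total / DU


# Hypothetical CH2O spiral (values invented for illustration only).
alt = [300.0, 1000.0, 2000.0, 3200.0]
ch2o = [3.0, 3.0, 1.0, 0.5]
pres = [97700.0, 89900.0, 79500.0, 68300.0]
temp = [286.0, 281.7, 275.2, 267.4]
print(column_du(alt, ch2o, pres, temp, surface=(101325.0, 288.15)))
```

Calling the function with and without `surface` and taking the difference gives the extrapolated fraction of the column, which is the quantity the text reports as 13% for CH2O and 10% for O3.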
An additional calculation was performed to determine the column density of both O3 and CH2O within the lowest portion of the column that is chemically perturbed by surface emissions. We refer to this layer as the chemical PBL. Since CH2O is a product of secondary oxidation of VOC emissions, its vertical gradient does not always follow what might be expected based on the meteorological definition of the PBL (e.g., based on potential temperature gradients). Instead, CH2O often continues to be produced in residual layers above the meteorological PBL, especially in the morning. Because the lifetime of CH2O is so short (~1-3 h), profiles exhibit a clear transition between surface influenced values (due to emissions and chemistry) and much lower mixing ratios in the free troposphere. This chemical gradient is the basis for establishing the chemical PBL heights used in this work. Chemical PBL heights were manually determined for each spiral by using vertical profiles of CH2O from each spiral. With these chemical PBL heights established, column densities were calculated by using the column_air approach and the chemical PBL height for each spiral as the maximum altitude. Using this approach, the typical extrapolated amount of CH2O and O3 (that is, values extrapolated from the surface up to ~0.3 km) made up 14 and 17% of their respective chemical PBL column densities. Throughout the rest of this work, these chemical PBL column densities will be referred to as X_PBL, where X is either CH2O or O3.

The Langley Research Center Photochemical Box Model
The NASA Langley Research Center (LaRC) time-dependent photochemical box model was used to simulate chemical processes during DISCOVER-AQ. The model is constrained by inputs of trace gas precursors and then calculates the diurnal steady state profiles of radicals and other computed species (such as CH2O) for each set of measurements [Crawford et al., 1999; Olson et al., 2001, 2006]. In effect, the "model time" runs forward until all radical species are in diurnal equilibrium. Isoprene chemistry has been updated and is based on the Mainz Isoprene Mechanism 2 scheme [Taraborrelli et al., 2008], with isoprene nitrate chemistry based on Paulot et al. [2009]. The equilibrium assumption can be problematic when short-lived species dominate the model photochemistry, e.g., highly reactive VOCs such as biogenic isoprene or alkenes [Fried et al., 2011]. How the model assumptions hold up in proximity to such sources is expanded upon in section 5 of this work. Additional model uncertainties arise from uncertainties in measured constraints and uncertainties in kinetic and photolytic rates.
When available, model inputs of trace species were calculated by using 10 s moving averages of data from the P-3B 1 s data merges. Of the nonmethane hydrocarbons (NMHCs) used to constrain the model, ethane, propane, C4+ alkanes, ethene, C3+ alkenes, ethyne, and higher aromatics were not measured on board the P-3B. To estimate these missing inputs, data from the Studies of Emissions, Atmospheric Composition, Clouds, and Climate Coupling by Regional Surveys (SEAC4RS) and the Front Range Air Pollution and Photochemistry Experiment (FRAPPE) field campaigns were used. The SEAC4RS instrument payload flew aboard the NASA DC-8 aircraft and was based out of Houston, Texas, from August to September 2013, and the FRAPPE instrument payload flew aboard the NCAR/NSF C-130 and was based out of Broomfield, Colorado, from July to August 2014. Both field experiments were coincident in space and time with DISCOVER-AQ deployments. During these two campaigns, University of California Irvine's whole air sampler (WAS) was used to measure a suite of more than 75 VOCs, including many of the missing NMHC inputs [Colman et al., 2001; Simpson et al., 2010; Schroeder et al., 2014]. To estimate the missing NMHCs in the DISCOVER-AQ data sets, relationships between NMHCs measured by the WAS instrument during SEAC4RS and FRAPPE were used. For example, C3+ alkenes were not measured on the P-3B during DISCOVER-AQ but were measured by WAS during SEAC4RS and FRAPPE. From the WAS data, it was found that the lumped C3+ alkene mixing ratio varied linearly with propene mixing ratios. Because propene was measured on the P-3B, this derived relationship was then used to estimate lumped C3+ alkenes in the DISCOVER-AQ data sets. Similar relationships were used to estimate mixing ratios of ethane, propane, lumped C4+ alkanes, ethene, ethyne, and higher aromatics. These relationships are provided in the supporting information accompanying this work.
Additionally, total NMHC reactivity (calculated from WAS data) was found to vary linearly with the sum of the mixing ratios of benzene, toluene, propene, and isoprene. This relationship was used to estimate total NMHC reactivity in the P-3B data, and when combined with CH4 measurements made on board the P-3B, was used to estimate total hydrocarbon reactivity. The WAS data were also used to estimate the fraction of VOCR that had to be estimated in each region. In Maryland, total VOCR was found to be dominated by contributions from CH4, CO, and biogenic VOCs such as isoprene, and the portion of total VOCR that was estimated by using relationships derived from the WAS data was typically less than 20%. In Houston, where anthropogenic emissions were much stronger, this portion was typically less than 35%.
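To make the reactivity bookkeeping concrete, the sketch below evaluates VOCR as the sum of rate constant times concentration for a handful of species. It is an illustration only: the rate constants are approximate literature values near 298 K supplied here as assumptions (they are not the constants used in the LaRC model), the default air number density assumes roughly 298 K and 1 atm, and the mixing ratios are hypothetical.

```python
# Approximate OH rate constants near 298 K, in cm^3 molecule^-1 s^-1.
# Illustrative values only; not the rate constants used in the LaRC model.
K_OH = {
    "CH4": 6.3e-15,
    "CO": 2.4e-13,
    "isoprene": 1.0e-10,
}


def vocr(mixing_ratios_ppbv, n_air=2.46e19, rate_constants=K_OH):
    """OH reactivity (s^-1) as the sum of k_i,OH * [VOC]_i.

    mixing_ratios_ppbv maps species name to mixing ratio in ppbv; n_air is the
    air number density in molecules/cm^3 (default: roughly 298 K and 1 atm).
    """
    return sum(rate_constants[name] * ppbv * 1e-9 * n_air
               for name, ppbv in mixing_ratios_ppbv.items())


# Hypothetical air mass: background CH4, moderate CO, 1 ppbv isoprene.
sample = {"CH4": 1900.0, "CO": 150.0, "isoprene": 1.0}
total = vocr(sample)
for name, ppbv in sample.items():
    share = vocr({name: ppbv}) / total
    print(f"{name}: {share:.0%} of VOCR")
```

Because the sum is linear, the contribution of any subset of species (for example, the portion estimated from WAS-derived relationships) can be expressed as a simple fraction of the total, as the text does for Maryland and Houston.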
In addition to calculating instantaneous concentrations of radicals and CH2O, the box model was used to calculate instantaneous rates of formation, destruction, and production for O3 and CH2O. Here "production" is used to refer to the net chemical tendency, that is, the difference between the instantaneous rates of formation, F(X), and destruction, D(X), as in equation (1), where X represents either O3 or CH2O:
$$P(X) = F(X) - D(X) \qquad (1)$$

DISCOVER-AQ Observations: The Relationship Between O3 and CH2O
A plot of P-3B column-integrated O3 versus CH2O from all four DISCOVER-AQ deployments is shown in Figure 2 in terms of Dobson units (DU), where 1 DU is equal to a column density of 2.69 × 10¹⁶ molecules/cm². A total least squares fit (i.e., orthogonal regression) was applied to these data, and the corresponding coefficient of determination (r²) is given. While the statistical correlation is fairly modest, this plot shows that high CH2O column densities are generally associated with high O3 column densities, and vice versa. The low r² is somewhat driven by the limited dynamic range of the data from the California and Colorado deployments. When only considering Maryland and Houston, the r² improves considerably to 0.68.
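For readers unfamiliar with total least squares, the following sketch shows one common closed-form way to compute an orthogonal regression line and r² for paired column densities. The function name is hypothetical and this is not necessarily the fitting routine used to produce Figure 2; it assumes both variables carry comparable uncertainty in the units supplied.

```python
import math


def total_least_squares(x, y):
    """Orthogonal (total least squares) line fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). r_squared is the squared Pearson
    correlation, which does not depend on which regression is used.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the orthogonal slope is undefined")
    # Slope of the line that minimizes the summed squared perpendicular distances.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

Unlike ordinary least squares, the orthogonal slope treats neither variable as error free, which suits a comparison of two measured columns; note that the fitted slope depends on the units or scaling chosen for each axis, whereas r² does not.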
The anticipated sensitivity for CH2O detection from TEMPO is ~0.35 DU for a single retrieval (which takes 3 h). Because most of the data from Colorado and California fall below this cutoff, data from these two locations are not considered in further analysis. It should be noted, however, that resolving information below this precision could be accomplished by averaging measurements either spatially or temporally. Exploring this option to look for O3-CH2O relationships over coarser spatial and temporal scales would be interesting but would require a larger data set than is available from this study.
Figure 3 shows the O3-CH2O relationships for Maryland and Houston when whole-column CH2O is compared to PBL column O3, where both plots show a total least squares fit (i.e., orthogonal regression) rather than an ordinary least squares fit. This relationship (i.e., PBL O3 versus whole-column CH2O) was chosen because it is most relevant for air quality monitoring from space, that is, for understanding whether total column CH2O from satellites could be a reliable indicator of near-surface O3 gradients. Fortunately, differences between the whole-column and chemical PBL column for CH2O are small (on average, this difference is 0.11 DU, with a maximum of 0.31 DU and a minimum of 0.01 DU) since its lifetime is short and its column abundance and variability are dominated by the chemically perturbed PBL. Considering data from Maryland and Houston in aggregate, the median value for the ratio of CH2O_PBL/CH2O_total was 0.92, while O3,PBL/O3,total was 0.71. Even when considering the remaining column above the maximum altitude sampled by the P-3B, the incremental amount of CH2O is expected to be small and have less variability than near the surface. In the SEAC4RS data set, which extends up to 12.5 km in altitude, the median value of CH2O above 3.2 km was found to be 0.25 ppbv (25th and 75th percentile values of 0.18 and 0.37 ppbv, respectively). Assuming a constant 0.25 ppbv of CH2O from 3 to 12 km, we estimate an additional 0.1 DU of CH2O that would not be accounted for by our spirals.
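The 0.1 DU estimate can be checked with a short hydrostatic calculation. The sketch below is not the authors' calculation: it assumes approximate standard-atmosphere pressures of 701 hPa at 3 km and 194 hPa at 12 km, a constant mixing ratio between those levels, and dry-air constants; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules/mol
G = 9.80665                # gravitational acceleration, m/s^2
M_AIR = 0.0289647          # molar mass of dry air, kg/mol
DU = 2.69e16               # molecules/cm^2 per Dobson unit (value quoted in the text)


def layer_column_du(ppbv, p_bottom_hpa, p_top_hpa):
    """Column (DU) of a gas with constant mixing ratio between two pressure levels."""
    delta_p = (p_bottom_hpa - p_top_hpa) * 100.0           # Pa
    # Hydrostatic balance: air molecules per cm^2 in the layer.
    air_column = delta_p * AVOGADRO / (G * M_AIR) * 1e-4
    return ppbv * 1e-9 * air_column / DU


# Assumed standard-atmosphere pressures: ~701 hPa at 3 km, ~194 hPa at 12 km.
extra = layer_column_du(0.25, 701.0, 194.0)
print(round(extra, 2))
```

With these assumed pressures the result rounds to 0.1 DU, consistent with the estimate quoted above. Substituting the 25th and 75th percentile mixing ratios (0.18 and 0.37 ppbv) scales the result linearly.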
In the following sections, temporal and spatial trends in the DISCOVER-AQ data are explored (i.e., sections 4.2-4.4), and the LaRC box model is used to show that the relative contribution of biogenic emissions to CH2O formation explains the differences in behavior seen in Maryland and Houston (section 5).

Temporal Trends of O3 and CH2O in Maryland and Houston
The aggregate statistics for Maryland and Houston show relationships between CH2O and O3 that are promising enough to explore in more detail. In order to maximize the effectiveness of future satellite missions such as TEMPO, we must understand the circumstances under which the relationship is most likely to produce useful information. Examining how O3 and CH2O vary spatially and temporally provides insight into which days and photochemical environments are the best candidates for using column CH2O measurements to estimate near-surface O3 tendencies, as well as why some days and photochemical environments are poor candidates.
Figure 4 shows two individual days (28 and 29 July) in Maryland. On both days, O3 reached unhealthy levels in the region (the maximum measured O3 on each day was ~100 ppbv). On 28 July, there was a poor overall correlation (r² = 0.11) between O3 and CH2O, while 29 July had a strong overall correlation (r² = 0.72). On 29 July, CH2O had a net positive tendency at each spiral site over the course of the day. With the exception of Padonia, CH2O column densities increased from morning to midday to afternoon at each site on 29 July. This can be seen in the bottom right panel, where data points are color-coded by location and sized by time of day. Here for each site (except Padonia), CH2O increased as the day progressed, with the highest values of the day seen during the final spiral at each site. For Padonia, the measured CH2O column density first decreased between the morning and midday spirals and then increased between the midday and afternoon spirals. This general behavior of monotonically increasing CH2O was not seen on 28 July. Here a wide range of CH2O column densities was observed in the morning, and CH2O column densities did not monotonically increase throughout the day at any site.
This relationship was seen on other days in Maryland as well; that is, days where CH2O column densities increased throughout the day at each site tended to have stronger overall correlations between O3 and CH2O, while days where CH2O column densities varied little throughout the day at each site tended to have weak overall correlations. Table 2 shows the average CH2O column density measured in the morning, at midday, and in the afternoon for each research flight, averaged over all locations. Also shown is the average change in CH2O between morning, midday, and afternoon segments and the r² value for the CH2O-O3 relationship on each flight day. In general, days where the average change in CH2O is greater than 0.05 DU (i.e., 1, 2, 10, 14, 22, 27, and 29 July) tended to have the strongest correlation between CH2O and O3 column densities. In Texas, no days had a visually obvious diurnal increase in CH2O column density, and generally weak overall correlations between CH2O and O3 were observed on each day.
From this, it is apparent that estimating near-surface O3 tendencies from satellite-based CH2O measurements could be most accurately done on days where CH2O has a strong diurnal trend. Furthermore, Maryland seems to be a better candidate than Houston for this type of analysis. However, a weak diurnal trend in CH2O and a poor O3-CH2O correlation for an individual day (i.e., 28 July) do not necessarily exclude that day from being included in analysis. When consecutive days are looked at in series, individual days that lack a diurnal-scale correlation may contribute important information on a longer-term trend. In Maryland, the P-3B flew on four consecutive days from 26 to 29 July. Over this period, average ambient temperatures increased on each consecutive day. Consequently, biogenic emissions and photochemical activity increased with each consecutive day, and CH2O followed suit [Geron et al., 2001; Palmer et al., 2006; Millet et al., 2008]. This can be seen in Figure 5, where, although 28 July had a poor correlation, useful information could still be resolved when viewed in the context of the surrounding days. These observations suggest that column measurements of CH2O may be useful for estimating near-surface O3 tendencies (that is, periods of O3 production) so long as there is sufficient variation in CH2O, and that the time scale for this variation may be variable. In the next section we explore the O3-CH2O relationship when data are filtered based on CH2O variation. In section 5 we use modeling and additional observations to explain how diurnal trends in biogenic emissions are the primary driver of diurnal trends in CH2O, while O3 production is much less sensitive to changes in biogenic emissions.

Using ΔCH 2 O as a Data Filter
The results in section 4.2 demonstrate that short-term (i.e., diurnal or multiday) changes in CH 2 O can correspond to associated increases in O 3 ; however, O 3 exceedances can still occur via local photochemical production without associated changes in CH 2 O (for example, if VOCR stays relatively constant, CH 2 O abundances may stay relatively constant while O 3 accumulates), and advection of O 3 -rich air masses could create O 3 exceedances without associated increases in CH 2 O. This raises the question of how consistently one could use changes in CH 2 O to indicate periods of O 3 production. To explore this question, data from Maryland and Houston were filtered to identify pairs of consecutive CH 2 O observations exhibiting temporal change. The filtering process went as follows: If the CH 2 O column density increased by an average of at least 0.025 DU/h and 5%/h between two consecutive spirals, both spirals were included. If these criteria were not met, these spirals were not included. For example, in cases where these criteria were met between the second and third spirals but not the first and second spirals, only the second and third spirals of that day would be retained. In Maryland, 123 of the original 189 spirals (65%) met these criteria, while 99 of the original 196 spirals (52%) from Houston met these criteria. Plots of CH 2 O column densities and O 3 PBL column densities from these filtered data sets are shown in Figure 6.
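The filtering rule described above can be sketched in a few lines. This is an illustrative reading, not the authors' code: the function name and the input layout (one site, one day, spirals in time order) are hypothetical, and the percentage rate is assumed to be taken relative to the earlier spiral of each pair, which the text does not specify.

```python
def filter_spirals(times_h, ch2o_du, min_abs=0.025, min_rel=0.05):
    """Return sorted indices of spirals retained by the CH2O-change filter.

    times_h : spiral times in hours, ascending, for one site on one day
    ch2o_du : CH2O column densities (DU) for the same spirals
    min_abs : minimum increase in DU/h (0.025 in the text)
    min_rel : minimum fractional increase per hour (5%/h in the text)

    Assumption: the relative rate is computed against the earlier
    spiral's column density.
    """
    keep = set()
    for i in range(len(times_h) - 1):
        dt = times_h[i + 1] - times_h[i]
        if dt <= 0 or ch2o_du[i] <= 0:
            continue  # skip malformed pairs rather than guess
        abs_rate = (ch2o_du[i + 1] - ch2o_du[i]) / dt
        rel_rate = abs_rate / ch2o_du[i]
        if abs_rate >= min_abs and rel_rate >= min_rel:
            # Both spirals of a qualifying pair are retained.
            keep.update((i, i + 1))
    return sorted(keep)


# Hypothetical day: little change from morning to midday, then a large
# increase from midday to afternoon, so only the last two spirals remain.
print(filter_spirals([9.0, 12.0, 15.0], [0.50, 0.52, 0.80]))  # -> [1, 2]
```

Requiring both the absolute and the relative threshold keeps small columns with large percentage changes, and large columns with small percentage changes, from passing on one criterion alone.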
In Maryland, the statistical correlation greatly improved when this filter was applied (r2 increased from 0.57 to 0.75), while the correlation in Houston showed only marginal improvement (r2 [...] showed strong evidence of titration. O3 mixing ratios at the surface were less than 20 ppbv until about 09:00, while NOx was sustained above 60 ppbv. At the time of the P-3B morning spiral at 10:30, O3 (NO2) mixing ratios were ~50 (~55) ppbv at the surface, ~55 (~50) ppbv at the bottom of the spiral (300 m), and ~65 (~40) ppbv just 500 m above the surface. In situations such as this, where a large, persistent NOx source continually titrates O3 near the surface (especially in the morning hours), changes in surface O3 are decoupled from CH2O and the two are uncorrelated. Using a similar filtering approach to that described above, column CH2O measurements from future geostationary satellites could be used to identify and monitor the spatial and temporal tendencies of near-surface O3 production in select locations, but may not be useful in estimating near-surface O3 mixing ratios or identifying periods of O3 exceedance.
Data in Figure 6 are colored by the column-average water vapor mixing ratio. This explains some of the deviations from linearity noted in the Maryland data. At high water vapor mixing ratios, the O3-CH2O relationship becomes shallower. This is likely because of the primary production of radicals via interactions between water vapor and O3 (or more accurately, the O(1D) produced by O3 photolysis). CH2O concentrations are often correlated with radical concentrations [Sillman et al., 1995; Parrish et al., 2012], but because O3 production in Maryland is strongly NOx-limited, subsequent increases in radicals (and hence, CH2O) have a limited effect on O3 production [Sillman et al., 1990; Kleinman, 1994; Duncan et al., 2010]. Additionally, the reaction between O(1D) and water vapor acts as a sink of O3, resulting in a lower net P(O3) at high water vapor loadings, all else being equal.
It is important to note that this filter isolates short-term periods of O3 production and should not preclude examining CH2O variability over periods of multiple days as a useful way to show changes in air quality associated with synoptic-scale influences, as in the case study of 26-29 July presented in section 4.2. At present, however, no judgment can be made on the further applicability of the results of that case study because the DISCOVER-AQ data lack sufficient examples of multiday flights with increasing (or decreasing) CH2O. In Houston, the P-3B sampled four consecutive days on one occasion (11-14 September), but the daily average CH2O column density only varied by ~0.2 DU over these 4 days and did not vary monotonically. There were two other events (20-22 July in Maryland and 24-26 September in Houston) where the P-3B sampled on three consecutive days, but the daily average CH2O column density did not vary monotonically over these three days; in both cases, it increased from the first to the second day of the period, then decreased from the second to the third day, indicating that those periods did not cover a single synoptic-scale air quality event that spanned multiple days. In effect, when an individual air quality event spans multiple days (as in 26-29 July), data collected throughout the duration of the event may be viewed collectively. However, when an air quality event only spans 1 day, aggregating data over periods of multiple days is not appropriate. In the next section, we show that persistently strong spatial gradients in Houston mask any temporal trends in the O3-CH2O relationship, and in section 5 we show that differences in the relative contributions of biogenic and anthropogenic VOC emissions to secondary CH2O formation explain the differences in behavior between Maryland and Houston.

Spatial Trends in the Column O 3 -CH 2 O Relationship
In the previous sections, it was established that O 3 PBL columns correlate better with column CH 2 O in Maryland than in Texas. Part of this may come from the reduced dynamic range of air quality conditions observed in Houston compared to Maryland. During the Maryland campaign, O 3 exceedances were recorded on 9 of the 14 flight days. Additionally, two of the remaining flight days were exceedingly clean due to a period of strong transport from Canada. By contrast, conditions were stubbornly consistent during the Houston deployment, with cloudy, often rainy, and moderately polluted conditions (but no O 3 exceedances) persisting until the final flight days of the deployment on 25-26 September. On 25 September in particular, Houston recorded a 1 h O 3 exceedance in excess of 120 ppbv at the Manvel Croix site, and exceedances were recorded widely from the Houston ship channel to the south, whereas conditions were clean to the north. This strong spatial contrast was created due to a bay breeze recirculation of pollution from the ship channel over areas to the south, and very strong emissions of highly reactive VOCs, such as propene, ethene, and isoprene, from the ship channel region [Loughner et al., 2014;Fried et al., 2016].
By looking at the data for 25 September spatially rather than temporally, some interesting relationships are revealed. Overall, there was a weak correlation between PBL-column O 3 and column CH 2 O (i.e., the dashed black line in Figure 7). However, when data from the midday and afternoon circuits were isolated (i.e., the orange and red lines in Figure 7), stronger correlations emerged, and the slope increased from 8.1 at midday to 11.6 in the afternoon. Due to the bay breeze recirculation mentioned above, conditions on this day were fairly stagnant with a light northwesterly breeze in the morning and a light southeasterly sea breeze in the afternoon. The spatial influence on CH 2 O is evident in that the highest column values were restricted to the ship channel area (i.e., Deer Park and Channelview, where CH 2 O column densities were greater than 0.8 DU), which was dominated by anthropogenic emissions. Even at Channelview, just to the north of the ship channel, high CH 2 O from ship channel emissions was not present until the second circuit. By the third circuit, emissions were seen to reach Moody Tower and Manvel Croix as column values approached 0.8 DU. Only one other site (Smith Point, located southeast of the ship channel) saw CH 2 O approaching 0.8 DU, but it was on the second circuit prior to the full buildup of O 3 . Trajectory calculations done by using the NOAA Hybrid Single-Particle Lagrangian Integrated Trajectory dispersion model (http://ready.arl.noaa.gov/HYSPLIT_disp.php) reveal that the ship channel plume reached Smith Point by midday, but then dissipated by the afternoon. Thus, the large increase/decrease pattern at Smith Point on this day was likely due to transport from the ship channel. In the case of Manvel Croix, it took several hours for the ship channel plume to reach the local area, and it arrived just before the midday spiral. 
For Moody Tower, the plume also took several hours to reach the local area, but it arrived between the midday and afternoon spirals. In both cases, there was a large increase in O3 upon arrival of the plume, but a modest increase in CH2O (still larger than the filter cutoff described in the previous section, though). VOCR at the ship channel was dominated by emissions of short-lived VOCs such as propene, ethene, and isoprene [Murphy and Allen, 2005; Vizuete et al., 2008; Parrish et al., 2012; Zhu et al., 2014; Fried et al., 2016]. After several hours of transport, these short-lived VOCs would be fairly depleted, and consequently, the ambient CH2O mixing ratio would be reduced as well [Fried et al., 2011]. Conversely, O3 would have accumulated throughout the whole transport process. Thus, depending on how long the transport time is between the ship channel and a given location, the ratio of ΔO3/ΔCH2O upon plume arrival will vary in turn. This effect has been previously observed in areas downwind of large VOC emission events in Houston. Parrish et al. [2012] and Wert [2003] both found that during downwind transects of the ship channel plume, the CH2O/O3 ratio peaked in the near field (about 10-20 km downwind or ~1 h of transport time), and then decreased as the plume moved further downwind. This is likely why the filter approach shown in Figure 6 failed to improve the correlation in Houston: spirals conducted in areas that were far downwind of the ship channel plume may still have experienced a large enough change in CH2O to be above the filter cutoff, but with a disproportionately large increase in O3, thereby introducing another source of uncertainty in the form of transport time. In Maryland, where biogenic emissions (which are spread out throughout the region) dominate the regional VOCR, this issue does not arise.

Discussion
While the observations presented in section 4 alone may be useful, they do not yet explain the discrepancy between Maryland and Houston-that is, what might be the theoretical basis for why the O 3 -CH 2 O correlation is much stronger in Maryland than in Houston. In the previous section, it was shown that temporal variability in CH 2 O column densities was required for a strong correlation to emerge, and it was hypothesized that differences in biogenic versus anthropogenic contributions to the local hydrocarbon mix was the cause of the discrepancy between Maryland and Houston. In section 5.1, this idea is further developed and observational evidence is provided to show that a widespread, temporally variable biogenic source dominates CH 2 O production in Maryland, while near-constant anthropogenic sources dominate CH 2 O production in Houston. In this section, the LaRC photochemical box model is also used to show that primary CH 2 O emissions likely contributed a minimal amount to the CH 2 O abundances observed during DISCOVER-AQ. In section 5.2, the LaRC photochemical box model is employed to provide a theoretical understanding of why O 3 and CH 2 O best correlate when CH 2 O has strong temporal variability.

The Impacts of Biogenic and Anthropogenic Hydrocarbon Emissions on CH 2 O in Maryland and Houston
While secondary production of CH 2 O can be affected by abundances of NO x and other radical species (i.e., the branching ratio α in equation (R5)), ambient CH 2 O mixing ratios are primarily a function of the ambient VOCR [Parrish et al., 2012;Valin et al., 2014;Wolfe et al., 2015]. Hourly averages of column-integrated hydrocarbon reactivities (that is, column-integrated VOCRs, calculated by using the WAS correlations shown in the supporting information accompanying this work-it is of note that this approach excludes the contribution from OVOCs, which can make up a significant portion of total OH reactivity) for spiral sites in Maryland and Houston are shown in the top two panels of Figure 8. Column-integrated reactivities tended to increase throughout the day at all Maryland sites, while Houston sites tended to increase in the morning hours then remain relatively flat the rest of the day.
In Houston, the hydrocarbon mix is dominated by anthropogenic emissions with some influence from biogenic sources. Of particular importance are refineries near the ship channel that emit highly reactive NMHCs like ethene, propene, and anthropogenic isoprene [Ryerson, 2003;Murphy and Allen, 2005;Vizuete et al., 2008;Zhu et al., 2014;Fried et al., 2016]. By contrast, the hydrocarbon mix in Maryland is largely dominated by a strong biogenic source with an appreciable contribution from anthropogenic emissions. This difference in hydrocarbon emissions can explain the difference in the diurnal behavior of column VOCR between the two locations. Figure 8  Houston exhibited lower isoprene levels with no consistent diurnal behavior. One could speculate that this is due to the fairly cloudy conditions during many of the Houston flights-which could have suppressed biogenic emissions-or the fact that the Houston deployment took place in September, when isoprene emissions typically begin to decrease [Geron et al., 2001;Palmer et al., 2006;Potosnak et al., 2014]. This is inconsistent, however, with isoprene behavior at the Conroe site, located north of Houston in a more wooded area. It is more likely that a strong source of anthropogenic isoprene was observed over the refineries near the ship channel in Houston. Because these refineries are active throughout the day, isoprene in the region had little diurnal variability.
Globally, isoprene is the largest VOC contributor to secondary CH 2 O formation. To estimate the role of isoprene in driving changes in ambient CH 2 O mixing ratios for Maryland and Houston, the fractional contribution of isoprene and its oxidation products to the instantaneous rate of CH 2 O formation was calculated by using the LaRC photochemical box model. This quantity was calculated as the rate of formation of CH 2 O from isoprene and its oxidation products [Palmer et al., 2003;Millet et al., 2006;Marais et al., 2012]. The value reported here is the PBL-column-average of FCH 2 O ISOP /FCH 2 O total . Looking at all data collected in Maryland, the median FCH 2 O ISOP /FCH 2 O total between 250 and 500 m (i.e., the bottom of each spiral) was 0.75 but decreased to 0.2 at an altitude of 1 km. This is because the vertical distributions of isoprene and its oxidations products are heavily skewed toward the surface-the median isoprene mixing ratio between 250 and 500 m in Maryland was 0.5 ppbv but fell to 0.1 ppbv at 1 km. Therefore, the PBL-column-average values of FCH 2 O ISOP /FCH 2 O total that we report here are lower than the 0.5-0.8 reported by studies that focused on near-surface chemistry in Eastern North America [Lee et al., 1998;Macdonald et al., 2001;Sumner et al., 2001] but are consistent with the 0.2-0.3 reported by Pfister et al. [2008], who looked at the average contribution of isoprene to CH 2 O formation in the vertical column. Diurnal trends in column averages of CH 2 O and FCH 2 O ISOP /FCH 2 O total are shown for all sites in Maryland and Houston in the bottom two rows of panels in Figure 8. In Houston, FCH 2 O ISOP /FCH 2 O total had less variation than in Maryland, and ambient CH 2 O mixing ratios tended to remain relatively flat throughout the day. 
This, coupled with the weak diurnal trend in isoprene abundances in Houston and the overall lower PBL-column-average FCH 2 O ISOP /FCH 2 O total (~15% in Houston versus~28% in Maryland), suggests that biogenic emissions play a proportionately smaller role in driving changes in ambient CH 2 O levels in Houston. This is in agreement with previous studies that have found anthropogenic emissions to be the leading contributor to secondary CH 2 O in Houston [Wert, 2003;Parrish et al., 2012;Zhu et al., 2014]. It is also worth noting that there is an anthropogenic source of isoprene from the refineries in the Houston ship channel, so the true biogenic contribution to CH 2 O formation is likely smaller than reported here. In Maryland, a clear relationship is observed: column average CH 2 O mixing ratios increase (by about 250 pptv/h, on average) as column-average isoprene mixing ratios increase (by about 25 pptv/h), and the fractional contribution of isoprene increases (by about 0.02 per hour). This, coupled with the positive diurnal tendency in isoprene in Maryland, suggests that changes in biogenic emissions throughout the day play a large role in driving changes in ambient CH 2 O levels. However, it should be noted that even if all five carbon atoms from every isoprene molecule went on to form CH 2 O, the expected increase in CH 2 O due to changes in isoprene alone would be about 125 pptv/h (or about 1-1.5 ppbv over a day)-about half of the observed increase of 250 pptv/h (or about 2-3 ppbv over a day). This suggests that while biogenics play an important role in driving diurnal trends in CH 2 O in Maryland, there is another significant factor that must account for an increase of at least 1 ppbv of CH 2 O throughout the day. The identity of this "missing" factor can be explained by the diurnal tendency of CH 2 O in the absence of diurnally varying biogenic emissions, which is discussed in greater detail in section 5.2.
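The carbon-budget argument in this paragraph is simple arithmetic and can be written out explicitly. The sketch below only restates the numbers quoted in the text (25 pptv/h isoprene increase, 250 pptv/h observed CH2O increase); the variable and function names are hypothetical.

```python
CARBONS_PER_ISOPRENE = 5  # isoprene is C5H8


def max_ch2o_rate_from_isoprene(isoprene_rate_pptv_h):
    """Upper bound on the CH2O increase (pptv/h) if every carbon atom of
    the added isoprene ended up as CH2O, as assumed in the text."""
    return CARBONS_PER_ISOPRENE * isoprene_rate_pptv_h


isoprene_rate = 25.0       # pptv/h, column-average increase in Maryland
observed_ch2o_rate = 250.0  # pptv/h, column-average increase in Maryland

upper_bound = max_ch2o_rate_from_isoprene(isoprene_rate)
unexplained_fraction = 1.0 - upper_bound / observed_ch2o_rate

print(upper_bound)           # 125.0 pptv/h
print(unexplained_fraction)  # 0.5, i.e., about half is left unexplained
```

Because the 100% carbon yield is an upper limit, the true unexplained share of the observed CH2O increase would be at least this large.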
While ambient CH 2 O mixing ratios are dominated by secondary production, an understanding of the relative contribution of primary CH 2 O emissions is needed in order to fully understand these observations. Because the LaRC photochemical box model only includes secondary production of CH 2 O, the degree of agreement between measured and modeled CH 2 O concentrations can be used as an indicator of the importance of primary emissions. Figure 9 shows a comparison of modeled and measured CH 2 O mixing ratios in Houston on the afternoon of 25 September. On this day, the sky was clear and conditions were hot and fairly stagnant. Under these conditions, O 3 production was high and O 3 mixing ratios in excess of 100 ppbv were observed throughout the region, with a maximum of 145 ppbv measured south of downtown.

In Figure 9, model-predicted CH 2 O mixing ratios are generally within 20% of measured CH 2 O mixing ratios during each spiral on the third circuit. Two notable exceptions are Galveston and Smith Point, where the model underpredicted CH 2 O mixing ratios. The model is also low-biased in Channelview, but still within 20% of measurements. Because the lifetime of a given molecule will determine the length of time it will take to reach equilibrium, the time-distance from an emission source determines which species will be accurately predicted by the model. For example, when near a short-duration emission source (i.e., a "puff"), the model will most accurately predict species with very short lifetimes and will tend to overpredict species with longer lifetimes. When near a persistent emission source, such as a power plant or petrochemical facility, the model will generally be in good agreement with measurements except when strong advection occurs. This scenario is similar to 25 September in Houston, which was characterized by strong, persistent emissions from the ship channel, low wind speeds, and weak advection. Here the model performed best in urban areas and when near emission sources, such as Moody Tower, West Houston, Conroe, Channelview, and Deer Park. When far downwind of an emission source, the model may underpredict species with longer lifetimes due to the lack of ability to account for the time-history of an air mass. That is, the model cannot accurately account for previous accumulation of species with longer lifetimes. Although CH 2 O is considered a short-lived molecule, its lifetime is sufficiently long that it is susceptible to the deviations from equilibrium described above. This is why the model is low-biased in downwind sites such as Galveston and Smith Point, and to some extent Manvel Croix, which have chemical compositions that are dominated by transport rather than local emissions. 
Previous work has found that model predictions of CH2O in well-aged air masses may be underpredicted by a factor of 1.6 [Fried et al., 2011]. On other hot, clear-sky days in Houston and Maryland, similar agreement between model and measurement was observed (not shown, though generally within 30%). This indicates that on days where photochemical production of O3 is high, primary sources of CH2O likely contribute a minimal amount to observed ambient CH2O mixing ratios. As noted above, 0-D box models may overpredict or underpredict CH2O mixing ratios under certain situations. Thus, while our results suggest a minimal importance of primary CH2O emissions on hot, clear-sky days, they do not completely exclude the possibility that primary CH2O emissions are important on these days. However, many literature studies are in agreement: in very large urban areas such as New York City [Lin et al., 2012b] and Mexico City [Garcia et al., 2006], photochemical production accounts for ~70% of observed ambient CH2O mixing ratios at midday. In smaller urban areas, such as Columbus, Ohio, secondary CH2O production accounts for 80% of observed ambient CH2O [Mukund et al., 1996], and it is estimated that, on average, secondary production of CH2O accounts for more than 95% of observed CH2O in the eastern half of the U.S. [Li et al., 1994]. Further complicating this issue, separation of true primary emissions from pseudo-primary emissions (that is, photochemically produced CH2O that is created in the immediate vicinity of an emission source) is very difficult. Many studies use correlations between CH2O and CO (a tracer of combustion) to determine the portion of CH2O attributed to primary emissions. However, highly reactive VOCs are also co-emitted during the combustion process, and in the presence of these highly reactive VOCs, large amounts of CH2O can be formed very quickly [Fried et al., 2016].
Measurements collected even a short distance downwind of a suspected primary source of CH 2 O may be likely to overpredict the amount of primary CH 2 O due to unaccounted for pseudo-primary CH 2 O produced between the source and the measurement site. Airborne studies conducted in Houston have explored this idea and also concluded that it is very difficult to separate primary CH 2 O emissions from pseudo-primary emissions. These studies also concluded that true primary emissions of CH 2 O likely contribute a negligible amount to ambient CH 2 O mixing ratios [Wert, 2003;Parrish et al., 2012;Fried et al., 2016]. Thus, the remainder of this paper will operate under the assumption that primary emissions are not a major contributor to the observed variability in CH 2 O column densities.

A Theoretical Basis for Understanding the O 3 -CH 2 O Relationship
Though O 3 and CH 2 O are produced by similar photochemical reactions, it is unlikely that co-production is the reason for this correlation. As described in sections 1 and 2, O 3 is expected to accumulate throughout the day while ambient CH 2 O mixing ratios are generally expected to reflect the local hydrocarbon oxidation environment. To explore this hypothesis, the LaRC photochemical box model was employed to investigate diurnal trends for O 3 and CH 2 O. Two separate simulations were run-one using average trace gas inputs from below 1.5 km in Maryland and the other using average trace gas inputs from below 1.5 km in Houston. In both cases, the model was initialized at noon local time, and these trace gas inputs were held constant until the end of the following day, while O 3 , CH 2 O, and all radical species were allowed to vary. The results of these simulations (shown in Figure 10  Houston), while CH 2 O had very different tendencies between the two locations. In Maryland, CH 2 O reached a daily minimum of 3.9 ppbv near sunrise, then increased monotonically throughout the day (with a higher rate of increase in the morning than the afternoon) until reaching a maximum value of 4.9 ppbv at sunset. This 1 ppbv increase in CH 2 O throughout the day due to chemical production under constant VOC conditions may explain the missing CH 2 O noted in section 5.1. As noted in section 5.1, isoprene and the total VOCR tended to increase throughout the day in Maryland, so the 1 ppbv of CH 2 O produced in this simulation where VOCs were held constant likely represents a low estimate of the net accumulation of CH 2 O from photochemical processing. In Houston, CH 2 O reached a daily minimum of 4.4 ppbv at sunrise, had a high rate of increase in the early morning, reached a daily maximum of 5.5 ppbv in the late morning, then decreased to a local minimum of 5.1 ppbv at around 16:00 local time. 
These differences in behavior can be attributed to differences in the average chemical composition between Maryland and Houston. Below 1.5 km, the aggregate mixing ratios of most VOCs were higher in Houston than in Maryland (with the notable exception of isoprene, which was 45% higher in Maryland), and the aggregate NOx mixing ratio in Houston was 136% higher than in Maryland (2.81 ppbv in Houston versus 1.19 ppbv in Maryland). Because of the large role that NOx plays in cycling between HO2 and OH (i.e., reaction (R6)), simulated midday mixing ratios of OH in Houston were a factor of 2.5 higher than in Maryland. As a result, the midday lifetime of CH2O due to reaction with OH in Houston was a factor of 2.5 lower than in Maryland (2.1 h in Houston versus 5.3 h in Maryland). For reference, the midday lifetime of CH2O due to loss by photolysis was about 3.7 h in both locations. Therefore, the overall lifetime of CH2O (that is, including loss by photolysis and reaction with OH) at midday in Houston was lower than in Maryland by a factor of 1.7. As a result of this short lifetime, CH2O had very little tendency to accumulate in Houston and responded much more quickly to changes in the rate of hydrocarbon oxidation than in Maryland. Because hydrocarbons were effectively held constant in these simulations, variability in the rate of hydrocarbon oxidation could only arise from changes in OH, which has a diurnal profile that is nearly identical to that of JNO2 (i.e., the green trace in Figure 10). When coupled with the fact that reaction with OH was the dominant removal pathway for CH2O in Houston, an understanding of the unique diurnal trend in CH2O observed in Houston emerges. In the early morning, CH2O increased in response to initial increases in OH, but by the late morning, the CH2O lifetime became short enough such that accumulation was retarded, and the CH2O mixing ratio began decreasing by midday as a result of increased destruction.
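The overall CH2O lifetime quoted above follows from combining the two loss pathways. Parallel first-order losses add as rates, so the lifetimes combine as a reciprocal sum. The sketch below applies this to the rounded midday lifetimes given in the text; the function and variable names are hypothetical.

```python
def combined_lifetime(*lifetimes_h):
    """Overall lifetime (h) for parallel first-order loss pathways.

    Loss rates (1/tau) add, so 1/tau_total = sum(1/tau_i).
    """
    return 1.0 / sum(1.0 / tau for tau in lifetimes_h)


TAU_PHOTOLYSIS_H = 3.7  # midday photolysis lifetime, both locations
TAU_OH_HOUSTON_H = 2.1  # midday lifetime against OH, Houston
TAU_OH_MARYLAND_H = 5.3  # midday lifetime against OH, Maryland

tau_houston = combined_lifetime(TAU_OH_HOUSTON_H, TAU_PHOTOLYSIS_H)
tau_maryland = combined_lifetime(TAU_OH_MARYLAND_H, TAU_PHOTOLYSIS_H)
ratio = tau_maryland / tau_houston

print(f"Houston:  {tau_houston:.2f} h")
print(f"Maryland: {tau_maryland:.2f} h")
print(f"Maryland/Houston: {ratio:.2f}")
```

With these rounded inputs the overall lifetimes come out near 1.3 h and 2.2 h, a ratio of roughly 1.6, which is close to the factor of 1.7 stated in the text; the small difference is presumably due to rounding of the quoted lifetimes.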
To explore how changes in air mass composition affect equilibrium CH 2 O mixing ratios, a "base case" diurnal profile of CH 2 O and O 3 in relatively clean conditions was created where trace gas inputs were held constant throughout the day. Then, three additional simulations were run such that at 12:00, the air mass composition was abruptly changed. The results of these simulations are shown in Figure S8 in the supporting information accompanying this work, where the base case is shown as a black trace, and the three simulations where composition was changed are represented as traces A, B, and C. The NO x , alkane, and alkene input mixing ratios for the base case and simulations A, B, and C are given in Table S1 in the supporting information. In all cases, when the air mass composition was changed, CH 2 O mixing ratios rapidly re-adjusted to new equilibrium levels within about 3 h. The amount of CH 2 O present once equilibrium was re-established, however, depended on the new air mass composition. Note that the diurnal profile for CH 2 O in this simulation appears slightly different from those in Figure 9-this is because the lower amount of CH 2 O present in the base case simulation yields lower rates of CH 2 O destruction, which is first order with respect to CH 2 O.
These results imply that changes in ambient O3 concentrations can be decoupled from changes in ambient CH2O concentrations. In a regime where primary CH2O emissions are negligible and emissions of chemical precursors remain relatively constant throughout the day, CH2O is expected to reach equilibrium in the morning and have small fluctuations throughout the rest of the day; in the constant-VOC simulations presented in Figure 10, the daily maximum CH2O mixing ratio was only ~20% higher than the daily minimum. At the same time, if O3 production is positive, we expect O3 to increase monotonically throughout the day by virtue of its long lifetime. On the other hand, in an O3-producing regime where precursor emissions increase throughout the day, we expect to see O3 and CH2O co-vary with a much larger dynamic range in CH2O than was observed in Figure 10. To test this hypothesis, another simulation was run by using the base case constraints described above. In this simulation, however, isoprene was set to increase at a rate of 600 pptv/h throughout the daytime hours. The results of this simulation are shown in Figure 10. Here CH2O increased fairly linearly at a rate of ~1 ppbv/h in response to increasing VOCR from isoprene. Calculated values of P(O3) increased until reaching a maximum value of ~6 ppbv/h before midday, then began to decrease again at 16:00. In this environment, where VOCR increased throughout the day, P(O3) was limited by NOx availability and thus was not affected by changes in isoprene to the same degree that CH2O was.
The results of these simulations, along with the observations shown in section 5.1, provide support to the hypotheses suggested in section 4: O3 and CH2O column densities have a strong correlation only when temporal variability in CH2O is introduced, and better correlations were observed in the biogenically dominated environment of Maryland because of the temporal variability that biogenic emissions imprinted on ambient CH2O values. However, these simulations do not provide a mathematical framework for understanding the results presented in section 4. For example, these simulations do not explain why a slope of ~13 DU O3/DU CH2O is observed in Maryland, nor do they explain the observed increase in slope between midday and afternoon when data from Houston were viewed spatially. What follows is a mathematical description of these results, derived from first principles. The description rests on equation (2), in which the proportionality factor β is a function of local NOx concentrations, actinic flux, the respective CH2O and O3 branching ratios per hydrocarbon oxidation (αCH2O and αO3), and the local lifetime of CH2O (τCH2O). A plot showing the relationship between β, NO, and CH2O lifetime for data collected in Houston and Maryland is shown in Figure S12 accompanying this work, and β is taken to be the slope of the total least squares fit to these data (β = 1.7 ± 0.01 h⁻¹), shown in Figure S13.
Since the goal of this section is to create a very broad mathematical understanding of the O3-CH2O relationships reported here, β will be simplified and essentially treated as a constant with a value of 1.7 ± 0.01 h⁻¹. Integrating equation (2) yields equation (3). In Maryland, a linear fit is appropriate for describing the diurnal trends in CH2O (see Figure 8). Assuming that CH2O varies linearly throughout the day with a slope of κ and a y intercept of CH2O initial (i.e., equation (4)) results in equation (5).
Here (CH2O initial)t represents the O3 production associated with the initial lump of CH2O (i.e., the rate of hydrocarbon oxidation associated with the initial atmospheric composition) and κt²/2 represents additional O3 production (or loss) associated with increases (or decreases) in CH2O (i.e., O3 production or loss associated with deviations from the initial atmospheric composition). Calculating the relative changes in O3 and CH2O (i.e., equation (5) divided by equation (4)) under these conditions results in equation (6) (see Figure 6).
In Houston, on the other hand, CH2O tended to remain constant throughout the day because of the dominance of anthropogenic emissions. This results in equation (7). From this equation, it can be seen that for a given range of CH2O values (each of which remains constant over time), the slope between O3 and CH2O will increase over time. This was simulated in Figure 11 using a range of CH2O column densities from 0.2 to 1.2 DU, an initial O3 column density of 9 DU, a β value of 1.7 h⁻¹, and times binned every 2 h over a 10 h period. Figure 12 simulates the spatial behavior seen in Houston on 25 September (i.e., Figure 7).
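The chain of relationships described in this section can be written out as follows. This is a sketch reconstructed from the definitions in the text (β as a proportionality factor with units of h⁻¹, integration over time, a linear CH2O trend with slope κ, and a constant-CH2O case), not the equations as originally typeset. The notation is an assumption throughout. Equation (6) in particular assumes that "relative changes" means the ratio of the change in O3 to the change in CH2O since the initial time; a literal division of equation (5) by equation (4) would keep the initial values in both numerator and denominator.

```latex
% Assumed reconstruction; notation is illustrative only.
\begin{align}
\frac{d[\mathrm{O_3}]}{dt} &= \beta\,[\mathrm{CH_2O}] \tag{2}\\
[\mathrm{O_3}](t) &= [\mathrm{O_3}]_{\mathrm{initial}}
   + \beta \int_{0}^{t} [\mathrm{CH_2O}](t')\,dt' \tag{3}\\
[\mathrm{CH_2O}](t) &= \kappa t + [\mathrm{CH_2O}]_{\mathrm{initial}} \tag{4}\\
[\mathrm{O_3}](t) &= [\mathrm{O_3}]_{\mathrm{initial}}
   + \beta\left([\mathrm{CH_2O}]_{\mathrm{initial}}\,t + \frac{\kappa t^{2}}{2}\right) \tag{5}\\
\frac{\Delta[\mathrm{O_3}]}{\Delta[\mathrm{CH_2O}]}
   &= \frac{\beta\left([\mathrm{CH_2O}]_{\mathrm{initial}}\,t + \kappa t^{2}/2\right)}{\kappa t}
    = \beta\left(\frac{[\mathrm{CH_2O}]_{\mathrm{initial}}}{\kappa} + \frac{t}{2}\right) \tag{6}\\
[\mathrm{O_3}](t) &= [\mathrm{O_3}]_{\mathrm{initial}} + \beta\,[\mathrm{CH_2O}]\,t \tag{7}
\end{align}
```

Under this reading, equation (7) implies that at a fixed time t the slope of O3 against CH2O across locations is βt, which grows linearly with time and is consistent with the statement that the slope increases over the day.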
Deviations from the idealized behavior described in equations (6) (for systems in which CH2O varies linearly throughout the day) and (7) (for systems in which CH2O remains relatively constant) can come in many forms. In both cases, the proportionality factor β can vary from the average value used here. In the morning and evening, when the CH2O lifetime is ~3-4 h and NOx emissions are relatively high, β is reduced (model estimates predict ~0.5 h⁻¹) and a shallower slope would be observed. During midday hours (~10:00-14:00) when the CH2O lifetime is ~1-2 h and NOx emissions are relatively low, β would be higher (model estimates predict ~2 h⁻¹). Furthermore, transport is not accounted for in these equations and could be a source of uncertainty (for example, the spatial behavior noted in Houston in section 4). However, these equations are not meant to be used as a robust model, but merely provide a mathematical framework for understanding the results presented here.
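The constant-CH2O case can also be sketched numerically. The example below assumes that equation (7) has the form O3(t) = O3 initial + β·CH2O·t, which is what integrating a constant CH2O column with a constant β would give. It uses the inputs quoted above (CH2O from 0.2 to 1.2 DU, an initial O3 column of 9 DU, β = 1.7 h⁻¹, and 2 h time bins over 10 h). The 0.2 DU spacing between CH2O values and all function names are assumptions made for illustration.

```python
def o3_column(ch2o_du, hours, o3_initial_du=9.0, beta_per_h=1.7):
    """Assumed form of equation (7): O3(t) = O3_initial + beta * CH2O * t."""
    return o3_initial_du + beta_per_h * ch2o_du * hours

def least_squares_slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# CH2O columns spanning 0.2 to 1.2 DU; the 0.2 DU spacing is an assumption.
ch2o_columns = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
# Times binned every 2 h over a 10 h period.
time_bins_h = [0, 2, 4, 6, 8, 10]

# Slope of O3 against CH2O across the set of columns, one value per time bin.
slopes = {
    t: least_squares_slope(ch2o_columns, [o3_column(c, t) for c in ch2o_columns])
    for t in time_bins_h
}
for t, slope in slopes.items():
    print(f"t = {t:2d} h: slope = {slope:.1f} DU O3 per DU CH2O")
```

In this sketch the fitted slope at each time bin equals βt, so it steepens steadily through the day. Passing beta_per_h=0.5 or beta_per_h=2.0, the model estimates quoted above for morning/evening and midday conditions, rescales every slope in proportion.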

Long-Term Averages of Surface O3 and Column CH2O
In the previous sections, it was shown to be possible to estimate near-surface O3 tendencies and gain an understanding of the spatial and temporal trends in near-surface O3 production from column measurements of CH2O, and the accuracy of this estimation is much greater when regional emissions are dominated by biogenic VOCs (i.e., Maryland) as opposed to anthropogenic VOCs (i.e., Houston). However, from a human health perspective, estimating O3 exposures at the surface would be much more germane. Previous work has investigated relationships between surface and column measurements of O3 from DISCOVER-AQ data and found that the relationship can be complex and highly variable in space and time [Reed et al., 2013; Flynn et al., 2014; Thompson et al., 2014]. Furthermore, local meteorological conditions can create stratifications within the lower troposphere, making it very challenging to derive actual surface mixing ratios even from PBL column densities. For example, He et al. [2014] noted a persistent layer with 120 ppbv of O3 at 800 m from 18 to 23 July during the Maryland deployment.
Figure 11. Box model simulation of O3 (black) and CH2O (red) using the inputs described in the base case scenario above. During daytime hours, isoprene was set to increase at a rate of 600 pptv/h. The green trace is unitless and is a qualitative representation of actinic flux.
Figure 12. Simulation of equation (7).
Factors that affect the O3 surface/column ratio include transport (both on local and regional scales), meteorology (both on local and regional scales), and temporal and spatial variabilities in emissions. While there is significant variability in the O3 surface/column ratio on short time scales, this variability may be muted when averaged over longer time scales. In this section, spatial trends in the O3-CH2O relationship are investigated over the time scale of each DISCOVER-AQ deployment (about 1 month). Figure 13 shows the relationship between long-term (over the course of each deployment) averages of surface O3 and whole-column CH2O in Maryland, Houston, and Colorado, colored by site location. In Maryland and Houston, correlations are sufficient such that long-term O3 exposures could be reasonably estimated from long-term averaging of column measurements of CH2O at discrete locations. While the variability bars for both CH2O column densities and surface O3 mixing ratios in Figure 13 are quite large (representing a wide range of day-to-day conditions), the mean values for each location have reasonable correlations in Maryland and Houston. In Colorado, however, the dynamic range in CH2O is substantially smaller than in either Maryland or Houston, and the resulting correlation is weak. In Houston, five sites (Conroe, West Houston, Moody Tower, Channelview, and Deer Park) have similar long-term average values of CH2O column density and surface O3 mixing ratios.
As a result, the slope and r2 values for the Houston data likely do not represent the true values that would be seen if locations were chosen to maximize the dynamic range in surface O3 and column CH2O without producing clusters of similar values. While this highlights one of the shortcomings of the DISCOVER-AQ data set (we are limited to data collected at the spiral sites: eight places in Houston, five in Maryland, etc.), it also provides an interesting glimpse into how future air quality monitoring networks could be optimized. Using a mobile network of co-located surface measurements of O3 and surface-based column measurements of CH2O (from a Pandora spectrometer, for example), a more representative relationship between long-term averages of column CH2O and surface O3 could be created by optimizing measurement sites to maximize the dynamic ranges of the measurements. Then, this relationship could be applied to long-term averages of satellite measurements of CH2O at discrete locations to map out long-term exposures to surface O3 in between surface monitoring sites, effectively providing a map of estimated long-term O3 exposure throughout an entire region.

Conclusion
Using data collected during NASA's DISCOVER-AQ field campaigns, a correlation between column measurements of CH2O and O3 was observed. When the O3 column was restricted to only include O3 contained within the chemical PBL, the correlation improved for data collected in Maryland, but the restriction had no effect on data collected in Houston. Further analysis of these data revealed that O3 and CH2O were more strongly correlated when temporal changes in CH2O were present. Applying a filter that removed data with little temporal variation in CH2O produced a strong correlation (r2 = 0.75) between O3 and CH2O in Maryland. In Houston, however, there was no improvement to the weak correlation between O3 and CH2O. A case study of 25 September in Houston (the day that had the highest surface O3 mixing ratios observed in Houston) revealed the reason for the poor correlation: by viewing the data spatially rather than temporally, relatively strong correlations emerged and suggested that a time-of-transport term for plumes originating from the ship channel was the cause of the weak overall correlation between O3 and CH2O in Houston.
Further analysis provided an explanation for the different behaviors that were observed between Maryland and Houston. In Maryland, biogenic emissions fueled diurnal increases in VOCR, which in turn caused CH2O to, on average, increase monotonically throughout the day. Because O3 has a much longer atmospheric lifetime and accumulates throughout the day, CH2O and O3 co-varied in Maryland. In Houston, however, anthropogenic emissions from the ship channel dominated the local VOCR. As a result, biogenics played a proportionately small role in local CH2O production, and no discernable diurnal trend in VOCR or CH2O was observed. Using the LaRC photochemical box model, we showed that ambient CH2O mixing ratios change in response to changes in the local hydrocarbon oxidation rate, primarily due to changes in OH or VOC reactivity. Model results suggested that because O3 has a much longer lifetime and accumulates throughout the day, a strong correlation between O3 and CH2O will emerge when CH2O mixing ratios increase over time in conjunction with increasing emissions of hydrocarbon precursors. Finally, it was found that long-term averages of CH2O (i.e., weekly or monthly averages) could be useful for estimating long-term O3 exposure at the surface.
While the results of this work provide insights into the potential utility of future geostationary satellites to fill in gaps in surface monitoring networks and understand the true spatial extent of air pollution events, these results are unlikely to be applicable globally. As discussed above, the O3-CH2O relationship is most useful in regions where temporal variability in VOC emissions drives changes in ambient CH2O mixing ratios throughout the day (i.e., Maryland). In regions where CH2O levels are low (as was the case in the California and Colorado deployments of DISCOVER-AQ), satellites may not be able to sufficiently resolve gradients and this relationship cannot be applied at all. Furthermore, NOx has a complex, nonlinear effect on the chemical processes that produce/destroy O3 and CH2O, and therefore can affect the slope of the relationship between column O3 and CH2O. Because emission rates of biogenic hydrocarbons and NOx can vary greatly both regionally and seasonally, studies invoking the use of 3D models would be invaluable in determining regions and seasons where the O3-CH2O relationship would be most applicable. Guided by such model results, locations could be identified for siting co-located surface measurements and surface-based column measurements (such as Pandora spectrometers or tethered balloons) of O3, CH2O, and NO2. Such measurements would be invaluable for further evaluating these results and determining the locations and conditions for which the covariance of surface ozone and column CH2O would be a useful metric from satellite observations.