Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at High Spatiotemporal Resolution.
Published Web Locationhttps://doi.org/10.1021/acs.est.7b01864
Spatiotemporal models to estimate ambient exposures at high spatiotemporal resolutions are crucial in large-scale air pollution epidemiological studies that follow participants over extended periods. Previous models typically rely on central-site monitoring data and/or covered short periods, limiting their applications to long-term cohort studies. Here we developed a spatiotemporal model that can reliably predict nitrogen oxide concentrations with a high spatiotemporal resolution over a long time span (>20 years). Leveraging the spatially extensive highly clustered exposure data from short-term measurement campaigns across 1-2 years and long-term central site monitoring in 1992-2013, we developed an integrated mixed-effect model with uncertainty estimates. Our statistical model incorporated nonlinear and spatial effects to reduce bias. Identified important predictors included temporal basis predictors, traffic indicators, population density, and subcounty-level mean pollutant concentrations. Substantial spatial autocorrelation (11-13%) was observed between neighboring communities. Ensemble learning and constrained optimization were used to enhance reliability of estimation over a large metropolitan area and a long period. The ensemble predictions of biweekly concentrations resulted in an R2 of 0.85 (RMSE: 4.7 ppb) for NO2 and 0.86 (RMSE: 13.4 ppb) for NOx. Ensemble learning and constrained optimization generated stable time series, which notably improved the results compared with those from initial mixed-effects models.