eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Previously Published Works

Identifying schools at high-risk for elevated lead in drinking water using only publicly available data

Abstract

Estimating the risk of lead contamination of schools' drinking water at the state level is a complex, important, and unexplored challenge. Variable water quality among water systems and changes in water chemistry during distribution affect lead dissolution rates from pipes and fittings. In addition, the locations of lead-bearing plumbing materials are uncertain. We tested the capability of six machine learning models to predict the likelihood of lead contamination of drinking water at school taps using only publicly available datasets. The predictive features used in the models are those with a proven correlation to the dominant, but commonly unavailable, factors that govern lead leaching: the presence of lead-bearing plumbing materials and water quality conducive to lead corrosion. By combining water chemistry data from public reports, socioeconomic information from the US census, and spatial features derived with Geographic Information Systems, we trained and tested models to estimate the likelihood of lead-contaminated tap water in over 8,000 schools across California and Massachusetts. Our best-performing model was a Random Forest, with an average Area Under the Receiver Operating Characteristic Curve (ROC AUC) under 10-fold cross-validation of 0.88 for Massachusetts and 0.78 for California. The model was then used to assign a lead leaching risk category to half of the schools across California (the other half was used for training). There was good agreement between the modeled risk categories and the actual lead leaching outcomes across these schools; however, the model overestimated the lead leaching risk in up to 17% of the schools. This model is the first of its kind to offer a tool to predict the risk of lead leaching in schools at the state level. Further use of this model can help deploy limited resources more effectively to prevent childhood lead exposure from school drinking water.
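The reported metric, ROC AUC averaged over cross-validation folds, can be illustrated with a minimal sketch. This is not the authors' code: it omits model fitting entirely (the paper's best model was a Random Forest) and only shows how the score is computed from held-out predictions. The function names `roc_auc` and `mean_cv_auc` and the example labels and scores are hypothetical, chosen for illustration.

```python
from itertools import product
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def mean_cv_auc(folds):
    """Average ROC AUC over held-out folds, each given as (labels, scores)."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical held-out predictions for two folds (illustrative values only).
# Label 1 = elevated lead detected at a school, 0 = not detected.
example_folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([0, 0, 1, 1], [0.1, 0.4, 0.8, 0.35]),
]
example_score = mean_cv_auc(example_folds)
```

In a 10-fold setting, `folds` would hold ten such pairs, one per held-out partition. A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.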
