Environmental burdens have repeatedly been shown to concentrate in low-income communities and/or communities of color, yet disagreement persists on how these inequalities should be understood and addressed. Navigating these disagreements has been a long-standing struggle for environmental justice (EJ) communities and researchers as they interact with polluters and the state. My dissertation is about environmental justice, three of its “main players” (the state, communities, and researchers), and how two emerging research methods (“big data” and community-engaged research) are shaping how these players interact and leverage science to inform and advocate for policy change. My motivation is to develop new research tools and methods to characterize uneven burdens of environmental health hazards more responsibly. I bridge participatory research methods and machine learning to explore the benefits and challenges of bringing together community knowledge and big data in environmental justice research. I structure my dissertation around two research questions: (1) what might researchers’ “due diligence” look like when applying machine learning models to environmental justice questions; and (2) how do modern tools in the environmental justice toolbox (namely, big data analyses, predictive modeling, and interactive mapping) assist or obstruct the success of hyper-local environmental justice priorities and agendas?
In recent years, the popularity of “big data” and machine learning algorithms has spread from disciplines such as computer science and medicine to research on environmental monitoring and justice. These methods, and in particular their predictive power, offer resource-efficient approaches to environmental justice-motivated questions, especially given the resource constraints of grassroots activists and their government counterparts. They come with caveats, however, as researchers in other fields have demonstrated how algorithms can further entrench structural inequalities. Learning about such findings pushed me to think critically about the implications of applying similar black box predictive models to environmental justice contexts. To address question (1) above, I present a case of machine learning for environmental justice: predicting water quality violations in community water systems across California. I then reflectively deconstruct this model and its findings to evaluate the potential implications if this algorithmic approach were to be implemented. The result is a well-performing (highly accurate) model for water quality violations in California, paired with a set of recommendations for future environmental justice research using machine learning. In short, my recommendations are to transparently report and discuss the (sociodemographic) characteristics of the communities for which the model predicts accurately and inaccurately, and to assess whether different choices of input and output variables bias the model. I conclude with a statistically based approach that researchers might take to reduce the bias in their outcomes.
To address question (2), I bring big-picture algorithmic approaches into conversation with community-engaged research through a practice brief on participatory science. The case for community engagement in research is based on the premise that communities possess knowledge of local environmental health risks that has historically been overlooked by regulatory agencies and academics. Over the past three years, I have been involved in two projects with EJ partners (Toxic Tides and the Drinking Water Tool) that blend community participation with modern tools in the EJ toolbox: large-scale datasets, predictive models, and mapping. I am both proud of and grateful for the experience of working together with environmental justice groups on these projects. The practice brief, chapter six of this dissertation, summarizes the ways in which our state-EJ organization-academic collaboration helped strengthen our science and ultimately advance our environmental justice goals.
My dissertation follows the “three paper” format and comprises three manuscripts: one published and two in peer review (at the time of filing). Chapter one is the introduction, in which I lay out the narrative arc of my dissertation. Chapter two is a critical review of primary data studies on environmental justice and drinking water. This was my first first-author publication, and it has played a big role in how I think about and understand secondary data. In chapter three, I set the stage for my research through a perspective piece on the emerging and growing use of machine learning methods in environmental justice research. (I have also submitted this piece to a journal, but it is being reviewed only by the editors and will not go through a formal peer-review process.) In chapter four, I present the findings of a quantitative analysis using machine learning to predict drinking water quality in California. In chapter five, I offer a reflexive take on my machine learning approach, outlining how different upstream decisions shifted the demographics of the communities the model was right or wrong about. Finally, in chapter six, I conclude with a practice brief reflecting on the work done by the Water Equity Science Shop (a research collaborative composed of an environmental justice community-based organization, the Community Water Center; research groups at UC Berkeley and UCLA; and the Cal EPA’s Office of Environmental Health Hazard Assessment) as a practical contribution to theoretical discourse on community-engaged research and environmental justice.
In summary, I explore the opportunities and tensions of two popular approaches to environmental justice research: large-scale predictive analysis and community engagement. Both are powerful and compelling tools for furthering environmental justice research and empowering environmental justice communities, but they come with caveats, and due diligence is required on the part of the researcher. My hope is that my research helps make these caveats both more legible and more manageable to future researchers and their environmental justice collaborators. The longer-term motivation of my work is to inform efforts to connect state resources to on-the-ground community needs.