Recent progress in artificial intelligence (AI) has been driven by significant advances in graphics processing units (GPUs). Machine learning (ML) and deep learning (DL) models have been widely employed for Earth system modeling because of their ability to approximate any continuous function. For hydro-climate systems in particular, DL models have been adopted to simulate processes based on our current understanding. Although their accuracy and performance are comparable to or even better than those of process-based models, DL models are often referred to as 'black box' models because users cannot intuitively understand why and how a model produces its results, which motivates the need for explainable and trustworthy AI. Additionally, when the models are generalized to data outside the training range, trust in DL models is lacking: unlike process-based models, DL models do not explicitly satisfy physical constraints and are therefore more likely to generate nonphysical results. In this dissertation, ML and DL models are adopted to simulate and analyze components of Earth's hydro-climate system, including snowpack, streamflow, and precipitation. These studies focus on both the accuracy and the interpretation of the DL models. Precipitation is analyzed with a feature-based analysis together with ML models to understand its characteristics and meteorological drivers.
Several DL models are benchmarked on 20 basins in California for streamflow simulation. The sensitivities of the models to the input variables and to the input time window size reflect the unique streamflow dynamics of the Sierra Nevada basins. Although the DL model contains no explicit physical constraints, an idealized test demonstrates that it conserves mass, providing confidence in the future projection analyses. The future projections further confirm the distinct dynamics of the Sierra Nevada basins.
In the snowpack simulation, three DL models are developed and tested at observational stations across the Western US. The DL models achieve accuracy comparable to that of a process-based dataset. A permutation-based explainable AI method is applied to quantify the importance of each input variable, and it highlights the critical roles of precipitation and temperature in snowpack modeling. When the DL models are applied beyond the stations to generate gridded snow estimates, an extrapolation problem arises. It is alleviated with a simple transformation of the output variable, and this method is shown to be applicable to all of the DL models examined. Finally, the generalized DL model is used to generate climate projections and to investigate the response of snowpack to climate change.
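As a rough illustration of how a permutation-based importance score can be computed, the sketch below shuffles one input column at a time and records the resulting increase in mean squared error. It is a minimal example under stated assumptions, not the method used in this dissertation: the function name `permutation_importance`, the synthetic three-column input, and the linear stand-in for a trained model are all hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each input column is shuffled.

    `predict` is any callable mapping an (n_samples, n_features) array
    to an (n_samples,) array of predictions (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its relationship with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted = np.mean((predict(X_perm) - y) ** 2)
            increases.append(permuted - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic stand-in data: column 0 matters most, column 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1]


def toy_model(inputs):
    # Assumed stand-in for a trained DL model
    return 2.0 * inputs[:, 0] - 1.0 * inputs[:, 1]


importances = permutation_importance(toy_model, X, y)
```

A larger score means the model relies more heavily on that input. Because the stand-in model ignores the third column entirely, shuffling it leaves the error unchanged.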
The precipitation analysis focuses on mean precipitation and extreme precipitation events in the North American Monsoon region. The monsoon domain is first identified from a gridded precipitation dataset using an ML model, and it is then delineated into subregions to better represent local precipitation characteristics. A linear orthogonal method is used to decompose the mean precipitation time series and project it onto various modes. The monsoon ridge and moisture surges along the Gulf of California are present in the first modes, which represent the seasonal background of precipitation, whereas the second modes are more closely associated with shorter-lived phenomena, such as upper-level disturbances and mid-tropospheric lows. A feature-based analysis is conducted to reveal the meteorological causes of extreme precipitation. Five synoptic features and one mesoscale feature are examined and assigned as potential drivers of extreme precipitation. Finally, the feature-based analysis is linked to the linear modes: moisture surges are found to be more closely connected with the first modes, whereas tropical cyclones are more strongly correlated with the second modes, particularly for extreme precipitation events.
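The text does not name the linear orthogonal method used. One common choice for this kind of decomposition is empirical orthogonal function (EOF) analysis, which can be computed with a singular value decomposition. The sketch below assumes that choice and uses a synthetic time-by-space field. The function name `eof_decompose` and all data are hypothetical and are not taken from this work.

```python
import numpy as np


def eof_decompose(field, n_modes=2):
    """EOF analysis of a (time, space) array via SVD.

    Returns the spatial patterns, the time series of each mode, and
    the fraction of variance explained by each retained mode.
    """
    # Remove the time mean at each grid point
    anomalies = field - field.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    patterns = vt[:n_modes]            # orthonormal spatial modes
    pcs = u[:, :n_modes] * s[:n_modes]  # projection onto each mode
    explained = (s ** 2 / np.sum(s ** 2))[:n_modes]
    return patterns, pcs, explained


# Synthetic field: a strong slow mode plus a weaker fast mode and noise.
t = np.arange(200)
x = np.linspace(0.0, 1.0, 50)
slow = 3.0 * np.outer(np.sin(2 * np.pi * t / 100), np.sin(np.pi * x))
fast = np.outer(np.cos(2 * np.pi * t / 20), np.sin(2 * np.pi * x))
noise = 0.01 * np.random.default_rng(0).normal(size=(t.size, x.size))
field = slow + fast + noise

patterns, pcs, explained = eof_decompose(field, n_modes=2)
```

In this toy setup the leading mode picks up the slowly varying signal and the second mode the faster one, loosely analogous to a seasonal background and shorter-lived disturbances.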