There is a growing need for methodologies that integrate machine learning with scientific domain expertise, enabling the construction of interpretable models while contextualizing rare phenomena within the broader scientific landscape. With the rapidly growing influx of publicly available Earth observation data, such methods can be deployed to help answer open questions about Earth's cryosphere, which in turn will better constrain projections of sea level rise over the coming century.
This research develops scientifically contextualized computer vision methods in low-shot learning regimes, helping to address critical questions at the ice-bed and ice-ocean interfaces. I focus on the classification of rare phenomena associated with prehistoric and contemporary ice sheets. I develop a scientifically driven filtering method to automate the detection of subglacial bedforms formed during the Last Glacial Maximum, which were shaped by the dynamics of the ice sheets that once flowed above them. These bedforms make up ~2% of the overall training set. The proposed prefiltering approach can be inserted modularly into existing pipelines, enabling the automatic detection of these bedforms from publicly available digital elevation models with up to 94% accuracy.
I present a concise tiling strategy that I developed to prepare large satellite imagery for training machine learning algorithms on GPUs. Tiling is the industry standard, but previous methods that attempted to preserve semantic context created redundancies within the training data, altering the model outcomes. By uniquely permuting every extension of the dataset, I eliminate these redundancies. I find that, with distinct transformations, I can extend the dataset further than previous approaches, and that this extension does not alter the structure of the training data. Applying this preprocessing step alone, without any changes to model architecture or optimization, improves performance on underrepresented classes by up to 15.8%.
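As an illustration only, the idea of extending a tiled dataset with distinct transformations can be sketched as follows. This is a minimal sketch under stated assumptions, not the method developed in this work: it assumes non-overlapping square tiles and takes the eight dihedral transformations (four 90-degree rotations, each with and without a horizontal flip) as the set of distinct permutations. The function names `tile_image`, `dihedral_transforms`, and `extend_dataset` are hypothetical.

```python
import numpy as np


def tile_image(image, tile_size):
    """Split an (H, W) or (H, W, C) array into non-overlapping square tiles.

    Assumption for this sketch: ragged edges that do not fill a whole
    tile are dropped rather than padded.
    """
    height, width = image.shape[:2]
    tiles = []
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tiles.append(image[row:row + tile_size, col:col + tile_size])
    return tiles


def dihedral_transforms(tile):
    """Return the 8 dihedral variants of a square tile.

    Each variant is a different permutation of pixel positions, so a tile
    without internal symmetry yields 8 mutually distinct arrays.
    """
    variants = []
    for quarter_turns in range(4):
        rotated = np.rot90(tile, quarter_turns)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants


def extend_dataset(tiles, n_copies):
    """Extend the dataset so each tile appears n_copies times,
    every copy under a different transformation (n_copies <= 8)."""
    if not 1 <= n_copies <= 8:
        raise ValueError("n_copies must be between 1 and 8")
    extended = []
    for tile in tiles:
        extended.extend(dihedral_transforms(tile)[:n_copies])
    return extended


# Toy scene standing in for a satellite image (hypothetical data).
scene = np.arange(64).reshape(8, 8)
tiles = tile_image(scene, tile_size=4)
extended = extend_dataset(tiles, n_copies=8)
```

Because every copy of a tile is produced by a different transformation, no two copies of an asymmetric tile are pixel-for-pixel identical, which is the property this sketch is meant to show. Tiles with internal symmetry, such as uniform open water, would still produce duplicates under this simple scheme.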
I use this method to prepare manually labeled imagery to search for persistent polynyas, which are a rare phenomenon along the western coast of Antarctica. Polynyas are areas of open water within the sea ice, and when opened thermodynamically by warm water plumes that rise from beneath the marginal ice shelves, they can persist in the same location from year to year. They also offer a surface view into subsurface ice-ocean interactions that would otherwise be hard to monitor in satellite imagery. However, their small size and relative rarity mean that these polynyas make up only a small fraction of the overall scene in satellite imagery, and when off-the-shelf machine learning methods are applied, they are easily confused with other, more frequently occurring areas of open water. Due to the physics of their formation pathways, such polynyas typically occur right at the ice front interface. I use this geometric constraint to build a geophysically contextualized objective into the loss function of existing object detection architectures, allowing for a rapid census of persistent polynyas in the Amundsen Sea embayment. I recover all eleven previously characterized polynyas in the region and identify eight new ones. While this approach was tailored in this particular model to the geomorphometry that informs persistent polynya formation, it is easily generalized to aid in the detection of any underrepresented target for which prior knowledge of the semantic context constrains the geometry of the scene.
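To make the idea of a geometric constraint in the loss function concrete, the sketch below adds a distance-to-ice-front penalty to a generic detection loss. It is a hedged illustration, not the objective used in this work: the exponential form of the penalty, the confidence weighting, and the parameters `scale` and `weight` are all assumptions, as are the function names and the representation of the ice front as a set of pixel coordinates.

```python
import numpy as np


def nearest_front_distance(centers, front_points):
    """Distance from each predicted box center to its nearest ice-front point.

    centers:      (N, 2) array of predicted box centers, in pixels.
    front_points: (M, 2) array of points sampled along the ice front.
    """
    diffs = centers[:, None, :] - front_points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def ice_front_penalty(centers, scores, front_points, scale=10.0):
    """Confidence-weighted penalty that grows with distance from the front.

    Assumed form: score * (1 - exp(-d / scale)). It is 0 for a detection
    on the ice front and approaches the score for a distant detection.
    """
    distances = nearest_front_distance(centers, front_points)
    return float(np.mean(scores * (1.0 - np.exp(-distances / scale))))


def contextual_loss(base_loss, centers, scores, front_points,
                    weight=1.0, scale=10.0):
    """Add the geometric penalty to a detector's existing loss value."""
    penalty = ice_front_penalty(centers, scores, front_points, scale)
    return base_loss + weight * penalty


# Hypothetical example: a two-point ice front and two candidate detections.
front = np.array([[0.0, 0.0], [0.0, 10.0]])
on_front = np.array([[0.0, 0.0]])
far_away = np.array([[30.0, 0.0]])
confidence = np.array([1.0])

loss_on_front = contextual_loss(0.5, on_front, confidence, front)
loss_far_away = contextual_loss(0.5, far_away, confidence, front)
```

In this toy example the detection far from the ice front incurs a larger loss than the one on it, so a detector trained with such a term would be discouraged from confidently labeling open water away from the front as a persistent polynya. In practice the penalty would need to be written in the differentiable framework of the chosen detector.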