Since scientists began to study the ocean, they have had to develop new observational methods. Modern marine biologists and ecologists have increasingly used imaging systems to help address their most persistent and vexing questions. These techniques have allowed researchers to take samples at higher spatial or temporal frequency than ever before. While yielding marvelous new insight into difficult-to-observe phenomena, image-based sampling creates a new problem: there is too much data. Without techniques to classify and analyze oceanographic images, much of the data sits idle.
Two instruments developed in the Jaffe Laboratory for Underwater Imaging are emblematic of this trend. The Sub Sea Holodeck (SSH) is an immersive virtual aquarium developed to study cephalopod camouflage in the laboratory. The Scripps Plankton Camera System (SPCS) is a pair of in situ microscopes built to observe undisturbed plankton populations over long periods of time. In this thesis, new techniques, grounded in current machine learning methodologies, are developed to speed the analysis of these large, information-rich data sets.
Data from the SSH are treated as textures and classified using texton dictionaries. In addition to being highly accurate, the texton-based method is entirely data driven: the metric for separating the classes is derived directly from the images. As a consequence, a new criterion for classifying cephalopods is proposed that explicitly accounts for samples that do not conform to the prescribed three-class system.
Images from the SPCS and several other in situ plankton imaging systems are used to evaluate the performance of new deep learning methods. Several Convolutional Neural Networks (CNNs) are trained on the data and tested against each other. The results underscore the startling representational power of CNNs and suggest that neural methods outperform other approaches for annotating plankton images.
Finally, an automated classifier is deployed on SPCS data to explore the dynamics of a host-parasite relationship. A CNN is used to build a high-resolution time series tracking fluctuations in Oithona similis and the parasite Paradinium sp. The subsequent analysis identifies a possible time scale for the parasite’s internal life stage and its effect on the overall O. similis population.