An Exploration of Automated Methods for the Efficient Acquisition of Training Data for Acoustic Species Identification
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Abstract

Passive acoustic monitoring is a field that involves collecting audio recordings from natural environments with the aim of measuring the health of ecosystems around the world. A common way to measure the health of an ecosystem is to quantify biodiversity via acoustic species identification. The rapid growth of this field is enabled by microcontroller hardware that helps researchers gather large audio datasets. These datasets are often large enough to make exhaustive human verification of species vocalizations infeasible. In response, researchers hope to leverage deep learning models trained for acoustic species identification to parse these datasets efficiently. Building robust deep learning models requires a large amount of training data. Unfortunately, the availability of audio data with species vocalizations is a bottleneck to training such models. To acquire training data, researchers must develop methods that efficiently extract species vocalizations from their passive recordings while minimizing human labor. Alternatively, they may leverage purposeful audio data available to the scientific community, in which the species present in a clip have been identified but extra processing is required to locate the relevant vocalizations for training. We explore methods to efficiently verify template matching for bird species of interest on passive recordings to acquire training data, and find the inclusion of a statistical learning ensemble to be the most effective. To identify relevant bird vocalizations in weakly labeled, purposeful recordings, we compare various sound event detection methods and find that binary classifiers best emulate human strong labels.
