Passive acoustic monitoring is a field that involves collecting audio recordings from natural environments with the aim of measuring the health of ecosystems around the world. A
common way to measure the health of an ecosystem is to quantify biodiversity via acoustic
species identification. The rapid growth of this field is enabled by microcontroller hardware that
helps researchers gather large audio datasets. These datasets are often large enough to make
exhaustive human verification of species vocalizations infeasible. In response, researchers hope to
leverage deep learning models trained for the task of acoustic species identification to efficiently
parse these datasets. To build robust deep learning models, one must have access to a large
amount of training data. Unfortunately, the limited availability of audio data with species vocalizations is
a bottleneck to training such models. To acquire training data, researchers must develop methods
that efficiently extract species vocalizations from their passive recordings while minimizing human
labor. Alternatively, they may leverage purposeful audio data available to the scientific
community, in which the species present in a clip have been identified but extra processing is
required to locate the relevant vocalizations for training. We explore methods to efficiently verify
template-matching detections of bird species of interest in passive recordings to acquire training data and
find that including a statistical learning ensemble is the most effective approach. To identify relevant
bird vocalizations from weakly labeled, purposeful recordings, we compare various sound event
detection methods and find that binary classifiers best emulate human strong labels.