Wildlife biologists often rely on classification models to automate the labelling of audio recordings collected in the wild. Bird populations are a frequent subject of these recordings, and tens to hundreds of species may need to be tracked in population-dense areas such as the Amazon rainforest. However, recordings from such environments also contain heavy noise, distortion, and overlapping bird calls, all of which make classification harder. By transforming an audio signal into a spectrogram, its time-frequency representation, we can identify keypoints by passing a sliding window over the spectrogram and retaining the local maxima. When passed as input to a convolutional neural network (CNN), this keypoint spectrogram can yield higher classification accuracy on overlapping bird calls than the full spectrogram.
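
The keypoint extraction step described above can be sketched as follows. This is a minimal illustration rather than the exact pipeline: the function names (`spectrogram`, `keypoint_mask`, `keypoint_spectrogram`), the frame length, hop size, and window size are all assumed values, and the relative magnitude threshold is an extra assumption added here to suppress maxima in near-silent regions. Only NumPy is used, and the window size is assumed to be odd.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Returns an array of shape (frequency_bins, time_frames).
    frame_len and hop are illustrative defaults, not values from the text.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def keypoint_mask(spec, size=5, rel_threshold=0.1):
    """Boolean mask of cells that are the maximum of their size x size window.

    size must be odd. rel_threshold (an assumption, not from the text) drops
    maxima weaker than that fraction of the global peak.
    """
    pad = size // 2
    # Pad with -inf so that border cells are compared only against real data.
    padded = np.pad(spec, pad, mode="constant", constant_values=-np.inf)
    # One size x size window per spectrogram cell.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    local_max = windows.max(axis=(2, 3))
    return (spec == local_max) & (spec >= rel_threshold * spec.max())


def keypoint_spectrogram(spec, size=5, rel_threshold=0.1):
    """Keep spectrogram values at keypoints and zero out everything else."""
    mask = keypoint_mask(spec, size=size, rel_threshold=rel_threshold)
    return np.where(mask, spec, 0.0)


# Usage on a synthetic mixture of two overlapping tones (stand-in for two calls).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
mixture = np.sin(2 * np.pi * 1000 * t) + 0.6 * np.sin(2 * np.pi * 2500 * t)
spec = spectrogram(mixture)
sparse = keypoint_spectrogram(spec)
print(spec.shape, int(np.count_nonzero(sparse)))
```

The resulting sparse array has the same shape as the full spectrogram, so it could be fed to a CNN in place of the full spectrogram without changing the input layer. Ties within a window are all kept as keypoints in this sketch.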