Humans communicate emotion vocally by modulating acoustic cues such as pitch, intensity and voice quality. Research has documented how the relative presence or absence of such cues alters the likelihood of perceiving an emotion, but the neural underpinnings of acoustic cue-dependent emotion perception remain obscure. Using functional magnetic resonance imaging in 20 subjects we examined a reciprocal circuit consisting of superior temporal cortex, amygdala and inferior frontal gyrus that may underlie affective prosodic comprehension. Results showed that increased saliency of emotion-specific acoustic cues was associated with increased activation in superior temporal cortex [planum temporale (PT), posterior superior temporal gyrus (pSTG), and posterior superior middle gyrus (pMTG)] and amygdala, whereas decreased saliency of acoustic cues was associated with increased inferior frontal activity and temporo-frontal connectivity. These results suggest that sensory-integrative processing is facilitated when the acoustic signal is rich in affective information, yielding increased activation in temporal cortex and amygdala. Conversely, when the acoustic signal is ambiguous, greater evaluative processes are recruited, increasing activation in inferior frontal gyrus (IFG) and IFG STG connectivity. Auditory regions may thus integrate acoustic information with amygdala input to form emotion-specific representations, which are evaluated within inferior frontal regions.