The vowel inventories most common across languages tend to be better dispersed in the space of possible vowels than less common or unattested inventories. The present research explored the hypothesis that functional factors underlie this preference. Connectionist models were trained on different inventories of spoken vowels taken from a naturalistic corpus. The first experiment showed that networks trained on well-dispersed five-vowel sets like [i e a o u] learned the inventory more quickly and generalized better to novel stimuli than those trained on less dispersed vowel sets. Experiments 2 and 3 examined how effects due to ease of perception are modulated by factors related to production. Languages tend to prefer contrasts among front vowels over contrasts among back vowels, because back vowels tend to be produced with more variability. This variability caused networks trained on an [i e a u] inventory to perform better than those trained on [i a o u]. Thus, both the acoustic separation of vowels and the variability in how they are realized in speech affect ease of learning and generalization. The results suggest that acoustic and articulatory factors can explain apparent phonological universals.