Across the world's languages, children reliably learn nouns more easily than verbs. Attempts to understand the difficulty of verb learning have focused on determining whether the challenge stems from differences in the linguistic usage of nouns and verbs, or instead from conceptual differences in the categories that they label. We introduce a novel metric to quantify the contributions of both sources of difficulty using unsupervised learning models trained on corpora of language and images. We find that the linguistic usage of verbs is less well aligned with their visual categories than the linguistic usage of nouns is with theirs. However, this difference is driven almost entirely by differences in the structure of their visual categories: Relative to nouns, events described by the same verb are more variable and events described by two different verbs are more similar. We conclude that differences between noun and verb learning need not be due to fundamental differences in learning processes, but may instead be driven by the difficulty of one-shot generalization from verbs' visual categories.