Certain concepts, words, and images are intuitively more similar than others (dog vs. cat, dog vs. spoon), though quantifying such similarity is notoriously difficult. Indeed, this kind of computation is likely a critical part of learning the category boundaries for words within a given language. Here, we take a set of 27 items (e.g., ‘dog’) that are highly common in infants’ input and use both image- and word-based algorithms to independently compute similarity among them. We find three key results. First, the pairwise item similarities derived within image-space and word-space are correlated, suggesting preserved structure across these extremely different representational formats. Second, the closest ‘neighbors’ for each item, within each space, show significant overlap (e.g., both identify ‘egg’ as a neighbor of ‘apple’). Third, items with the most overlapping neighbors are learned later by infants and toddlers. We conclude that this approach, which does not rely on human ratings of similarity, may nevertheless reflect stable within-class structure across these two spaces. We speculate that such invariance might aid lexical acquisition by serving as an informative marker of category boundaries.
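To make the analysis pipeline concrete, the following is a minimal sketch (not the authors' implementation) of how similarity structure and neighbor overlap could be computed, assuming hypothetical item vectors from an image model and a word model; cosine similarity, Spearman correlation, and k = 5 neighbors are illustrative choices.

```python
# Sketch: correlate pairwise similarities across two representational spaces
# and count shared nearest neighbors per item. All parameter choices here
# (cosine similarity, Spearman correlation, k = 5) are assumptions.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr


def compare_spaces(image_vecs, word_vecs, items, k=5):
    """image_vecs, word_vecs: (n_items, dim) arrays; items: list of item labels."""
    # Pairwise cosine similarity within each space.
    sim_img = 1 - squareform(pdist(image_vecs, metric="cosine"))
    sim_wrd = 1 - squareform(pdist(word_vecs, metric="cosine"))

    # Correlate the off-diagonal (upper-triangle) similarities across spaces.
    iu = np.triu_indices(len(items), k=1)
    rho, p = spearmanr(sim_img[iu], sim_wrd[iu])

    # For each item, take its k closest neighbors in each space (skipping the
    # item itself) and count how many neighbors the two spaces share.
    overlap = {}
    for i, item in enumerate(items):
        nn_img = set(np.argsort(-sim_img[i])[1:k + 1])
        nn_wrd = set(np.argsort(-sim_wrd[i])[1:k + 1])
        overlap[item] = len(nn_img & nn_wrd)
    return rho, p, overlap
```

In this sketch, `rho` quantifies how well the pairwise similarity structure is preserved across the two spaces, and `overlap` gives a per-item neighbor-overlap count of the kind that could then be related to age of acquisition.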