Models of spoken word recognition vary in how they capture the relationship between speech input and meaning. Modular accounts prohibit a word's meaning from affecting the computation of its form-based representation, whereas interactive models allow semantic activation to affect phonological processing. To test these competing hypotheses, we manipulated word familiarity and imageability in auditory lexical decision and repetition tasks. Responses to high-imageability words were significantly faster than responses to low-imageability words. Response latencies were also analysed as a function of two cohort variables: cohort size and the frequency of cohort members. High- and low-imageability words were divided into two sets: (a) words from large cohorts with many high-frequency competitors, and (b) words from small cohorts with few high-frequency competitors. Analyses showed a significant imageability effect only for words belonging to large cohorts. These data suggest that when the mapping from phonology to semantics is difficult (i.e., when a spoken word activates a large cohort containing many high-frequency competitors), semantic information can aid the discrimination process. Because highly imageable words are "semantically richer" and/or more context-independent, they provide more activation to phonology than do low-imageability words. Thus, these data provide strong support for interactive models of spoken word recognition.