We conducted a case study of how unreliable and/or unrepresentative stimuli in psycholinguistics research may limit the generalizability of experimental findings. Using the domain of lexical ambiguity as a test case, we analyzed 2033 unique words (6481 tokens) from 214 studies. Specifically, we examined how often studies agreed on the ambiguity type assigned to a word (i.e., homonymy, polysemy, or monosemy), and how well the words represented the populations underlying each ambiguity type. We observed far-from-perfect agreement in how words are assigned to ambiguity types. We also observed that coverage of these populations is relatively poor and biased, leading to the use of a narrower set of words and associated properties. These results raise concerns about the degree to which prior theoretical claims have strong empirical support, and they offer targeted directions for improving research practices that are relevant to a broad set of domains.