Modeling text document similarity is an important yet
challenging task. Even the most advanced computational
linguistic models often misjudge document similarity relative
to humans. Regarding how model judgments diverge from human
judgments, Lee and colleagues (2005) suggested that models fail
primarily by occasionally underestimating strong similarity
between documents. According to this
suggestion, there should be more extreme misses (i.e., models
failing to pick up on strong document similarity) than extreme
false positives (i.e., models detecting document similarity
where none exists). We tested this claim by
comparing document similarity ratings generated by humans
and latent semantic analysis (LSA). Notably, we implemented
LSA with 441 unique parameter settings, determined the settings
that yielded the highest correlations with human ratings, and
then identified misses and false positives under those optimal
settings. The results showed that, as Lee et
al. predicted, large errors were predominantly misses rather
than false positives. Potential causes of the misses and false
positives are discussed.
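
As a rough illustration of the pipeline summarized above, the sketch below builds a weighted term-document matrix, reduces it with truncated SVD (a common way of implementing LSA), scores document pairs by cosine similarity, and correlates those scores with human ratings. The corpus, human ratings, parameter values, and library choices (scikit-learn, SciPy) are illustrative assumptions only; they do not reproduce the materials or the 441 parameter settings examined in the study.

```python
# Minimal sketch of an LSA-vs-human comparison; all data and settings are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import pearsonr

# Hypothetical corpus and human pairwise similarity ratings (document index pairs).
docs = [
    "The stock market fell sharply amid inflation fears.",
    "Shares dropped as investors worried about rising prices.",
    "The local team won the championship game last night.",
    "A new species of frog was discovered in the rainforest.",
]
human_ratings = {(0, 1): 0.9, (0, 2): 0.1, (0, 3): 0.05,
                 (1, 2): 0.1, (1, 3): 0.05, (2, 3): 0.1}

# One point in a parameter grid: term weighting scheme and number of LSA dimensions.
# A full sweep (e.g., 441 settings) would loop over such choices and keep the best.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0)  # k = 2 for this toy corpus
doc_vectors = lsa.fit_transform(X)

# Cosine similarity between every document pair in the reduced LSA space.
sims = cosine_similarity(doc_vectors)

pairs = sorted(human_ratings)
model_scores = [sims[i, j] for i, j in pairs]
human_scores = [human_ratings[p] for p in pairs]

# Correlation with human ratings; large signed errors would flag candidate
# misses (model far below human) and false positives (model far above human).
r, _ = pearsonr(model_scores, human_scores)
print(f"Pearson r with human ratings: {r:.2f}")
for (i, j), m, h in zip(pairs, model_scores, human_scores):
    print(f"docs ({i},{j}): model={m:.2f}, human={h:.2f}, error={m - h:+.2f}")
```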