A number of recent models of semantics combine linguistic
information, derived from text corpora, with visual information,
derived from image collections, demonstrating that the
resulting multimodal models account for behavioural data
better than either of their unimodal counterparts.
However, first, while linguistic models have been extensively
tested for their fit to behavioural semantic ratings, this is not
the case for visual models, which are also far more limited in
their coverage. More broadly, empirical work on semantic
processing has shown that emotion also plays an important role,
especially for abstract concepts; however, models integrating
emotion along with linguistic and visual information are
lacking. Here, we first improve on visual representations by
choosing the visual model that best fits semantic data and
extending its coverage. Crucially, we then assess whether
adding affective representations (obtained from a neural
network model designed to predict emojis from co-occurring
text) improves the fit to semantic
similarity/relatedness judgements over purely linguistic and
linguistic-visual models. We find that adding both visual and
affective representations improves performance, with visual
representations providing an improvement especially for more
concrete words and affective representations improving
fit especially for more abstract words.