Social media promotes social connectedness, but its users
can still be lonely, and loneliness is an important precursor
of various mental health disorders such as anxiety and
depression. Here we aim to describe online loneliness in individuals
from the linguistic and social features of their platform
use. We define a sample of Twitter users who explicitly report
being lonely and compare their language to a matched random
control sample. For each user, we create a text embedding (a
numerical representation of the content of their online posts),
excluding terms and expressions related to loneliness. We use
principal component analysis on the resulting embeddings
to condense the data into a smaller number of variables, while
still retaining the majority of the variance. This positions
each user in a two-dimensional space
defined by the first two principal components, which capture
the largest share of the variance in the data. Lonely
individuals are spatially separated from the control sample, indicating
that they exhibit distinct language patterns
that are often self-referential, e.g. “I should” and “but
I”. Indicators of online social relations, such as the number
of online friends, favorites, and mentions, show that lonely individuals
have fewer social relations, while sentiment analysis
shows that their posts have lower valence. Our results
provide insights into the lexical, social, and affective markers
that characterize loneliness online and offer a starting point
for the development of diagnostics and prevention.
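
As an illustration of the dimensionality-reduction step described above, the following is a minimal sketch, not the authors' implementation. It assumes each user is already represented by a fixed-length embedding vector and uses synthetic random data in place of real user embeddings. The function name `project_to_2d`, the embedding size of 16, and the group offsets are hypothetical choices made only for this example.

```python
import numpy as np


def project_to_2d(embeddings):
    """Project row-wise embeddings onto their first two principal components.

    Returns the 2-D coordinates and the fraction of variance explained
    by each of the two components.
    """
    X = np.asarray(embeddings, dtype=float)
    centered = X - X.mean(axis=0)  # PCA requires mean-centered data
    # SVD of the centered matrix: the rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:2]


# Synthetic stand-in data (assumption): NOT real user embeddings.
rng = np.random.default_rng(0)
lonely = rng.normal(loc=0.5, scale=1.0, size=(100, 16))
control = rng.normal(loc=-0.5, scale=1.0, size=(100, 16))

coords, explained = project_to_2d(np.vstack([lonely, control]))

# Compare where the two groups fall in the 2-D space.
lonely_centroid = coords[:100].mean(axis=0)
control_centroid = coords[100:].mean(axis=0)
print("explained variance (PC1, PC2):", explained)
print("lonely centroid:", lonely_centroid)
print("control centroid:", control_centroid)
```

In this synthetic setup the two groups differ in their mean embedding by construction, so their centroids fall at different positions along the first component. With real embeddings, whether and how the groups separate is an empirical question.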