A Real-Time Temporal Clustering Algorithm for Short Text and Its Applications
Real-world events of general interest trigger engaging discussions among people
for short bursts in time. These events then either evolve, merge or slowly lose public
interest as time goes by. With the advent of social media like Twitter, people have a
platform to voice opinions, share news and make announcements, some of which gain
mass popularity. Leveraging social media for early detection of these events has thus
become a problem of immense practical value for news media companies and for people
who want to stay aware of the world in general.
In this thesis, we present the foundations of a system built on a real-time,
dynamic, temporal clustering algorithm capable of detecting bursts in streaming
data as they happen. By using efficient data structures, we not only detect these
bursty topics but also store them efficiently, facilitating many independent applications
such as hashtag recommendation and community detection, which would otherwise require
completely different approaches if handled separately. An efficient storage mechanism
also enables us to run evolutionary queries on the discovered themes to gain greater
insight. We also extend this social media research to user behavioral analysis, using
users' database access logs to find bursty patterns, detect anomalies and perform related tasks.
Lastly, we perform an in-depth evaluation to show that our model outperforms many
popular approaches in terms of topic quality and hashtag precision by more than 15%.