eScholarship
Open Access Publications from the University of California

A Real-time Temporal Clustering Algorithm for short text, and its applications

  • Author(s): Agarwal, Nishant
  • Advisor(s): McAuley, Julian John
  • et al.
Abstract

Real-world events of general interest trigger engaging discussions among people in short bursts of time. These events then either evolve, merge, or slowly lose public interest as time goes by. With the advent of social media such as Twitter, people have a platform to voice opinions, share news, and make announcements, some of which gain mass popularity. Leveraging social media for early detection of these events has thus become a problem of immense practical value for news media companies and for people who want to stay aware of the world in general.

In this thesis, we present the foundations of a system that runs on a real-time, dynamic, temporal clustering algorithm capable of detecting bursts in streaming data as they happen. By using efficient data structures, we not only detect these bursty topics but also store them efficiently, which facilitates many independent applications, such as hashtag recommendation and community detection, that would otherwise require completely different approaches if handled separately. An efficient storage mechanism also enables us to perform evolutionary queries on the discovered themes to gain greater insight. We also extend this research from social media to user behavioral analysis, using users' database access logs to find bursty patterns, detect anomalies, and perform related tasks. Lastly, we perform an in-depth evaluation to show that our model outperforms many popular approaches in terms of topic quality and hashtag precision by more than 15%.
