Introduction
##Motivation##
People often want to know what happened in a given place during a past time period (see, e.g., http://www.history.com/this-day-in-history).
Much existing research addresses real-time topic detection from tweets [1-3], i.e., reporting new topics online.
Twitter provides popular hashtags and words, but many of them are meaningless.
##System Objective##
Input a period of time; output the top-k popular topics in that time period.
Extension: add a location constraint, i.e., return the top-k popular topics that are about, or happened in, a given place during that time period.
##System Architecture##
##Detect topics from tweets##
Cluster similar tweets into topics.
Many tweets likely do not belong to any topic, i.e., they are outliers.
Adopt the OPTICS clustering algorithm [4]:
– Density-based clustering algorithm
– Able to discard outliers
– Re-clustering is efficient
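The three properties listed above can be illustrated with a simplified, quadratic-time OPTICS sketch. This is not the project's implementation: the function names and the parameters `max_eps`, `min_pts` (assumed to be at least 2) and `eps` are assumptions chosen for illustration. The ordering is computed once; clusters are then extracted for any threshold `eps <= max_eps` with a single pass, which is why re-clustering is cheap, and points that are not density-reachable receive the label -1 (outliers).

```python
import math

def optics_order(points, dist, max_eps, min_pts):
    """Compute the OPTICS ordering, reachability and core distances (O(n^2) sketch)."""
    n = len(points)
    reach = [math.inf] * n      # reachability distance of each point
    core = [math.inf] * n       # core distance of each point
    processed = [False] * n
    order = []

    def neighbors(i):
        return [j for j in range(n)
                if j != i and dist(points[i], points[j]) <= max_eps]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th nearest point, counting point i itself.
        if len(nbrs) + 1 < min_pts:
            return math.inf
        return sorted(dist(points[i], points[j]) for j in nbrs)[min_pts - 2]

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start: math.inf}   # candidate points keyed by reachability
        while seeds:
            i = min(seeds, key=seeds.get)
            del seeds[i]
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            core[i] = core_distance(i, nbrs)
            if core[i] == math.inf:
                continue            # not a core point: do not expand from it
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core[i], dist(points[i], points[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    seeds[j] = new_reach
    return order, reach, core

def extract_clusters(order, reach, core, eps):
    """Extract DBSCAN-like labels from the ordering; -1 marks an outlier."""
    labels = [-1] * len(order)
    cluster = -1
    for i in order:
        if reach[i] > eps:
            if core[i] <= eps:      # start of a new cluster
                cluster += 1
                labels[i] = cluster
        else:
            labels[i] = cluster     # density-reachable from the current cluster
    return labels
```

Calling `extract_clusters` again with a smaller `eps` reuses the same ordering, so only the cheap labelling pass is repeated.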
####Clustering details####
– Retweets are merged into one tweet, whose support is the total number of retweets.
– Create an inverted index over the merged tweets.
– Apply the OPTICS algorithm to the tweets on one inverted list at a time.
– Each tweet is represented by a word vector with tf*idf weights as coordinates.
– The distance between two tweets t1 and t2 is 1 - cos(t1, t2).
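The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the input is assumed to be a list of hypothetical `(original_tweet_id, text)` records with one record per posting, tokenization is plain whitespace splitting, and idf is taken as `log(N / df)`; the source does not specify any of these.

```python
import math
from collections import Counter, defaultdict

def merge_retweets(records):
    """Merge postings of the same tweet; support counts the postings (assumed format)."""
    merged = {}
    for original_id, text in records:
        entry = merged.setdefault(original_id, {"text": text, "support": 0})
        entry["support"] += 1
    return merged

def build_inverted_index(merged):
    """Map each word to the set of merged tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, entry in merged.items():
        for word in set(entry["text"].lower().split()):
            index[word].add(tweet_id)
    return index

def tfidf_vectors(merged):
    """Represent each merged tweet as a sparse {word: tf * idf} vector."""
    docs = {tid: Counter(e["text"].lower().split()) for tid, e in merged.items()}
    n = len(docs)
    df = Counter(word for counts in docs.values() for word in counts)
    return {
        tid: {w: tf * math.log(n / df[w]) for w, tf in counts.items()}
        for tid, counts in docs.items()
    }

def cosine_distance(u, v):
    """Return 1 - cos(u, v) for sparse vectors; 1.0 if either vector is all zeros."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

Tweets that share no words end up at distance exactly 1, so clustering one inverted list at a time only ever compares tweets that share at least the word indexing that list.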
##Online Browsing##
• Input: a set of dates
• Query processing:
– Load topics for all query dates
– Identify similar topics by OPTICS clustering.
– Rank topics by their support and keep the top K.
– Output the 3 most retweeted tweets for each topic.
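The ranking part of the query processing above might look like the sketch below. The data layout is an assumption made for illustration: `topics_by_date` is a hypothetical mapping from a date string to a list of topics, and each topic is a dict with a precomputed `support` and a list of `(text, retweets)` pairs. The step that merges similar topics across dates with OPTICS is omitted here.

```python
import heapq

def top_k_topics(topics_by_date, query_dates, k, tweets_per_topic=3):
    """Return the k topics with the highest support over the query dates."""
    # Load topics for all query dates (dates with no stored topics are skipped).
    topics = [t for d in query_dates for t in topics_by_date.get(d, [])]
    # Rank topics by support and keep the top k.
    ranked = heapq.nlargest(k, topics, key=lambda t: t["support"])
    # For each topic, output its most retweeted tweets.
    return [
        {
            "support": t["support"],
            "tweets": [
                text
                for text, _ in heapq.nlargest(
                    tweets_per_topic, t["tweets"], key=lambda tw: tw[1]
                )
            ],
        }
        for t in ranked
    ]
```

`heapq.nlargest` avoids sorting every topic when k is small relative to the number of stored topics.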
##Reference##
[1] H. Kwak et al. What is Twitter, a Social Network or a News Media? WWW 2010.
[2] T. Sakaki et al. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. WWW 2010.
[3] M. K. Agarwal, K. Ramamritham, M. Bhide. Real Time Discovery of Dense Clusters in Highly Dynamic Graphs. PVLDB 2012.