
Introduction

forrestgithub edited this page Apr 26, 2013 · 2 revisions

##Motivation##

People often want to know what happened in a given place during a past time period, e.g., http://www.history.com/this-day-in-history

Much existing research addresses real-time topic detection from tweets [1-3], i.e., reporting new topics online as they emerge.

Twitter provides popular hashtags/words, but many are meaningless.

##System Objective##

Given a time period as input, output the top-k most popular topics in that period.

Extension: add a location constraint, i.e., return the top-k most popular topics about, or that happened in, a given place and time period.

##System Architecture##

##Detect topics from tweets##

Cluster similar tweets into topics.

It is likely that many tweets do not belong to any topic, i.e., they are outliers.

Adopt the OPTICS clustering algorithm [4]:

– Density based clustering algorithm

– Able to discard outliers

– Re-clustering is efficient

####Clustering details####

– Retweets are merged into one tweet, whose support is the total number of retweets.

– Create an inverted index over the merged tweets

– Apply the OPTICS algorithm to the tweets on one index list at a time

– Each tweet is represented by a word vector, with tf*idf values as coordinates.

– The distance between two tweets, t1 and t2, is 1 - cos(t1, t2)
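The preprocessing and distance steps above can be sketched in plain Python. This is a minimal illustration, not the system's actual code: the function names are hypothetical, retweets are assumed to be recognizable by an `RT @user:` prefix, support is counted as the number of copies including the original, and the smoothed idf formula is one common choice rather than the one the system necessarily uses. The OPTICS step itself is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def merge_retweets(texts):
    """Merge retweets into one entry per original text.

    Assumption: a retweet repeats the original text with an "RT @user:" prefix.
    The support counts every copy, including the original tweet.
    """
    support = Counter()
    for text in texts:
        original = re.sub(r"^RT @\w+:\s*", "", text)
        support[original] += 1
    return dict(support)


def build_inverted_index(docs):
    """Map each word to the set of ids of the merged tweets containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in set(tokenize(text)):
            index[word].add(doc_id)
    return index


def tfidf_vectors(docs):
    """Represent each tweet as a sparse {word: tf*idf} vector.

    Assumption: smoothed idf = log((1 + N) / (1 + df)) + 1, so no weight is zero.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}
        )
    return vectors


def cosine_distance(v1, v2):
    """Distance between two tweets: 1 - cos(t1, t2)."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 1.0  # an empty tweet is treated as maximally distant
    return max(0.0, 1.0 - dot / (norm1 * norm2))
```

The resulting distance function is what a density-based algorithm such as OPTICS would be given; tweets sharing no words end up at distance 1, which makes them natural outlier candidates.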

##Online Browsing##

• Input: a set of dates

• Query processing:

– Load topics for all query dates

– Identify similar topics by OPTICS clustering.

– Rank topics by their support and keep the top K

– Output the 3 most retweeted tweets for each topic
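The ranking and output steps can be sketched as follows. This is a hedged illustration only: the topic structure (a dict with `label` and `tweets` fields), the function names, and the definition of a topic's support as the sum of its tweets' supports are all assumptions. Loading topics by date and merging similar topics with OPTICS are not shown.

```python
import heapq


def topic_support(topic):
    """Assumption: a topic's support is the sum of its tweets' supports."""
    return sum(support for _, support in topic["tweets"])


def top_k_topics(topics, k, tweets_per_topic=3):
    """Rank topics by support, keep the top k, and pick each topic's
    most retweeted tweets.

    Each topic is a hypothetical dict:
        {"label": str, "tweets": [(text, support), ...]}
    """
    ranked = heapq.nlargest(k, topics, key=topic_support)
    results = []
    for topic in ranked:
        best = heapq.nlargest(
            tweets_per_topic, topic["tweets"], key=lambda pair: pair[1]
        )
        results.append(
            {
                "label": topic["label"],
                "support": topic_support(topic),
                "top_tweets": [text for text, _ in best],
            }
        )
    return results
```

`heapq.nlargest` avoids sorting the full topic list when K is small compared with the number of topics loaded for the query dates.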

##References##

[1] H. Kwak et al. What is Twitter, a Social Network or a News Media? WWW 2010.

[2] T. Sakaki et al. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. WWW 2010.

[3] Manoj K. Agarwal, Krithi Ramamritham, Manish Bhide. Real Time Discovery of Dense Clusters in Highly Dynamic Graphs. PVLDB 2012.

[4] http://en.wikipedia.org/wiki/OPTICS_algorithm
