crowdsensing_datasets

Comparison of datasets:

| Name | Purpose | Start Date | End Date | Number of examples | Number of crowd sensors (for crowd level regression) | Number of unique users | Number of events (for event-detection) | Number of non-events (for event-detection) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sensor Counts (Melbourne) | Source | 2009-05-01 | 2020-04-30 | 3132346 | 66 | - | - | - |
| Tweets (Melbourne) | Source | 2010-09-12 | 2018-06-04 | 266931 | - | 20176 | - | - |
| Tweets (max 100 duplicate coordinates) | Crowd Level Regression | 2010-09-21 | 2018-02-02 | 24638 | 42 | 5659 | - | - |
| Tweets (max 10 duplicate coordinates) | Crowd Level Regression | 2010-09-21 | 2018-02-02 | 12801 | 42 | 3785 | - | - |
| Flickr (near to Town Hall (West)) | Crowd Level Regression | 2010-07-08 | 2020-03-29 | 6076 | - | 852 | - | - |
| Tweets (is_event tagged) | Event detection | 2014-01-12 | 2018-02-12 | 1393561 | - | 77642 | 22241 | 1371320 |

A brief overview of how these datasets were derived and how they are categorized:

Tweets for regression

  • sources:
    • userVisits-Melb-tweets-280518.csv (original tweet dataset)
  • sorted by date, cleaned:
    • tweets_unfiltered.csv
  • filtered for duplicate coordinates, and tagged to nearest counting sensors:
    • tweets_max10_duplicate_coords.csv
    • tweets_max100_duplicate_coords.csv
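
The two derivation steps above (capping duplicate coordinates, then tagging each tweet to its nearest counting sensor) might look roughly like the sketch below. It rests on several assumptions not stated in this README: "max N duplicate coordinates" is read as "keep at most N tweets per exact coordinate pair" (dropping every tweet at an over-represented coordinate is another plausible reading), "nearest" is taken as great-circle distance, and the field names `lat`, `lon`, and `sensor_id` as well as the sample rows are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cap_duplicate_coords(tweets, max_duplicates):
    """Keep at most `max_duplicates` tweets per exact (lat, lon) pair, in input order."""
    seen = defaultdict(int)
    kept = []
    for tweet in tweets:
        key = (tweet["lat"], tweet["lon"])
        if seen[key] < max_duplicates:
            seen[key] += 1
            kept.append(tweet)
    return kept


def tag_nearest_sensor(tweets, sensors):
    """Return copies of the tweets with the 'sensor_id' of the closest sensor added."""
    tagged = []
    for tweet in tweets:
        nearest = min(
            sensors,
            key=lambda s: haversine_m(tweet["lat"], tweet["lon"], s["lat"], s["lon"]),
        )
        tagged.append({**tweet, "sensor_id": nearest["sensor_id"]})
    return tagged


# Hypothetical sample data; only the Town Hall (West) coordinate comes from this README.
sensors = [
    {"sensor_id": "townhall_west", "lat": -37.81487988, "lon": 144.9660878},
    {"sensor_id": "north_hypothetical", "lat": -37.8000, "lon": 144.9600},
]
tweets = [
    {"lat": -37.8148, "lon": 144.9661},
    {"lat": -37.8148, "lon": 144.9661},
    {"lat": -37.8148, "lon": 144.9661},  # third duplicate, dropped when the cap is 2
    {"lat": -37.8001, "lon": 144.9601},
]

kept = cap_duplicate_coords(tweets, max_duplicates=2)
tagged = tag_nearest_sensor(kept, sensors)
```

With a cap of 10 or 100 instead of 2, the same two calls would correspond to the two filtered files listed above, under the stated reading of the filter.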

Pedestrian Counts and Sensors

  • sources:
    • Pedestrian_Counting_System__2009_to_Present_counts_per_hour_.csv (from City of Melbourne open data)
    • Pedestrian_Counting_System_-_Sensor_Locations.csv (from City of Melbourne open data)
  • sensor locations, cleaned:
    • sensor_locations.csv
  • pedestrian counts tagged to corresponding sensors:
    • counts_withsensors.csv
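
Tagging counts to their sensors amounts to a join on a shared sensor identifier. The following is a minimal sketch of that join using only the standard library. The column names (`sensor_id`, `date_time`, `hourly_count`, `lat`, `lon`) and the sample rows are assumptions for illustration and may not match the actual CSV headers.

```python
import csv
import io


def tag_counts_with_sensors(count_rows, sensor_rows):
    """Attach each sensor's coordinates to its hourly count rows (inner join on sensor_id)."""
    sensors_by_id = {row["sensor_id"]: row for row in sensor_rows}
    tagged = []
    for row in count_rows:
        sensor = sensors_by_id.get(row["sensor_id"])
        if sensor is None:
            continue  # skip counts whose sensor has no known location
        tagged.append({**row, "lat": float(sensor["lat"]), "lon": float(sensor["lon"])})
    return tagged


# Hypothetical stand-ins for the two source CSV files.
counts_csv = io.StringIO(
    "sensor_id,date_time,hourly_count\n"
    "1,2019-01-01 08:00,120\n"
    "2,2019-01-01 08:00,45\n"
    "9,2019-01-01 08:00,7\n"  # sensor 9 has no location row
)
sensors_csv = io.StringIO(
    "sensor_id,lat,lon\n"
    "1,-37.81487988,144.9660878\n"
    "2,-37.8000,144.9600\n"
)

counts_with_sensors = tag_counts_with_sensors(
    list(csv.DictReader(counts_csv)),
    list(csv.DictReader(sensors_csv)),
)
```

Dropping counts without a matching sensor is a design choice of this sketch; keeping them with empty coordinates would be equally reasonable.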

Flickr

  • sources: Flickr crawler
  • crawled for Flickr posts within 100 m of the counting sensor at Melbourne Town Hall (West) [-37.81487988, 144.9660878]:
    • flickr_near_townhallwest
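
A 100 m radius filter around the Town Hall (West) sensor can be sketched as below. The crawler itself is not shown, and it is an assumption that the radius is measured as great-circle distance. The `id`, `lat`, and `lon` fields and the sample posts are hypothetical; only the sensor coordinate is taken from this README.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
TOWN_HALL_WEST = (-37.81487988, 144.9660878)  # sensor coordinate from this README


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(posts, centre, radius_m=100.0):
    """Keep only posts whose coordinates lie within `radius_m` metres of `centre`."""
    lat0, lon0 = centre
    return [p for p in posts if haversine_m(p["lat"], p["lon"], lat0, lon0) <= radius_m]


# Hypothetical posts: one about 50 m north of the sensor, one about 220 m north.
posts = [
    {"id": "near", "lat": -37.81443, "lon": 144.9660878},
    {"id": "far", "lat": -37.81288, "lon": 144.9660878},
]
nearby = within_radius(posts, TOWN_HALL_WEST)
```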

Event Detection

  • sources:
    • Event_permits_2014-2018_including_film_shoots_photoshoots_weddings_Christmas_parties_ promotions__fun_runs_and_public_events.csv (from City of Melbourne open data)
    • getOldTweets crawler
    • tweets_unfiltered.csv
  • cleaned event permits dataset, filtered for public events, and geotagged to POIs (if available):
    • events_geo_tagged_public.csv
  • used getOldTweets to crawl for tweets within 100 m of each event in the above dataset:
    • tweets_withevent.csv
  • merged with tweets_unfiltered.csv (superset of all Melbourne tweets) to label each tweet as is_event True or False
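
The final labelling step could be sketched as a merge keyed on a tweet identifier, as below. This assumes both files share a unique ID column (called `tweet_id` here, a hypothetical name) and that the merge is a union in which any tweet found in the event crawl is marked `is_event = True`. The actual merge logic is not described in this README.

```python
def label_is_event(all_tweets, event_tweets, key="tweet_id"):
    """Union of both tweet lists keyed by `key`; is_event is True for crawled event tweets."""
    labelled = {}
    for tweet in all_tweets:
        labelled[tweet[key]] = {**tweet, "is_event": False}
    for tweet in event_tweets:
        # Overwrites the False label when the tweet also appears in the event crawl.
        labelled[tweet[key]] = {**tweet, "is_event": True}
    return list(labelled.values())


# Hypothetical rows standing in for tweets_unfiltered.csv and tweets_withevent.csv.
all_tweets = [{"tweet_id": 1}, {"tweet_id": 2}, {"tweet_id": 3}]
event_tweets = [{"tweet_id": 2}, {"tweet_id": 4}]

labelled = label_is_event(all_tweets, event_tweets)
n_events = sum(t["is_event"] for t in labelled)
n_non_events = len(labelled) - n_events
```

Counting the two labels this way yields the kind of event and non-event totals reported in the comparison table.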