
Release v2.106

@echen102 released this on 22 Feb at 03:02.

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2); collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/2020 to 2/17/2023.
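Because only Tweet IDs are released, it can be handy to check offline whether a given ID falls inside this release's collection window. Tweet IDs issued since late 2010 are Twitter "Snowflake" IDs, whose upper bits encode the creation time in milliseconds since a fixed epoch (1288834974657 ms, i.e. 2010-11-04 01:42:54.657 UTC). The minimal Python sketch below decodes that timestamp; it assumes the window's dates are in UTC and that 2/17/2023 is inclusive, and the names `tweet_id_to_datetime` and `in_collection_window` are hypothetical helpers, not part of this repository.

```python
from datetime import datetime, timedelta, timezone

# Twitter's Snowflake epoch (2010-11-04T01:42:54.657Z) in Unix milliseconds.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Collection window of this release. Assumption: dates are UTC and the end
# date 2/17/2023 is inclusive, so the exclusive bound is 2/18/2023 00:00 UTC.
WINDOW_START = datetime(2020, 1, 21, tzinfo=timezone.utc)
WINDOW_END = datetime(2023, 2, 18, tzinfo=timezone.utc)


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Decode the creation time stored in the upper bits of a Snowflake ID.

    Shifting right by 22 discards the lower worker/sequence bits, leaving
    milliseconds since the Twitter epoch.
    """
    ms_since_unix = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_unix)


def in_collection_window(tweet_id: int) -> bool:
    """True if the ID's encoded timestamp lies in [WINDOW_START, WINDOW_END)."""
    return WINDOW_START <= tweet_id_to_datetime(tweet_id) < WINDOW_END
```

The decoded time is the moment the ID was issued, which should match the Tweet's creation time; IDs whose timestamp falls outside the window were not collected for this release.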

Twitter is changing its policies on free API access, and we are unsure how this will affect academic access to the API. We will continue to collect Tweets and update this repository for as long as we can.

Please refer to the README for more details on the data, its organization, and the data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.106)

Number of Tweets: 2,775,946,436

Language breakdown of the 10 most prevalent languages:

Language   | ISO | No. of Tweets | % of Total Tweets
-----------|-----|---------------|------------------
English    | en  | 1,785,043,839 | 64.30%
Spanish    | es  |   307,973,203 | 11.09%
Portuguese | pt  |   107,505,532 | 3.87%
French     | fr  |   102,743,271 | 3.70%
Undefined  | und |    75,618,129 | 2.72%
Indonesian | in  |    74,180,508 | 2.67%
German     | de  |    64,650,071 | 2.33%
Japanese   | ja  |    41,290,208 | 1.49%
Thai       | th  |    38,024,206 | 1.37%
Italian    | it  |    31,850,251 | 1.15%
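Each percentage in the language table is that language's Tweet count divided by the total number of Tweets (2,775,946,436). The short Python sketch below recomputes the column from the published figures, which is a quick way to sanity-check the table or extend it to other language codes; `TOTAL_TWEETS`, `TOP_LANGUAGES`, and `share_percent` are illustrative names, not part of the dataset.

```python
# Published figures from the v2.106 statistics summary.
TOTAL_TWEETS = 2_775_946_436
TOP_LANGUAGES = {
    "en": 1_785_043_839,
    "es": 307_973_203,
    "pt": 107_505_532,
    "fr": 102_743_271,
    "und": 75_618_129,
    "in": 74_180_508,
    "de": 64_650_071,
    "ja": 41_290_208,
    "th": 38_024_206,
    "it": 31_850_251,
}


def share_percent(count: int, total: int = TOTAL_TWEETS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    # Print each language's count and share, then the combined top-10 share.
    for code, count in TOP_LANGUAGES.items():
        print(f"{code:<4}{count:>15,}{share_percent(count):>8.2f}%")
    top10 = sum(TOP_LANGUAGES.values())
    print(f"top 10 combined: {share_percent(top10):.2f}%")
```

Summed, the ten languages account for roughly 94.7% of all collected Tweets; the remaining ~5.3% are spread across other language codes.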

Known Gaps

Date      | Time (UTC)
----------|-------------------------------------------
2/1/2020  | 4:00 - 9:00
2/8/2020  | 6:00 - 7:00
2/22/2020 | 21:00 - 24:00
2/23/2020 | 0:00 - 24:00
2/24/2020 | 0:00 - 4:00
2/25/2020 | 0:00 - 3:00
3/2/2020  | Intermittent internet connectivity issues
5/14/2020 | 7:00 - 8:00
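When analysing Tweet volume over time, it may help to flag timestamps that fall inside the known gaps listed above, since low counts there can reflect collection outages rather than genuine drops in discussion. The Python sketch below encodes the table as half-open UTC intervals; it assumes "24:00" means midnight at the start of the following day and treats the 3/2/2020 connectivity issues as covering that whole day. `KNOWN_GAPS` and `known_gap` are hypothetical names, not part of the repository.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    """Build a UTC datetime; hour=24 rolls over to midnight of the next day."""
    return datetime(year, month, day, tzinfo=timezone.utc) + timedelta(hours=hour)


# Known gaps as half-open UTC intervals [start, end), transcribed from the table.
# Assumption: the 3/2/2020 connectivity issues are treated as the whole day.
KNOWN_GAPS = [
    (_utc(2020, 2, 1, 4), _utc(2020, 2, 1, 9), "gap"),
    (_utc(2020, 2, 8, 6), _utc(2020, 2, 8, 7), "gap"),
    (_utc(2020, 2, 22, 21), _utc(2020, 2, 22, 24), "gap"),
    (_utc(2020, 2, 23, 0), _utc(2020, 2, 23, 24), "gap"),
    (_utc(2020, 2, 24, 0), _utc(2020, 2, 24, 4), "gap"),
    (_utc(2020, 2, 25, 0), _utc(2020, 2, 25, 3), "gap"),
    (_utc(2020, 3, 2, 0), _utc(2020, 3, 2, 24), "intermittent connectivity"),
    (_utc(2020, 5, 14, 7), _utc(2020, 5, 14, 8), "gap"),
]


def known_gap(ts: datetime) -> Optional[str]:
    """Return the label of the gap covering ts, or None if ts is in no known gap."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    for start, end, label in KNOWN_GAPS:
        if start <= ts < end:
            return label
    return None
```

Because the check compares timezone-aware datetimes, timestamps expressed in other time zones are still compared correctly against the UTC intervals.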

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.