Release v2.106 (Latest)

Published 22 Feb 03:02 by echen102 (commit 2fe736f)

The repository contains an ongoing collection of Tweet IDs associated with COVID-19, the disease caused by the novel coronavirus SARS-CoV-2. Collection commenced on January 28, 2020, and the earliest Tweets in the dataset date from January 21, 2020. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 to 2/17/23.

Because Twitter's policies around its free API are changing, we are unsure how academic access to the API will be affected. We will continue to collect Tweets and update this repository for as long as we can.

Please refer to the README for more details on the data, its organization, and the data usage agreement.
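
Because only Tweet IDs are distributed, most analyses start by loading them. The README describes the actual folder and file naming; the minimal sketch below assumes only that the ID files are plain-text `.txt` files with one numeric Tweet ID per line, and the function name `load_tweet_ids` is hypothetical. It keeps the IDs as exact integers: Tweet IDs are 64-bit values larger than 2^53, so tools that parse them as double-precision floats (spreadsheets, JavaScript `Number`) can silently corrupt them.

```python
from pathlib import Path


def load_tweet_ids(root: str) -> set[int]:
    """Return the unique Tweet IDs found in every .txt file under `root`.

    Assumption (hypothetical layout): each file holds one numeric
    Tweet ID per line; blank lines are ignored.
    """
    ids: set[int] = set()
    # rglob also walks sub-folders, so nested directories are included.
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    # Python ints are arbitrary precision, so 64-bit IDs stay exact.
                    ids.add(int(line))
    return ids
```

Calling `len(load_tweet_ids("path/to/checkout"))` then gives the number of unique IDs on disk; using a set means an ID that appears in more than one file is counted once.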

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking Social Media Discourse About the {COVID-19} Pandemic: Development of a Public Coronavirus {Twitter} Data Set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  doi={10.2196/19273},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.106)

Number of Tweets: 2,775,946,436

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,785,043,839	64.3%
Spanish	es	307,973,203	11.09%
Portuguese	pt	107,505,532	3.87%
French	fr	102,743,271	3.7%
Undefined	und	75,618,129	2.72%
Indonesian	in	74,180,508	2.67%
German	de	64,650,071	2.33%
Japanese	ja	41,290,208	1.49%
Thai	th	38,024,206	1.37%
Italian	it	31,850,251	1.15%
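
Each percentage is a language's Tweet count divided by the Number of Tweets. The sketch below recomputes the shares from the v2.106 figures above and compares them with the published column, a quick way to sanity-check a transcribed table. It also derives the number of Tweets outside the top 10; that figure is computed here and is not stated in the release.

```python
# v2.106 figures copied from the table above: ISO code -> number of Tweets.
TOTAL_TWEETS = 2_775_946_436
TOP10 = {
    "en": 1_785_043_839, "es": 307_973_203, "pt": 107_505_532,
    "fr": 102_743_271, "und": 75_618_129, "in": 74_180_508,
    "de": 64_650_071, "ja": 41_290_208, "th": 38_024_206,
    "it": 31_850_251,
}
# "% of total Tweets" column as published.
PUBLISHED = {
    "en": 64.3, "es": 11.09, "pt": 3.87, "fr": 3.7, "und": 2.72,
    "in": 2.67, "de": 2.33, "ja": 1.49, "th": 1.37, "it": 1.15,
}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of `total` for each language, rounded to 2 decimals."""
    return {lang: round(100 * n / total, 2) for lang, n in counts.items()}


pct = shares(TOP10, TOTAL_TWEETS)
mismatches = [lang for lang in TOP10 if pct[lang] != PUBLISHED[lang]]
other = TOTAL_TWEETS - sum(TOP10.values())  # Tweets in all remaining languages

for lang, p in pct.items():
    print(f"{lang:>4} {p:6.2f}%")
print(f"rest {round(100 * other / TOTAL_TWEETS, 2):6.2f}% ({other:,} Tweets)")
print("mismatches:", mismatches or "none")
```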

Known Gaps

Date	Missing period
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent internet connectivity issues
5/14/2020	7:00 - 8:00 UTC
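
Even without hydrating, a Tweet's creation time can be recovered from its ID, which makes it possible to flag Tweets or hourly bins that fall inside the gaps above, where counts are likely to be incomplete. The sketch below relies on Twitter's Snowflake ID layout, in which `id >> 22` is the number of milliseconds since the custom epoch 1288834974657 (4 Nov 2010, UTC); that layout comes from Twitter, not from this repository. The gaps are transcribed as half-open UTC intervals, the 3/2/2020 row is left out because it lists no times, and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed Twitter Snowflake layout: (id >> 22) = ms since this custom epoch.
SNOWFLAKE_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
    milliseconds=1288834974657
)
ONE_MS = timedelta(milliseconds=1)


def tweet_time(tweet_id: int) -> datetime:
    """Creation time (UTC) encoded in a Snowflake Tweet ID."""
    return SNOWFLAKE_EPOCH + (tweet_id >> 22) * ONE_MS


def first_id_at(when: datetime) -> int:
    """Smallest Snowflake ID that could have been created at `when`."""
    return ((when - SNOWFLAKE_EPOCH) // ONE_MS) << 22


def utc(month: int, day: int, hour: int = 0) -> datetime:
    """A 2020 UTC timestamp; hour=24 rolls over to the next day."""
    return datetime(2020, month, day, tzinfo=timezone.utc) + timedelta(hours=hour)


# Known gaps from the table as [start, end) intervals. 3/2/2020 is omitted
# because no times are given for the intermittent connectivity issues.
KNOWN_GAPS = [
    (utc(2, 1, 4), utc(2, 1, 9)),
    (utc(2, 8, 6), utc(2, 8, 7)),
    (utc(2, 22, 21), utc(2, 22, 24)),
    (utc(2, 23, 0), utc(2, 23, 24)),
    (utc(2, 24, 0), utc(2, 24, 4)),
    (utc(2, 25, 0), utc(2, 25, 3)),
    (utc(5, 14, 7), utc(5, 14, 8)),
]


def in_known_gap(when: datetime) -> bool:
    """True if `when` falls inside one of the listed gaps."""
    return any(start <= when < end for start, end in KNOWN_GAPS)
```

For a single ID the check is `in_known_gap(tweet_time(tweet_id))`; `first_id_at` gives the boundary ID for a timestamp, which can also be used to split a sorted ID list by date without any API calls.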

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Release v2.105

Published 07 Feb 03:08 by echen102 (tag v2.105, commit 216cf6a)

This release contains Tweet IDs collected from 1/21/20 to 2/2/23.

Statistics Summary (v2.105)

Number of Tweets: 2,763,160,115

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,775,908,066	64.27%
Spanish	es	307,204,779	11.12%
Portuguese	pt	107,186,241	3.88%
French	fr	102,149,610	3.7%
Undefined	und	75,584,072	2.74%
Indonesian	in	74,104,930	2.68%
German	de	64,212,175	2.32%
Japanese	ja	40,909,809	1.48%
Thai	th	37,982,569	1.37%
Italian	it	31,722,962	1.15%

Release v2.104

Published 12 Jan 00:42 by echen102 (tag v2.104, commit 501aa9a)

This release contains Tweet IDs collected from 1/21/20 to 1/8/23.

Statistics Summary (v2.104)

Number of Tweets: 2,732,637,342

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,752,902,076	64.15%
Spanish	es	305,448,610	11.18%
Portuguese	pt	106,554,138	3.9%
French	fr	101,016,370	3.7%
Undefined	und	75,489,676	2.76%
Indonesian	in	73,955,333	2.71%
German	de	63,388,826	2.32%
Japanese	ja	40,055,653	1.47%
Thai	th	37,881,897	1.39%
Italian	it	31,471,016	1.15%

Release v2.103

Published 25 Oct 23:32 by echen102 (tag v2.103, commit ef17406)

This release contains Tweet IDs collected from 1/21/20 to 10/21/22.

Statistics Summary (v2.103)

Number of Tweets: 2,654,580,812

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,700,006,924	64.04%
Spanish	es	299,612,346	11.29%
Portuguese	pt	103,284,694	3.89%
French	fr	97,252,313	3.66%
Undefined	und	75,135,906	2.83%
Indonesian	in	73,198,108	2.76%
German	de	60,877,156	2.29%
Japanese	ja	38,252,562	1.44%
Thai	th	37,308,222	1.41%
Italian	it	30,341,504	1.14%

Release v2.102

Published 06 Oct 08:49 by echen102 (tag v2.102, commit 50266f9)

This release contains Tweet IDs collected from 1/21/20 to 10/1/22.

Statistics Summary (v2.102)

Number of Tweets: 2,637,008,139

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,688,555,276	64.03%
Spanish	es	298,247,222	11.31%
Portuguese	pt	102,367,254	3.88%
French	fr	96,333,994	3.65%
Undefined	und	75,081,254	2.85%
Indonesian	in	73,039,656	2.77%
German	de	60,148,373	2.28%
Japanese	ja	37,791,171	1.43%
Thai	th	37,202,896	1.41%
Italian	it	30,084,544	1.14%

Release v2.101

Published 29 Aug 20:50 by echen102 (tag v2.101, commit a19f838)

This release contains Tweet IDs collected from 1/21/20 to 8/26/22.

Statistics Summary (v2.101)

Number of Tweets: 2,604,429,273

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,667,013,053	64.01%
Spanish	es	295,395,806	11.34%
Portuguese	pt	101,393,782	3.89%
French	fr	94,792,160	3.64%
Undefined	und	74,944,569	2.88%
Indonesian	in	72,761,915	2.79%
German	de	58,829,135	2.26%
Thai	th	36,875,513	1.42%
Japanese	ja	36,871,265	1.42%
Italian	it	29,619,287	1.14%

Release v2.100

Published 07 Jul 02:46 by echen102 (tag v2.100, commit d54dbee)

This release contains Tweet IDs collected from 1/21/20 to 6/29/22.

Statistics Summary (v2.100)

Number of Tweets: 2,535,301,242

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,621,697,412	63.96%
Spanish	es	289,295,548	11.41%
Portuguese	pt	99,823,836	3.94%
French	fr	91,019,573	3.59%
Undefined	und	74,644,957	2.94%
Indonesian	in	71,969,845	2.84%
German	de	56,040,996	2.21%
Thai	th	36,114,750	1.42%
Japanese	ja	34,954,934	1.38%
Italian	it	28,548,846	1.13%

Release v2.99

Published 23 May 11:09 by echen102 (tag v2.99, commit 6f60d1f)

This release contains Tweet IDs collected from 1/21/20 to 5/21/22.

Statistics Summary (v2.99)

Number of Tweets: 2,490,082,420

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,591,756,166	63.92%
Spanish	es	285,407,587	11.46%
Portuguese	pt	98,437,683	3.95%
French	fr	88,470,319	3.55%
Undefined	und	73,826,512	2.96%
Indonesian	in	71,370,738	2.87%
German	de	54,318,024	2.18%
Thai	th	35,636,980	1.43%
Japanese	ja	34,095,323	1.37%
Italian	it	27,984,965	1.12%

Release v2.98

Published 15 May 08:14 by echen102 (tag v2.98, commit a0fed08)

This release contains Tweet IDs collected from 1/21/20 to 5/11/22.

Statistics Summary (v2.98)

Number of Tweets: 2,476,848,985

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,582,789,848	63.9%
Spanish	es	284,266,756	11.48%
Portuguese	pt	98,154,074	3.96%
French	fr	87,829,750	3.55%
Undefined	und	73,409,966	2.96%
Indonesian	in	71,201,101	2.87%
German	de	53,923,223	2.18%
Thai	th	35,565,734	1.44%
Japanese	ja	33,788,712	1.36%
Italian	it	27,821,357	1.12%

Release v2.97

Published 15 May 08:12 by echen102 (tag v2.97, commit 6ab25c8)

This release contains Tweet IDs collected from 1/21/20 to 4/30/22.

Statistics Summary (v2.97)

Number of Tweets: 2,460,886,441

Language breakdown of the top 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,571,079,689	63.84%
Spanish	es	283,089,973	11.5%
Portuguese	pt	97,810,049	3.97%
French	fr	86,984,963	3.53%
Undefined	und	72,918,746	2.96%
Indonesian	in	70,995,833	2.88%
German	de	53,431,666	2.17%
Thai	th	35,429,686	1.44%
Japanese	ja	33,510,573	1.36%
Italian	it	27,607,465	1.12%
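
Differencing the Number of Tweets reported by consecutive releases on this page shows how many Tweet IDs each release added. The sketch assumes every total is cumulative, as the shared 1/21/20 start date suggests; the per-release increments it prints are derived here and are not stated in the releases themselves.

```python
# "Number of Tweets" from each release on this page, oldest first.
TOTALS = [
    ("v2.97", 2_460_886_441),
    ("v2.98", 2_476_848_985),
    ("v2.99", 2_490_082_420),
    ("v2.100", 2_535_301_242),
    ("v2.101", 2_604_429_273),
    ("v2.102", 2_637_008_139),
    ("v2.103", 2_654_580_812),
    ("v2.104", 2_732_637_342),
    ("v2.105", 2_763_160_115),
    ("v2.106", 2_775_946_436),
]


def added_per_release(totals: list[tuple[str, int]]) -> dict[str, int]:
    """Net Tweet IDs added by each release relative to the one before it."""
    return {
        tag: total - prev_total
        for (_, prev_total), (tag, total) in zip(totals, totals[1:])
    }


added = added_per_release(TOTALS)
for tag, n in added.items():
    print(f"{tag:>7}: +{n:,}")
```

Dividing each increment by the days between consecutive coverage end dates would give an approximate collection rate, under the same cumulative-total assumption.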

Releases: echen102/COVID-19-TweetIDs
