26 Apr 02:25

echen102

e17e6b8

Release v2.96

This repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2). Collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 04/23/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.96)

Number of Tweets : 2,452,149,122

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,565,632,525	63.85%
Spanish	es	282,489,183	11.52%
Portuguese	pt	97,620,393	3.98%
French	fr	86,557,340	3.53%
Undefined	und	72,692,522	2.96%
Indonesian	in	70,888,553	2.89%
German	de	53,178,276	2.17%
Thai	th	35,291,207	1.44%
Japanese	ja	33,377,645	1.36%
Italian	it	27,506,327	1.12%
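
As a quick sanity check, the percentage column in the table above can be recomputed from the raw counts and the reported total. The following is a minimal sketch, not part of the dataset tooling: the names `TOTAL_TWEETS`, `LANGUAGE_COUNTS` and `share_of_total` are hypothetical and chosen only for illustration, and the numbers are copied from the v2.96 summary.

```python
# Counts copied from the v2.96 statistics summary; all names are illustrative.
TOTAL_TWEETS = 2_452_149_122

LANGUAGE_COUNTS = {
    "en": 1_565_632_525,
    "es": 282_489_183,
    "pt": 97_620_393,
    "fr": 86_557_340,
    "und": 72_692_522,
    "in": 70_888_553,
    "de": 53_178_276,
    "th": 35_291_207,
    "ja": 33_377_645,
    "it": 27_506_327,
}


def share_of_total(count, total=TOTAL_TWEETS):
    """Return `count` as a percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


# Per-language share, keyed by the language code used in the table.
shares = {iso: share_of_total(n) for iso, n in LANGUAGE_COUNTS.items()}

# Combined share of the ten listed languages; the remainder belongs to
# languages that are not shown in the table.
listed_share = share_of_total(sum(LANGUAGE_COUNTS.values()))

if __name__ == "__main__":
    for iso, pct in shares.items():
        print(f"{iso}\t{pct:.2f}%")
    print(f"listed languages combined\t{listed_share:.2f}%")
```

Formatting with `:.2f` keeps trailing zeros, so every share prints with two decimal places.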

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

19 Apr 03:49

echen102

v2.95

3124dc4

Release v2.95

This release contains Tweet IDs collected from 1/21/20 - 04/15/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.95)

Number of Tweets : 2,441,061,762

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,557,905,401	63.82%
Spanish	es	281,655,360	11.54%
Portuguese	pt	97,425,881	3.99%
French	fr	86,040,812	3.52%
Undefined	und	72,359,106	2.96%
Indonesian	in	70,751,065	2.90%
German	de	52,854,074	2.17%
Thai	th	35,074,785	1.44%
Japanese	ja	33,216,863	1.36%
Italian	it	27,338,794	1.12%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

11 Apr 08:42

echen102

v2.94

7ed8f87

Release v2.94

This release contains Tweet IDs collected from 1/21/20 - 04/08/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.94)

Number of Tweets : 2,430,598,870

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,550,661,291	63.80%
Spanish	es	280,902,032	11.56%
Portuguese	pt	97,233,479	4.00%
French	fr	85,551,860	3.52%
Undefined	und	72,080,998	2.97%
Indonesian	in	70,624,961	2.91%
German	de	52,493,416	2.16%
Thai	th	34,827,345	1.43%
Japanese	ja	33,048,663	1.36%
Italian	it	27,192,174	1.12%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

04 Apr 09:36

echen102

v2.93

567b0c0

Release v2.93

This release contains Tweet IDs collected from 1/21/20 - 04/02/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.93)

Number of Tweets : 2,422,918,491

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,545,449,999	63.78%
Spanish	es	280,291,494	11.57%
Portuguese	pt	97,098,499	4.01%
French	fr	85,200,656	3.52%
Undefined	und	71,870,920	2.97%
Indonesian	in	70,502,598	2.91%
German	de	52,089,569	2.15%
Thai	th	34,738,108	1.43%
Japanese	ja	32,915,579	1.36%
Italian	it	27,093,405	1.12%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

28 Mar 22:24

echen102

v2.92

11feb7b

Release v2.92

This release contains Tweet IDs collected from 1/21/20 - 03/25/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.92)

Number of Tweets : 2,411,660,389

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,537,896,848	63.77%
Spanish	es	279,374,852	11.58%
Portuguese	pt	96,894,387	4.02%
French	fr	84,679,670	3.51%
Undefined	und	71,483,038	2.96%
Indonesian	in	70,338,149	2.92%
German	de	51,579,752	2.14%
Thai	th	34,590,631	1.43%
Japanese	ja	32,691,282	1.36%
Italian	it	26,932,124	1.12%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

23 Mar 21:34

echen102

v2.91

9509252

Release v2.91

This release contains Tweet IDs collected from 1/21/20 - 03/21/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.91)

Number of Tweets : 2,405,567,090

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,533,813,620	63.76%
Spanish	es	278,924,071	11.59%
Portuguese	pt	96,785,133	4.02%
French	fr	84,395,800	3.51%
Undefined	und	71,294,228	2.96%
Indonesian	in	70,247,046	2.92%
German	de	51,308,155	2.13%
Thai	th	34,431,911	1.43%
Japanese	ja	32,601,879	1.36%
Italian	it	26,848,686	1.12%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

15 Mar 10:55

echen102

v2.90

cdd6fcd

Release v2.90

This release contains Tweet IDs collected from 1/21/20 - 03/12/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.90)

Number of Tweets : 2,391,456,124

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,524,527,323	63.75%
Spanish	es	277,777,459	11.62%
Portuguese	pt	96,427,956	4.03%
French	fr	83,754,184	3.50%
Undefined	und	70,893,598	2.96%
Indonesian	in	69,989,049	2.93%
German	de	50,590,705	2.12%
Thai	th	34,228,830	1.43%
Japanese	ja	32,363,526	1.35%
Italian	it	26,628,972	1.11%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

07 Mar 23:37

echen102

v2.89

47e0136

Release v2.89

This release contains Tweet IDs collected from 1/21/20 - 03/04/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.89)

Number of Tweets : 2,379,024,026

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,516,323,151	63.74%
Spanish	es	276,770,942	11.63%
Portuguese	pt	96,150,179	4.04%
French	fr	83,170,496	3.50%
Undefined	und	70,525,268	2.96%
Indonesian	in	69,722,278	2.93%
German	de	50,032,940	2.10%
Thai	th	34,057,302	1.43%
Japanese	ja	32,124,469	1.35%
Italian	it	26,440,922	1.11%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

01 Mar 00:01

echen102

v2.88

cd22c96

Release v2.88

This release contains Tweet IDs collected from 1/21/20 - 02/26/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.88)

Number of Tweets : 2,368,519,063

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,509,244,530	63.72%
Spanish	es	275,946,975	11.65%
Portuguese	pt	95,867,045	4.05%
French	fr	82,661,391	3.49%
Undefined	und	70,222,541	2.96%
Indonesian	in	69,497,352	2.93%
German	de	49,700,092	2.10%
Thai	th	33,958,198	1.43%
Japanese	ja	31,946,715	1.35%
Italian	it	26,291,299	1.11%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

21 Feb 21:51

echen102

v2.87

810e04e

Release v2.87

This release contains Tweet IDs collected from 1/21/20 - 02/18/22.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.87)

Number of Tweets : 2,352,799,019

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,499,025,705	63.71%
Spanish	es	274,605,185	11.67%
Portuguese	pt	95,438,545	4.06%
French	fr	81,900,820	3.48%
Undefined	und	69,758,279	2.96%
Indonesian	in	69,114,298	2.94%
German	de	49,188,180	2.09%
Thai	th	33,658,172	1.43%
Japanese	ja	31,631,310	1.34%
Italian	it	26,050,682	1.11%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Releases: echen102/COVID-19-TweetIDs
