End to End Example Twitter Study
[DRAFT]
As a motivating example, we will gather tweets about a breaking news trend. We will use both the Search API for older data and the Streaming API for new tweets (a short streaming sketch is included at the end of this page), then process the results to extract extra metadata about tweets, perform some analysis on the data, prepare the dataset for publication, and publish the results.
Useful resources for framing the study, choosing keywords, and building queries:
https://github.com/igorbrigadir/twitter-advanced-search
Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls
https://arxiv.org/abs/1403.7400
https://firstdraftnews.org/latest/sources-and-keywords-the-fundamentals-of-online-newsgathering/
https://developer.twitter.com/en/docs/tutorials/building-high-quality-filters
https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
https://developer.twitter.com/en/docs/twitter-api/tweets/filtered-stream/integrate/build-a-rule
A short list of command lines to help others get started with twarc2 on Linux. Complete documentation is at https://twarc-project.readthedocs.io/en/latest/
In my case, I use pyenv to manage the Python version (3.8.1).
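For example (assuming pyenv is already installed; the version is just the one I happen to use):
pyenv install 3.8.1
pyenv local 3.8.1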
Install or upgrade to the latest version:
pip install --upgrade twarc
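Before the first search you also need to give twarc2 your Twitter API credentials; it will prompt for a bearer token (or API keys) and save them to a config file:
twarc2 configure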
A simple search for "blacklivesmatter" (by default this uses the recent search endpoint, covering roughly the last 7 days):
twarc2 search blacklivesmatter > search.jsonl
A complex search of the full archive (requires the Academic Research track). This search looks for tweets containing either of two URLs or the word "miguelenlared":
twarc2 search 'url:"https://www.elconfidencial.com/espana/madrid/2021-09-07/universidad-periodismo-complutense-profesores_3218500" OR url:"https://www.infolibre.es/noticias/opinion/columnas/2021/09/08/la_verdad_sobre_caso_quiros_una_cronica_primera_persona_124235_1023.html" OR miguelenlared' --start-time 2021-09-07T00:00:01 --archive > 210907-21_2url_and_miguelenlared_con-y-sin-arroba.json
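Full-archive searches can return a lot of data. While refining a query it may help to cap the number of collected tweets with --limit (the query, limit, and output filename here are just examples):
twarc2 search blacklivesmatter --archive --limit 1000 > sample.jsonl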
If you just want to count the number of tweets per day:
twarc2 counts blacklivesmatter --csv --granularity day > blacklivesmatter_count.csv
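By default these counts only cover roughly the last 7 days. With the Academic Research track you should also be able to count across the full archive by adding --archive (check twarc2 counts --help if this option is not available in your version):
twarc2 counts blacklivesmatter --csv --granularity day --archive > blacklivesmatter_count_archive.csv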
To put each tweet on its own line, instead of one JSON object per API response:
twarc2 flatten search.jsonl search_flatten.jsonl
Convert the flattened results to CSV (this command comes from the twarc-csv plugin, installed with pip install twarc-csv):
twarc2 csv search_flatten.jsonl search_flatten.csv
A set of tweets can be displayed as a network of nodes (users) linked by their interactions (retweets, quotes, mentions, replies). The network command is provided by the twarc-network plugin (pip install twarc-network):
twarc2 network search_flatten.jsonl --format gexf search_flatten.gexf
Then open the .gexf file in Gephi.
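Finally, for the Streaming API part mentioned at the top (collecting new tweets as they are posted), a rough sketch with twarc2's filtered stream commands, reusing the "blacklivesmatter" example (check twarc2 stream-rules --help for the exact subcommands in your version):
twarc2 stream-rules add blacklivesmatter
twarc2 stream > stream.jsonl
Stop the collection with Ctrl-C, and remove the rule when you are done:
twarc2 stream-rules delete blacklivesmatter
The resulting stream.jsonl can be flattened and converted to CSV in the same way as the search results above.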