Last edited 6/17/2024
This repository contains replication materials for Thomas Davidson and Jenny Enos. Forthcoming. "Engaging Populism? The Popularity of European Populist Political Parties on Facebook and Twitter, 2010-2020." Political Communication doi: https://doi.org/10.1080/10584609.2024.2369118
Please get in touch via email if you have any questions.
Most analyses were performed on a Macbook Pro laptop using R and are compatible with the latest version as of publication (4.1.2). The main estimation procedures were executed using the Amarel high-performance computing cluster at Rutgers University, a Linux-based environment consisting of a large gluster of compute nodes that can be used to execute processes in parallel.
The following sections detail the organization of the data and processes for data cleaning and merging, the estimation of the models, and the construction of the results.
At a high-level, our analyses can be replicated by running the scripts to process and merge the data (cleaning-and-merging
), estimating the models (models
), then finally generating the figures and tables (results
). Since we are providing the full output from our model estimation, the final results can be reproduced simply by using these files to produce the figures and tables without needing to perform the intermediate steps.
The following describes the data sources used in the analyses. See the following section for instructions on running the scripts to generate the final analysis data. All data are stored in the data
folder.
The Facebook and Twitter posts were collected from the respective APIs. These APIs require authorization and the Twitter Academic API is no longer active, but example scripts to show how these APIs were queried are contained in data-collection
.
Tables containing raw social media data collected from the APIs are stored in facebook_data_raw
and twitter_data_raw
, respectively. The versions of these data used in the final country-year analyses are contained in facebook_data_cleaned
and twitter_data_cleaned
. The process for construct this data are described below.
The data on political parties are stored in three different places. populism_data
contains the populism measures from the PopuList, GPS, and POPPA. partyfacts_data
and parlgov_data
contain the PartyFacts and ParlGov datasets respectively.
There are two main sources of country-level data. country_data
contains a table on country covariates from the World Bank and another table with information on Northern Ireland from the UK Office of National statistics. The measure of social media use from the Eurobarometer is included in the eurobarometer_data
directory.
The analysis dataset is build by running several different components. The scripts are contained in the cleaning-and-merging
directory.
The raw social media data are combined into cleaned versions by running create_facebook_datasets.R
and create_twitter_datasets.R
. These scripts yield two versions of each dataset: an aggegated country-year dataset used for the main analyses and a post-level dataset used for a robustness check. The original data are not provided with this replication package but these scripts show the logic of the approach used to prepare the data.
The different country-level datasets are combined by running the script country-data-compilation.R
. This merges together the World Bank, Northern Ireland, and Eurobarometer datasets.
Next, the country-imputation.R
script can be run to load the output from the previous step and execute the procedure to impute missing observations for the Eurobarometer data. This script yields the final country dataset, data/country_data/final_imputed_country_data.csv
.
Once the yearly social media data and country-level data have been compiled, full_merge_script.R
can be run to merge together the party-level and country-level covariates. This script is the most complicated part of the process as it involves joining together several different sources. The output is stored as data/final_merged_data/final_merged_dataset.csv
.
Some additional cleaning steps are used to prepare the merged data for modeling. The final_data_cleaning.R
includes additional steps to filter the analysis dataset and construct additional variables. The outputs from this script are two datasets: /data/model_data/model_dataset_fb.csv
and /data/model_data/model_dataset_tw.csv
. There are also some supplementary datasets created for robustness checks (different versions of Irish data) and to provide information for visualization.
All models reported in the paper were estimated using the brms
R package and RStan. The analysis was performed using a SLURM high-performance compute cluster, which allows multiple models to be estimated simultaneously rather than in linear order, significantly reducing the compute time.
This infrastructure necessitates each model to be stored in a separate script, such that the SLURM controller can run separate scripts. As such, our models are organized in the following way:
The models
directory contains all code. The scripts
directory inside this contains subdirectories for different parts of the analysis.
POPULIST
, GPS
, and POPPA
include scripts for the main three outcomes and POPULIST_CATEGORICAL
models for the categorical measure from PopuList. In each case, the following convention is used to name the files.
Dataset + Platform + Outcome + Model. The following codes are used for models: fe
= Fixed effects, re
= Random intercepts, rs
= Random intercepts and slopes, rs_cy
= Random slopes and intercepts including at country-year (main specification). For example, populist_tw_retweets_rs_cy.R
runs the model to estimate the relationship between number of retweets and populism using the Populist measure.
The INTERACTIONS
subdirectory contains models to estimate interaction terms. In this case, the last part of the name indicates the interacting variable.
Each script does the following. First, the line source(load-data.R)
is run to load the final datasets into memory. This script also performs some adjustments to ensure that variables are correctly represented and some additional filtering. Next, the brm
function from the package brms
is used to estimate each model using Bayesian estimation. The model is then stored in a directory called model results.
Note: For our analyses, these models were run on a remote server and the results were then downloaded. The scripts in the replication package have been modified to ensure that results are stored in the correct place to render plots (results/output
and results/output_interactions
). This means that these models could be re-run locally rather than requiring a SLURM controller.
The final models are stored as .RData
objects output by brms
. These can be read into memory to extract various information used to produce the figures and tables. Our final fitted models are included with this replication package, so it is not necessary to re-estimate the models in order to reproduce our results. These models are too large to be stored on Github but can be downloaded using this Google Drive link. Please contact the lead author if the link does not work.
The results
folder contains scripts used to produce results in the paper as well as directories containing various figures and tables.
descriptive
contains a file descriptive_analysis.R
that reproduces Figure 1 and Figure 2.
The remaining figures are produced by running the following scripts in results/plots/
and all output is stored in results/plots/figures
.
main_coefficient_plot.R
reproduces Figure 3.
categorical_populism_coefficient.R
reproduces Figure 4.
random_slopes_main.R
reproduces Figure 5.
random_slopes_by_ideology
reproduces Figure 6.
country_stories_facet_populist.R
reproduces Figure 6.
interaction_coefficient.R
reproduces Figure 8.
These scripts can also be used to reproduce the various analyses reported in the Appendix.
The descriptive table is produced by tables/final_tables/descriptive_table.R
. Note that this table required some additional formatting to render correctly in the document, whereas other tables are stored as PDFs.
The full regression tables for our main results are reproduced by the script tables/final_tables/render_regression_tables.R
.
Additional figures for random slopes analysis for Twitter are produced by random_slopes_main_twitter.R
and random_slopes_by_ideology_twitter.R
. The figures are stored in the same place.
The figure for the post-level regression plot is reproduced by running plots/post_tweet_regression.R
. This will run the frequentist models and produce the plot.