Data and code for OS 2024 paper

os-ms-21-15751

This repository includes data and commented R code for the analysis reported in the paper cited below. The analysis is based on the following publicly available data:

  1. Discogs. Data dumps are available from discogs.com. This analysis used the dump from April 1, 2020. Download it and extract it to a folder called discogs/ within your working directory.
  2. MusicBrainz. Data dumps are available from metabrainz.org. This analysis used the MusicBrainz PostgreSQL dump from April 11, 2020. Download it and extract it to a folder called musicbrainz/. The database schema is documented on the MusicBrainz website.
  3. ListenBrainz. Data dumps are available from musicbrainz.org. This analysis used the ListenBrainz PostgreSQL dump from December 1, 2020. Download it and extract it to a folder called listenbrainz/.
  4. Additional data files. These are distributed as part of this repository in the extra.tar.xz tarball. The file is hosted on Git LFS and can be downloaded from there; if that does not work, it is also available from Dropbox. Either way, download it and extract it to a folder called extra/. The file abstamps.csv in this tarball contains timestamps for AcousticBrainz submissions, kindly provided by the AcousticBrainz developers. The file checktracks.csv includes details and links for tracks we manually added to AcousticBrainz as part of our analysis; these tracks are now permanently part of the AcousticBrainz database.
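Before running the R scripts, the working directory should contain the four folders listed above. The helpers below are a minimal sketch written in Python for illustration, although the repository's own code is in R. They extract extra.tar.xz into extra/ and report which of the expected folders are still missing. The function names extract_extra and missing_dirs are hypothetical, and the code assumes the tarball sits in the working directory. Because the internal layout of the archive is not documented here, the sketch also handles the case where the files are already wrapped in a top-level extra/ folder.

```python
import tarfile
from pathlib import Path

# Folder names listed in this README. Only extra/ is produced here; the
# Discogs, MusicBrainz and ListenBrainz dumps must be fetched separately.
REQUIRED_DIRS = ["discogs", "musicbrainz", "listenbrainz", "extra"]


def extract_extra(tarball="extra.tar.xz", workdir="."):
    """Extract the xz-compressed tarball so its files end up in <workdir>/extra/."""
    workdir = Path(workdir)
    with tarfile.open(tarball, mode="r:xz") as tar:
        names = tar.getnames()
        # Assumption: the archive may or may not wrap its files in a top-level
        # extra/ folder; extract accordingly to avoid producing extra/extra/.
        nested = bool(names) and all(
            n == "extra" or n.startswith("extra/") for n in names
        )
        dest = workdir if nested else workdir / "extra"
        dest.mkdir(parents=True, exist_ok=True)
        # Use the safe "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=dest, **kwargs)
    return workdir / "extra"


def missing_dirs(workdir="."):
    """Return the expected data folders that do not exist yet under workdir."""
    workdir = Path(workdir)
    return [d for d in REQUIRED_DIRS if not (workdir / d).is_dir()]
```

For example, calling extract_extra("extra.tar.xz") and then missing_dirs() would list the dump folders that still need to be downloaded before the paths in scripts/preparation.R are updated.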

After all data has been downloaded, update lines 36–39 of scripts/preparation.R with paths to the new folders. Running the code provided in this script will replicate our data preparation. This includes scraping operations (using RSelenium) that require a Discogs account and take many days to complete. To reproduce the scraping, please provide your own Discogs username and password in line 44 of the script.

The data obtained through the preparation script is saved to an RData object called checkpoint.RData. This object is loaded at the beginning of scripts/analysis.R, which replicates all results reported in the paper, including descriptive statistics, regression estimates, and simulations. For convenience, checkpoint.RData is distributed as part of this repository.

The data used for regressions is distributed separately in CSV format in the datasets/ folder. There are two files: styles.csv contains the data used for our main analysis, which is based on Discogs styles; genres.csv contains the data used for our replication of the main analysis at the level of Discogs genres (see the paper's online appendix).
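To take a first look at the regression datasets outside R, a short Python sketch like the following reports the column names and row count of each file. It assumes only that the files are comma-separated, UTF-8 encoded, and start with a header row. It does not assume any particular column names, and summarize_dataset is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path


def summarize_dataset(path):
    """Return (column names, number of non-blank data rows) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row is assumed to be the header
        n_rows = sum(1 for row in reader if row)  # skip blank lines
    return header, n_rows


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name in ("styles.csv", "genres.csv"):
        path = Path("datasets") / name
        if path.is_file():
            cols, n = summarize_dataset(path)
            print(f"{name}: {n} rows, {len(cols)} columns")
```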

Citation

Piazzai, Michele, Min Liu, and Martina Montauti (2024). Cognitive economy and product categorization. Organization Science, in press.
https://doi.org/10.1287/orsc.2021.15751