Awesome Reproducible Research

A curated list of reproducible research case studies, projects, tutorials, and media

Contents

Case studies

The term "case studies" is used here in a general sense to describe any study of reproducibility. The approaches listed in the table are:

  • Reproduction: an attempt to arrive at comparable results from identical data, using the computational methods described in a paper.
  • Refactor: restructuring existing code to follow frameworks and other reproducibility best practices while keeping the original data.
  • Replication: generating new data and applying existing methods to achieve comparable results.
  • Robustness test: applying various statistical models or parameters to a given data set to study their effect on results.
  • Census: a high-level tabulation conducted by a third party.
  • Survey: a questionnaire sent to practitioners.
  • Case narrative: an in-depth first-person account.
  • Theoretical: a study that estimates global reproducibility from non-empirical evidence.
  • Independent discussion: a secondary, independent author interprets the results of a study as a means to improve inferential reproducibility.
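To make the idea of a robustness test concrete, here is a minimal sketch that applies several defensible analysis choices to one data set and reports how far the estimates spread. The data values, the names `DATA`, `SPECIFICATIONS`, `trimmed_mean` and `robustness_test`, and the choice of location estimators are all hypothetical illustrations, not the method of any study in the table.

```python
import statistics

# Hypothetical measurements; a real robustness test would use the study's
# own published data set instead.
DATA = [2.1, 2.4, 2.2, 2.8, 2.5, 9.7, 2.3, 2.6]


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.fmean(ordered[k:len(ordered) - k])


# Each entry is one defensible analysis choice for estimating the centre of DATA.
SPECIFICATIONS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "trimmed_mean_20pct": lambda xs: trimmed_mean(xs, 0.20),
    "trimmed_mean_25pct": lambda xs: trimmed_mean(xs, 0.25),
}


def robustness_test(values, specs):
    """Apply every specification to the same data; return estimates and their spread."""
    estimates = {name: fn(values) for name, fn in specs.items()}
    spread = max(estimates.values()) - min(estimates.values())
    return estimates, spread


if __name__ == "__main__":
    estimates, spread = robustness_test(DATA, SPECIFICATIONS)
    for name, value in estimates.items():
        print(f"{name:>20}: {value:.3f}")
    print(f"spread across specifications: {spread:.3f}")
```

A spread that is large relative to the effect of interest suggests the conclusion depends on analysis choices rather than on the data alone.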

| Study | Field | Approach | Size |
|---|---|---|---|
| Ioannidis 2005 | Science | Theoretical | (all studies) |
| Glasziou et al. 2008 | Medicine | Census | 80 studies |
| Baggerly & Coombes 2009 | Cancer biology | Refactor | 8 studies |
| Hothorn et al. 2009 | Biostatistics | Census | 56 studies |
| Ioannidis et al. 2009 | Genetics | Reproduction | 18 studies |
| Anda et al. 2009 | Software engineering | Replication | 4 companies |
| Vandewalle et al. 2009 | Signal processing | Census | 134 papers |
| Prinz 2011 | Biomedical sciences | Survey | 23 PIs |
| Hothorn & Leisch 2011 | Bioinformatics | Census | 100 studies |
| Begley & Ellis 2012 | Cancer biology | Replication | 53 studies |
| Collberg et al. 2014; Collberg & Proebsting 2016 | Computer science | Census | 613 papers |
| OSC 2015 | Psychology | Replication | 100 studies |
| Bandrowski et al. 2015 | Biomedical sciences | Census | 100 papers |
| Patel et al. 2015 | Epidemiology | Robustness test | 417 variables |
| Névéol et al. 2016 | NLP | Replication | 3 studies |
| Reproducibility Project 2017 | Cancer biology | Replication | 9 studies |
| Vasilevsky et al. 2017 | Biomedical sciences | Census | 318 journals |
| Kitzes et al. 2017 | Science | Case narrative | 31 PIs |
| Barone et al. 2017 | Biological sciences | Survey | 704 PIs |
| Kim & Dumas 2017 | Bioinformatics | Refactor | 1 study |
| Camerer 2017 | Economics | Replication | 18 studies |
| Olorisade 2017 | Machine learning | Census | 30 studies |
| Strupler & Wilkinson 2017 | Archaeology | Case narrative | 1 survey |
| Danchev et al. 2017 | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
| Kjensmo & Gundersen 2017 | Artificial intelligence | Census | 400 papers |
| Gertler et al. 2018 | Economics | Census | 203 papers |
| Stodden et al. 2018 | Computational science | Reproduction | 204 articles, 180 authors |
| Madduri et al. 2018 | Genomics | Case narrative | 1 study |
| Camerer et al. 2018 | Social sciences | Replication | 21 papers |
| Silberzahn et al. 2018 | Psychology | Robustness test | 1 data set, 29 analyst teams |
| Boulesteix et al. 2018 | Medicine and health sciences | Census | 30 papers |
| Eaton et al. 2018 | Microbiome immuno-oncology | Replication | 1 paper |
| Vaquero-Garcia et al. 2018 | Bioinformatics | Refactor and robustness test | 1 paper |
| Wallach et al. 2018 | Biomedical sciences | Census | 149 papers |
| Miller et al. 2018 | Bioinformatics | Synthetic replication & refactor | 1 paper |
| Konkol et al. 2018 | Geosciences | Survey, reproduction | 146 scientists, 41 papers |
| Rahtz 2018 | Reinforcement learning | Reproduction, case narrative | 1 paper |
| AlNoamany & Borghi 2018 | Science & engineering | Survey | 215 participants |
| Li et al. 2018 | Nephrology | Robustness test | 1 paper |
| Chen 2018 | Social sciences & other | Census | 810 Dataverse studies |
| Stagge et al. 2019 | Geosciences | Survey | 360 papers |
| Bizzego et al. 2019 | Deep learning | Robustness test | 1 analysis |
| Madduri et al. 2019 | Genomics | Case narrative | 1 analysis |
| Mammoliti et al. 2019 | Pharmacogenomics | Case narrative | 2 analyses |
| Allen & Mehler 2019 | Biomedical sciences and psychology | Census | 127 registered reports |
| Pimentel et al. 2019 | All | Census | 1,159,166 Jupyter notebooks |
| Vlisides et al. 2019; Sieber et al. 2019 | Anaesthesia | Independent discussion | 1 study |
| Bakker et al. 2019 | Psychology | Replication | 1 paper |
| Dacrema et al. 2019 | Machine learning | Reproduction | 18 conference papers |
| Eran et al. 2019 | Experimental archaeology | Replication | 1 theory |
| Rauh et al. 2019 | Neurology | Census | 202 papers |
| Sætrevik & Sjåstad 2019 | Psychology | Replication | 2 experiments |

Ad-hoc reproductions

These are one-off, unpublished attempts to reproduce individual studies.

| Reproduction | Original study |
|---|---|
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
| https://github.com/sean-harrison-bristol/CCR5_replication | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. |
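Deciding whether a reproduction "arrived at comparable results" usually means comparing each reported number with its recomputed counterpart under some tolerance. The sketch below does this with Python's `math.isclose`; the function name `compare_results`, the 1% relative tolerance, and the example values are assumptions for illustration and are not taken from either reproduction above.

```python
import math


def compare_results(reported, reproduced, rel_tol=0.01, abs_tol=1e-9):
    """Compare each reported value with its reproduced counterpart.

    Returns a dict mapping each reported quantity to True (within tolerance),
    False (outside tolerance) or None (not reproduced at all).
    """
    outcome = {}
    for name, original in reported.items():
        if name not in reproduced:
            outcome[name] = None
        else:
            outcome[name] = math.isclose(
                original, reproduced[name], rel_tol=rel_tol, abs_tol=abs_tol
            )
    return outcome


if __name__ == "__main__":
    # Hypothetical values, not taken from either study listed above.
    reported = {"effect_size": 0.42, "p_value": 0.003, "n": 120}
    reproduced = {"effect_size": 0.418, "n": 118}
    labels = {True: "match", False: "mismatch", None: "not reproduced"}
    for name, ok in compare_results(reported, reproduced).items():
        print(f"{name}: {labels[ok]}")
```

Keeping "not reproduced" separate from "mismatch" matters: a value that could not be recomputed at all points to missing code or data, while a mismatch points to a difference in method or environment.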

Courses

Development Resources

User tools

  • Open With Binder for Chrome or Firefox - open the GitHub repository you are visiting using MyBinder.org
  • DVC - tracks versions of machine learning models and data sets
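Data versioning tools such as DVC rely on content hashes: a file is identified by a digest of its bytes, so any edit to a data set is detectable. The sketch below illustrates only that general idea with MD5 from `hashlib`; it is not DVC's implementation or metafile format, and `snapshot` and `changed_files` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large data files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Record a content hash for every tracked file (hypothetical helper)."""
    return {str(p): file_md5(p) for p in paths}


def changed_files(old, new):
    """List files whose hash in `new` differs from, or is missing in, `old`."""
    return sorted(p for p, h in new.items() if old.get(p) != h)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        data = Path(workdir) / "measurements.csv"
        data.write_text("id,value\n1,2.5\n")
        before = snapshot([data])
        data.write_text("id,value\n1,2.6\n")  # simulate an edited data set
        after = snapshot([data])
        print(changed_files(before, after))
```

Storing only the small hash map in version control, while the data itself lives elsewhere, is what lets such tools pin an analysis to an exact version of a large data set.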

Books

Data Repositories

All of these repositories assign Digital Object Identifiers (DOIs) to deposited data.

  • DataCite - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
  • Data Dryad - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
  • Figshare - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
  • OSF - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
  • Zenodo - Supports embargoed and restricted access, plus metadata. 50 GB limit.
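Because DOIs are cited in many forms ("doi:10...", "https://doi.org/10...", "DOI: 10..."), it helps to normalize them before comparing or resolving them. This minimal sketch strips common prefixes, checks the result against a regular expression Crossref has suggested for modern DOIs (it matches most, but not all, registered DOIs), and lowercases it, since DOIs are case-insensitive. The names `normalize_doi` and `doi_url` are hypothetical; the example DOI comes from the Ad-hoc reproductions table.

```python
import re

# Pattern suggested by Crossref for modern DOIs; some older DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is prefixed in citations and links.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return the bare, lowercased DOI or raise ValueError."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi.lower()


def doi_url(raw):
    """Build a resolver URL for any accepted DOI form."""
    return "https://doi.org/" + normalize_doi(raw)


if __name__ == "__main__":
    print(doi_url("DOI: 10.1038/s41591-019-0459-6"))
```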

Examples and Exemplars

Haibe-Kains lab reproducible papers

| Publication | CodeOcean link |
|---|---|
| Mer AS et al. Integrative Pharmacogenomics Analysis of Patient Derived Xenografts | codeocean.com/capsule/056639 |
| Gendoo, Zon et al. MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature | codeocean.com/capsule/643863 |
| Yao et al. Tissue specificity of in vitro drug sensitivity | codeocean.com/capsule/550275 |
| Safikhani Z et al. Gene isoforms as expression-based biomarkers predictive of drug response in vitro | codeocean.com/capsule/000290 |
| El-Hachem et al. Integrative cancer pharmacogenomics to infer large-scale drug taxonomy | codeocean.com/capsule/425224 |
| Safikhani Z et al. Revisiting inconsistency in large pharmacogenomic studies | codeocean.com/capsule/627606 |
| Sandhu V et al. Meta-analysis of 1,200 transcriptomic profiles identifies a prognostic model for pancreatic ductal adenocarcinoma | codeocean.com/capsule/269362 |

Journals

  • ReScience - Journal dedicated to in silico reproductions and tests of robustness; it lives on GitHub.
  • ReplicationWiki - Replication in the social sciences, particularly economics

Ontologies

Organizations

Awesome Lists

Contribute

Contributions welcome! Read the contribution guidelines first.

License

CC0

To the extent possible under law, Jeremy Leipzig has waived all copyright and related or neighboring rights to this work.
