AlexsLemonade · jaclyn-taroni · Dec 18, 2019 · Dec 17, 2019 · Dec 17, 2019 · Dec 17, 2019
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -44,6 +44,7 @@
 
 - [ ] The dependencies required to run the code in this pull request have been added to the project Dockerfile.
 - [ ] This analysis has been added to continuous integration.
+- [ ] This analysis is recorded in the table in `analyses/README.md`.
 
 <!-- IF YOUR PULL REQUEST IS A DATA RELEASE, PLEASE REMOVE THE [HTML COMMENT TAG](https://html.com/tags/comment-tag/) FROM THE SECTION BELOW AND COMPLETE THE CHECKLIST-->
 

diff --git a/README.md b/README.md
@@ -22,6 +22,7 @@ The project maintainers include scientists from [Alex's Lemonade Stand Foundatio
     - [Pull Request Model](#pull-request-model)
 - [How to Add an Analysis](#how-to-add-an-analysis)
   - [Folder Structure](#folder-structure)
+  - [Documenting Your Analysis](#documenting-your-analysis)
   - [Analysis Script Numbering](#analysis-script-numbering)
   - [Output Expectations](#output-expectations)
   - [Docker Image](#docker-image)
@@ -141,7 +142,6 @@ Files that are primarily tabular results files should be placed in a `results` s
 Intermediate files that are useful within the processing steps but that do not represent final results should be placed in `../../scratch/`.
 It is safe to assume that files placed in `../../scratch` will be available to all analyses within the same folder.
 It is not safe to assume that files placed in `../../scratch` will be available from analyses in a different folder.
-When an analysis module contains multiple steps or is nearing completion, add a `README.md` file that summarizes the purpose of the module, any known limitations or required updates, and includes examples for how to run the analyses to the folder.
 
 An example highlighting a `new-analysis` directory is shown below.
 The directory is placed alongside existing analyses within the `analyses` directory.
@@ -152,7 +152,7 @@ The author has produced their output figures as `.pdf` files.
 We have a preference for vector graphics as PDF files, though other forms of vector graphics are also appropriate.
 The results folder contains a tabular summary as a comma separated values file.
  We expect that the file suffix (`.csv`, `.tsv`) accurately denotes the format of the added files.
-The author has also included a `README.md`.
+The author has also included a `README.md` (see [Documenting Your Analysis](#documenting-your-analysis)).
 
 ```
 OpenPBTA-analysis
@@ -175,6 +175,18 @@ OpenPBTA-analysis
 └── scratch
 ```
 
+### Documenting Your Analysis
+
+A goal of the OpenPBTA project is to create a collection of workflows that are commonly used for atlas papers.
+As such, documenting your analytical code via comments and including information summarizing the purpose of your analysis is important.
+
+When you file the first pull request creating a new analysis module, add your module to the [Modules At A Glance table](analyses#modules-at-a-glance).
+This table contains fields for the directory name, what input files are required, a short description, and any files that you expect other analyses will rely on.
+As your analysis develops and input or output files change, please check that this table remains up to date.
+This step is included in the pull request reproducibility checklist.
+
+When an analysis module contains multiple steps or is nearing completion, add a `README.md` file to the folder that summarizes the purpose of the module, notes any known limitations or required updates, and includes examples of how to run the analyses.
+
 ### Analysis Script Numbering
 
 As shown above, analysis scripts within a folder should be numbered from `01` and are intended be run in order.

diff --git a/analyses/README.md b/analyses/README.md
@@ -0,0 +1,35 @@
+## Analysis Modules
+
+This directory contains various analysis modules in the OpenPBTA project.
+See the README of an individual analysis module for more information about that module.
+
+### Modules at a glance
+
+The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that are consumed by other analysis modules.
+This is in service of documenting interdependent analyses.
+Note that _nearly all_ modules use the harmonized clinical data file (`pbta-histologies.tsv`) even when it is not explicitly included in the table below.
+
+| Module | Input Files | Brief Description | Output Files Consumed by Other Analyses |
+|--------|-------|-------------------|--------------|
+| [`cnv-comparison`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/cnv-comparison) | Earlier version of SEG files | *Deprecated*; compared earlier version of the CNV methods. | N/A
+| [`collapse-rnaseq`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/collapse-rnaseq) | `pbta-gene-expression-rsem-fpkm.polya.rds`, `pbta-gene-expression-rsem-fpkm.stranded.rds`, `gencode.v27.primary_assembly.annotation.gtf.gz` | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds`, `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` (included in data download)
+| [`comparative-RNASeq-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/comparative-RNASeq-analysis) | `pbta-gene-expression-rsem-tpm.polya.rds`, `pbta-gene-expression-rsem-tpm.stranded.rds` | *In progress*; will produce expression outlier profiles per [#229](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/229) | N/A |
+| [`copy_number_consensus_call`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/copy_number_consensus_call) | `pbta-cnv-cnvkit.seg.gz`, `pbta-cnv-controlfreec.tsv.gz`, `pbta-sv-manta.tsv.gz` | *In progress*; will produce consensus copy number calls per [#128](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/128) | N/A
+| [`create-subset-files`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/create-subset-files) | All files | This module contains the code to create the subset files used in continuous integration | All subset files for continuous integration
+| [`focal-cn-file-preparation`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/focal-cn-file-preparation) | `pbta-cnv-cnvkit.seg.gz`, `pbta-cnv-controlfreec.tsv.gz` | Maps from copy number variant caller segments to gene identifiers; will eventually be updated to use consensus copy number calls ([#186](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/186)) | `cnvkit_annotated_cn_autosomes.tsv.gz`, `cnvkit_annotated_cn_x_and_y.tsv.gz`, `controlfreec_annotated_cn_autosomes.tsv.gz`, `controlfreec_annotated_cn_x_and_y.tsv.gz`
+| [`fusion_filtering`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering) | `pbta-fusion-arriba.tsv.gz`, `pbta-fusion-starfusion.tsv.gz` | Standardizes, filters, and prioritizes fusion calls | `PutativeDriverFusion.tsv` which is distributed as `pbta-fusion-putative-oncogenic.tsv` in the data download
+| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `independent-specimens.wgs.primary.tsv`, `independent-specimens.wgs.primary-plus.tsv`, `independent-specimens.wgswxs.primary.tsv`, `independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
+| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.*.tsv`; current version uses `pbta-snv-lancet.vep.maf.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence ([#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13)); may be updated to include other data types (e.g., fusions) | N/A
+| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/ssgsea-hallmark/results/GeneSetExpressionMatrix.RDS`, `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds`, `analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz`, `pbta-snv-consensus-mutation-tmb.tsv` | *In progress*; summarizing data into tabular format in order to molecularly subtype ATRT samples ([#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244)) | N/A
+| [`mutational-signatures`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutational-signatures) | `pbta-snv-consensus-mutation.maf.tsv.gz` | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A 
+| [`mutect2-vs-strelka2`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutect2-vs-strelka2) | `pbta-snv-mutect2.vep.maf.gz`, `pbta-snv-strelka2.vep.maf.gz` | *Deprecated*; comparison of only two SNV callers, subsumed by `snv-callers` | N/A
+| [`oncoprint-landscape`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/oncoprint-landscape) | `pbta-snv-consensus-mutation.maf.tsv.gz`, `pbta-fusion-putative-oncogenic.tsv`, `analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz`, optionally `independent-specimens.*` | Combines mutation, copy number, and fusion data into an OncoPrint plot ([#6](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/6)); will need to be updated as all data types are refined | N/A
+| [`sample-distribution-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/sample-distribution-analysis) | `pbta-histologies.tsv` | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | N/A
+| [`selection-strategy-comparison`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/selection-strategy-comparison) | `pbta-gene-expression-rsem-fpkm.polya.rds`, `pbta-gene-expression-rsem-fpkm.stranded.rds` | Comparison of RNA-seq data from different selection strategies | N/A 
+| [`sex-prediction-from-RNASeq`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/sex-prediction-from-RNASeq) | `pbta-gene-expression-kallisto.stranded.rds`, `pbta-histologies.tsv` | *In progress*; predicts genetic sex using RNA-seq data ([#84](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/84)) | N/A
+| [`snv-callers`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/snv-callers) | `pbta-snv-lancet.vep.maf.gz`, `pbta-snv-mutect2.vep.maf.gz`, `pbta-snv-strelka2.vep.maf.gz`, `pbta-snv-vardict.vep.maf.gz` | Generates consensus SNV and indel calls; calculates tumor mutation burden using the consensus calls | `pbta-snv-consensus-mutation.maf.tsv.gz`, `pbta-snv-consensus-mutation-tmb.tsv` (included in data download) 
+| [`ssgsea-hallmark`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/ssgsea-hallmark) | Currently `pbta-gene-counts-rsem-expected_count.stranded.rds` | *Needs updating per [#235](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/235)*; performs single-sample Gene Set Enrichment Analysis using Hallmark gene sets | `GeneSetExpressionMatrix.RDS`
+| [`survival-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/survival-analysis) | TBD | *In progress*; will eventually contain functions for various types of survival analysis ([#18](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/18)) | N/A
+| [`sv-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/sv-analysis) | `pbta-sv-manta.tsv.gz`, `independent-specimens.wgs.primary-plus.tsv` | *In progress*; chromothripsis analysis per [#27](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/27)| N/A
+| [`tmb-compare-tcga`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/tmb-compare-tcga) | `pbta-snv-consensus-mutation-tmb.tsv` | Compares PBTA tumor mutation burden to adult TCGA data; may need to be updated per [#257](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/257) | N/A
+| [`transcriptomic-dimension-reduction`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/transcriptomic-dimension-reduction)| `pbta-gene-expression-rsem-fpkm.polya.rds`, `pbta-gene-expression-rsem-fpkm.stranded.rds`, `pbta-gene-expression-kallisto.polya.rds`, `pbta-gene-expression-kallisto.stranded.rds` | Dimension reduction and visualization of RNA-seq data (part of [#9](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/9)) | N/A
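
The new checklist item asks contributors to confirm by hand that their module appears in the "Modules at a glance" table. As a possible complement, the check could be sketched in Python as below. This is not part of the pull request: the function names are hypothetical, and the sketch assumes that each table row starts with the module name as a backticked Markdown link, as in the rows above, and that the script is run from the repository root.

```python
import re
from pathlib import Path

# Matches the module name in the first table cell, e.g. "| [`snv-callers`](...) |".
# Assumption: every module row begins with a backticked Markdown link.
MODULE_CELL = re.compile(r"^\|\s*\[`([^`]+)`\]")


def documented_modules(readme_text):
    """Return the set of module names found in the first column of the table."""
    names = set()
    for line in readme_text.splitlines():
        match = MODULE_CELL.match(line)
        if match:
            names.add(match.group(1))
    return names


def undocumented_modules(readme_text, module_dirs):
    """Return the directory names that have no row in the table, sorted."""
    return sorted(set(module_dirs) - documented_modules(readme_text))


if __name__ == "__main__":
    analyses = Path("analyses")  # assumed location, relative to the repository root
    readme = analyses / "README.md"
    if readme.is_file():
        dirs = [p.name for p in analyses.iterdir() if p.is_dir()]
        for name in undocumented_modules(readme.read_text(encoding="utf-8"), dirs):
            print(f"Not in the Modules at a glance table: {name}")
```

A check like this only confirms that a row exists. Whether the listed input and output files are still accurate needs a human review.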