Analysis Modules

This directory contains the analysis modules in the OpenPBTA project. See the README of an individual analysis module for more information about that module.

Modules at a glance

The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules. In addition, this table reflects which analyses are included in the OpenPBTA manuscript. This is in service of documenting interdependent analyses. Note that nearly all modules use the harmonized clinical data file (pbta-histologies.tsv) even when it is not explicitly included in the table below.

Module | Input Files | Brief Description | Output Files Consumed by Other Analyses | Analysis included in manuscript?
chromosomal-instability pbta-histologies.tsv
pbta-sv-manta.tsv.gz
pbta-cnv-cnvkit.seg.gz
Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
Yes
chromothripsis pbta-sv-manta.tsv.gz
pbta-cnv-consensus.seg.gz
independent-specimens.wgs.primary-plus.tsv
figures/palettes/histology_label_color_table.tsv
analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
This module runs ShatterSeek, identifies chromothripsis regions, and visualizes the results. N/A Yes
cnv-chrom-plot pbta-cnv-consensus-gistic.zip
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg
Plots genome-wide visualizations relating to copy number results N/A Yes
cnv-comparison Earlier version of SEG files Deprecated; compared earlier version of the CNV methods. N/A No
collapse-rnaseq pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
gencode.v27.primary_assembly.annotation.gtf.gz
Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds (included in data download; too large for tracking via GitHub)
results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub)
Yes
comparative-RNASeq-analysis pbta-gene-expression-rsem-tpm.polya.rds
pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-manifest.tsv
pbta-mend-qc-results.tar.gz
In progress; will produce expression outlier profiles per #229 N/A No
compare-gistic analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip
Compares the GISTIC results for the entire cohort with the GISTIC results for three individual histologies: LGAT, HGAT, and medulloblastoma (#547) N/A No
copy_number_consensus_call pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-sv-manta.tsv.gz
Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made results/cnv_consensus.tsv
results/pbta-cnv-consensus.seg.gz (included in data download)
ref/cnv_excluded_regions.bed
ref/cnv_callable.bed
Yes
create-subset-files All files This module contains the code to create the subset files used in continuous integration All subset files for continuous integration Not directly
focal-cn-file-preparation pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz
Maps from copy number variant caller segments to "most focal unit" results/cnvkit_annotated_cn_autosomes.tsv.gz
results/cnvkit_annotated_cn_x_and_y.tsv.gz
results/controlfreec_annotated_cn_autosomes.tsv.gz
results/controlfreec_annotated_cn_x_and_y.tsv.gz
results/consensus_seg_annotated_cn_autosomes.tsv.gz (included in data download)
results/consensus_seg_annotated_cn_x_and_y.tsv.gz (included in data download)
Yes
fusion_filtering pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Standardizes, filters, and prioritizes fusion calls results/pbta-fusion-putative-oncogenic.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-byhistology.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download)
Yes
fusion-summary pbta-histologies.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Generate summary tables from fusion files (#398; #623) results/fusion_summary_embryonal_foi.tsv (included in data download)
results/fusion_summary_ependymoma_foi.tsv (included in data download)
results/fusion_summary_ewings_foi.tsv (included in data download)
Yes
gene-set-enrichment-analysis analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
In progress. Updated gene set enrichment analysis with appropriate RNA-seq expression data results/gsva_scores_stranded.tsv
results/gsva_scores_polya.tsv
for stranded and polya expression data, respectively
Yes
hotspot-detection pbta-snv-strelka2.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
pbta-snv-lancet.vep.maf.gz
Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-callers workflow. pbta-snv-hotspots-mutation.maf.tsv.gz Yes
immune-deconv pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Immune/Stroma characterization across PBTA (part of #15) results/quantiseq_deconv-output.rds Yes
independent-samples pbta-histologies.tsv Generates independent specimen lists for WGS/WXS samples results/independent-specimens.wgs.primary.tsv (included in data download)
results/independent-specimens.wgs.primary-plus.tsv (included in data download)
results/independent-specimens.wgswxs.primary.tsv (included in data download)
results/independent-specimens.wgswxs.primary-plus.tsv (included in data download)
Yes
interaction-plots independent-specimens.wgs.primary-plus.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types (e.g., fusions) N/A Yes
molecular-subtyping-ATRT analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-snv-consensus-mutation-tmb-all.tsv
pbta-cnv-consensus-gistic.zip
Summarizing data into tabular format in order to molecularly subtype ATRT samples #244; this analysis did not work N/A No
molecular-subtyping-CRANIO pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
Molecular subtyping of craniopharyngioma samples #810 results/CRANIO_molecular_subtype.tsv Yes
molecular-subtyping-EPN pbta-histologies-base.tsv
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-cnv-consensus-gistic.zip
analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv
fusion_summary_ependymoma_foi.tsv
analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
Molecular subtyping of ependymoma tumors results/EPN_all_data_withsubgroup.tsv Yes
molecular-subtyping-EWS pbta-histologies-base.tsv
fusion_summary_ewings_foi.tsv
Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 results/EWS_samples.tsv Yes
molecular-subtyping-HGG pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-fusion-putative-oncogenic.tsv
pbta-cnv-consensus-gistic.zip
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of high-grade glioma samples #249 results/HGG_molecular_subtype.tsv Yes
molecular-subtyping-LGAT pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
Molecular subtyping of low-grade astrocytic tumor samples #631 results/lgat_subtyping.tsv Yes
molecular-subtyping-MB pbta-histologies-base.tsv
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Molecular classification of Medulloblastoma subtypes (part of #731) results/MB_molecular_subtype.tsv
results/MB_batchcorrected_molecular_subtype.tsv
for the uncorrected and batch-corrected input matrices, respectively
Yes
molecular-subtyping-SHH-tp53 pbta-histologies
pbta-snv-consensus-mutation.maf.tsv.gz
Deprecated; Identify the SHH-classified medulloblastoma samples that have TP53 mutations #247 N/A No
molecular-subtyping-chordoma consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
In progress; identifying poorly-differentiated chordoma samples per #250 N/A Yes
molecular-subtyping-embryonal pbta-histologies-base.tsv
fusion_summary_embryonal_foi.tsv
pbta-sv-manta.tsv.gz
consensus_seg_annotated_cn_x_and_y.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 results/embryonal_tumor_molecular_subtypes.tsv Yes
molecular-subtyping-integrate pbta-histologies-base.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
Add molecular subtype information to base histology results/pbta-histologies.tsv Yes
molecular-subtyping-neurocytoma pbta-histologies-base.tsv Molecular subtyping of Neurocytoma samples #805 results/neurocytoma_subtyping.tsv Yes
molecular-subtyping-pathology analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv
analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv
analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv
analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv
analyses/molecular-subtyping-EWS/results/EWS_samples.tsv
analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv
analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv
analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv
analyses/molecular-subtyping-chordoma/results/chordoma_smarcb1_status.tsv
Compile output from other molecular subtyping modules and incorporate pathology feedback #645 results/compiled_molecular_subtyping_with_clinical_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv
Yes
mutational-signatures pbta-snv-consensus-mutation.maf.tsv.gz Performs three separate analyses of mutational signatures: 1) Analyzes COSMIC and Alexandrov et al. mutational signatures using the consensus SNV data; 2) Performs de novo signature extraction using only the WGS samples from the consensus SNV data; 3) Fits known CNS signatures to the WGS samples from the consensus SNV data N/A Yes
mutect2-vs-strelka2 pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
Deprecated; comparison of only two SNV callers, subsumed by snv-callers N/A No
oncoprint-landscape pbta-snv-consensus-mutation.maf.tsv.gz
pbta-fusion-putative-oncogenic.tsv
consensus_seg_annotated_cn_autosomes.tsv.gz
consensus_seg_annotated_cn_x_and_y.tsv.gz
independent-specimens.*
Combines mutation, copy number, and fusion data into an OncoPrint plot N/A Yes
rna-seq-composition pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-results.tar.gz
pbta-mend-qc-manifest.tsv
pbta-star-log-manifest.tsv
pbta-star-log-final.tar.gz
Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition N/A No
run-gistic pbta-histologies.tsv
pbta-cnv-consensus.seg.gz
Runs GISTIC 2.0 on SEG files pbta-cnv-consensus-gistic.zip (included in data download) Yes
sample-distribution-analysis pbta-histologies.tsv Produces plots and tables that illustrate the distribution of different histologies in the PBTA data N/A No
selection-strategy-comparison pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
Deprecated; Comparison of RNA-seq data from different selection strategies N/A No
sex-prediction-from-RNASeq pbta-gene-expression-kallisto.stranded.rds
pbta-histologies.tsv
Predicts genetic sex using RNA-seq data (#84) N/A No
snv-callers pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz (included in data download; too large for tracking via GitHub)
results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv
results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv (included in data download; too large for tracking via GitHub)
results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz
results/consensus/tcga-snv-mutation-tmb.tsv
results/consensus/tcga-snv-mutation-tmb-coding.tsv
Yes
ssgsea-hallmark pbta-gene-counts-rsem-expected_count.stranded.rds Deprecated; performs GSVA using Hallmark gene sets N/A No, subsumed by gene-set-enrichment-analysis
survival-analysis pbta-histologies.tsv
independent-specimens.wgswxs.primary.tsv
tp53_altered_status.tsv (results from tp53_nf1_score module)
quantiseq_deconv-output.rds (results from immune-deconv module)
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Performs Kaplan-Meier, log-rank, and/or Cox regression univariate or multivariate survival modeling N/A Yes
telomerase-activity-prediction pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
Quantify telomerase activity across pediatric brain tumors (part of #148) results/TelomeraseScores_PTBAPolya_counts
results/TelomeraseScores_PTBAPolya_FPKM.txt
results/TelomeraseScores_PTBAStranded_counts.txt
results/TelomeraseScores_PTBAStranded_FPKM.txt
results/EXTENDScores_{broad_histology}.tsv
Yes
tmb-compare pbta-snv-consensus-mutation-tmb-coding.tsv Deprecated. Compares PBTA tumor mutation burden to adult TCGA data. N/A Not directly, similar figure generated in figures/
tp53_nf1_score pbta-snv-consensus-mutation.maf.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 N/A Yes
transcriptomic-dimension-reduction pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
Dimension reduction and visualization of RNA-seq data (part of #9) N/A Yes
tcga-capture-kit-investigation pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
pbta-histologies.tsv
pbta-tcga-manifest.tsv
WGS.hg38.lancet.unpadded.bed
WGS.hg38.strelka2.unpadded.bed
WGS.hg38.mutect2.vardict.unpadded.bed
Deprecated; Investigation of the TMB discrepancy between PBTA and TCGA data results/*.bed No
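
Because the table documents interdependent analyses, it can help to see how a few of its rows translate into a run order. The sketch below is only an illustration: the `UPSTREAM` mapping is transcribed by hand from a handful of rows above and is not exhaustive, and the helper names `run_order` and `consumers` are hypothetical and not part of the OpenPBTA codebase. It uses Python's standard-library `graphlib` (Python 3.9+) to order modules so that each one runs after the modules whose output files it lists as inputs.

```python
from graphlib import TopologicalSorter

# Module -> upstream modules whose output files it lists as inputs.
# Hand-transcribed from a few rows of the table; NOT a complete dependency map.
UPSTREAM = {
    "chromothripsis": {
        "chromosomal-instability",      # cnv/sv_breaks_densities.tsv
        "copy_number_consensus_call",   # pbta-cnv-consensus.seg.gz
        "independent-samples",          # independent-specimens.wgs.primary-plus.tsv
    },
    "focal-cn-file-preparation": {
        "collapse-rnaseq",              # collapsed FPKM .rds files
        "copy_number_consensus_call",   # pbta-cnv-consensus.seg.gz
    },
    "run-gistic": {"copy_number_consensus_call"},
    "fusion-summary": {"fusion_filtering"},
    "molecular-subtyping-EWS": {"fusion-summary"},
    "molecular-subtyping-pathology": {"molecular-subtyping-EWS"},
    "molecular-subtyping-integrate": {"molecular-subtyping-pathology"},
}


def run_order(upstream):
    """Return one valid order in which every module follows its upstream modules."""
    return list(TopologicalSorter(upstream).static_order())


def consumers(upstream, module):
    """Return the modules that list `module` as a direct upstream dependency."""
    return sorted(m for m, deps in upstream.items() if module in deps)


if __name__ == "__main__":
    for name in run_order(UPSTREAM):
        print(name)
    print(consumers(UPSTREAM, "copy_number_consensus_call"))
```

Several orders can satisfy the same constraints, so the exact sequence printed may vary; only the relative position of a module and its upstream modules is guaranteed. A cycle in the mapping would raise `graphlib.CycleError`, which is a quick way to spot a transcription mistake.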