This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update focal-cn-preparation to use consensus SEG file in data download #1130

Closed
Changes from 15 commits
234 changes: 117 additions & 117 deletions .circleci/config.yml
@@ -12,157 +12,157 @@ jobs:
name: Data Download
command: OPENPBTA_URL=https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/data OPENPBTA_RELEASE=testing ./scripts/run_in_ci.sh bash download-data.sh

- run:
name: List Data Directory Contents
command: ./scripts/run_in_ci.sh ls data/testing
# - run:
# name: List Data Directory Contents
# command: ./scripts/run_in_ci.sh ls data/testing

- run:
name: Check python packages
command: ./scripts/run_in_ci.sh bash scripts/check-python.sh
# - run:
# name: Check python packages
# command: ./scripts/run_in_ci.sh bash scripts/check-python.sh

- run:
name: High level histology grouping for plot labels
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)"
# - run:
# name: High level histology grouping for plot labels
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)"

- run:
name: Sample Distribution Analyses
command: ./scripts/run_in_ci.sh bash "analyses/sample-distribution-analysis/run-sample-distribution.sh"
# - run:
# name: Sample Distribution Analyses
# command: ./scripts/run_in_ci.sh bash "analyses/sample-distribution-analysis/run-sample-distribution.sh"

# - run:
# name: Sample Distribution Figure
# command: ./scripts/run_in_ci.sh Rscript figures/scripts/fig1-sample-distribution.R

# - run:
# name: Transcriptome dimensionality reduction
# command: ./scripts/run_in_ci.sh ./analyses/transcriptomic-dimension-reduction/ci-dimension-reduction-plots.sh

# # The analysis no longer needs to be tested as it has been retired and is better covered by 'SNV Caller Analysis' below.
# #- run:
# # name: Mutect2 vs Strelka2
# # command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/mutect2-vs-strelka2/01-set-up.Rmd', clean = TRUE);
# # rmarkdown::render('analyses/mutect2-vs-strelka2/02-analyze-concordance.Rmd', clean = TRUE)"

- run:
name: Sample Distribution Figure
command: ./scripts/run_in_ci.sh Rscript figures/scripts/fig1-sample-distribution.R
# ### MOLECULAR SUBTYPING ###

- run:
name: Transcriptome dimensionality reduction
command: ./scripts/run_in_ci.sh ./analyses/transcriptomic-dimension-reduction/ci-dimension-reduction-plots.sh
# - run:
# name: Molecular Subtyping - HGG
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh

# The analysis no longer needs to be tested as it has been retired and is better covered by 'SNV Caller Analysis' below.
#- run:
# name: Mutect2 vs Strelka2
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/mutect2-vs-strelka2/01-set-up.Rmd', clean = TRUE);
# rmarkdown::render('analyses/mutect2-vs-strelka2/02-analyze-concordance.Rmd', clean = TRUE)"
# - run:
# name: Molecular subtyping - Non-MB/Non-ATRT Embryonal tumors
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-embryonal/run-embryonal-subtyping.sh

### MOLECULAR SUBTYPING ###
# - run:
# name: Molecular Subtyping and Plotting - ATRT
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh

- run:
name: Molecular Subtyping - HGG
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh
# - run:
# name: Molecular subtyping Chordoma
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-chordoma/run-molecular-subtyping-chordoma.sh

- run:
name: Molecular subtyping - Non-MB/Non-ATRT Embryonal tumors
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-embryonal/run-embryonal-subtyping.sh
# - run:
# name: Molecular subtyping - Ependymoma
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EPN/run-molecular-subtyping-EPN.sh

- run:
name: Molecular Subtyping and Plotting - ATRT
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh
# - run:
# name: Molecular Subtyping - LGAT
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-LGAT/run_subtyping.sh

- run:
name: Molecular subtyping Chordoma
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-chordoma/run-molecular-subtyping-chordoma.sh
# - run:
# name: Molecular Subtyping - EWS
# command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EWS/run_subtyping.sh

- run:
name: Molecular subtyping - Ependymoma
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EPN/run-molecular-subtyping-EPN.sh
# - run:
# name: Molecular Subtyping Neurocytoma
# command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-neurocytoma/run_subtyping.sh

- run:
name: Molecular Subtyping - LGAT
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-LGAT/run_subtyping.sh
# # Commenting this out for now; the code is expected to change
# # - run:
# # name: Molecular Subtyping - Compile and incorporate pathology feedback
# # command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-pathology/run-subtyping-aggregation.sh

- run:
name: Molecular Subtyping - EWS
command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EWS/run_subtyping.sh
# - run:
# name: Molecular Subtyping - MB
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-MB/run-molecular-subtyping-mb.sh

- run:
name: Molecular Subtyping Neurocytoma
command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-neurocytoma/run_subtyping.sh
# - run:
# name: Molecular Subtyping - CRANIO
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-CRANIO/run-molecular-subtyping-cranio.sh

# Commenting this out for now; the code is expected to change
# - run:
# name: Molecular Subtyping - Compile and incorporate pathology feedback
# command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-pathology/run-subtyping-aggregation.sh
# - run:
# name: Molecular Subtyping - INTEGRATE to BASE histology
# command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-integrate/run-subtyping-integrate.sh

- run:
name: Molecular Subtyping - MB
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-MB/run-molecular-subtyping-mb.sh
# # Deprecated - these results do not include germline calls and therefore are insufficient by subtyping
# # - run:
# # name: SHH TP53 Molecular Subtyping
# # command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-SHH-tp53/SHH-tp53-molecular-subtyping-data-prep.Rmd', clean = TRUE)"

- run:
name: Molecular Subtyping - CRANIO
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-CRANIO/run-molecular-subtyping-cranio.sh
# ### END MOLECULAR SUBTYPING ###

- run:
name: Molecular Subtyping - INTEGRATE to BASE histology
command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-integrate/run-subtyping-integrate.sh
# - run:
# name: Collapse RSEM
# command: ./scripts/run_in_ci.sh bash analyses/collapse-rnaseq/run-collapse-rnaseq.sh

# Deprecated - these results do not include germline calls and therefore are insufficient by subtyping
# - run:
# name: SHH TP53 Molecular Subtyping
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/molecular-subtyping-SHH-tp53/SHH-tp53-molecular-subtyping-data-prep.Rmd', clean = TRUE)"
# - run:
# name: Fusion Summary
# command: ./scripts/run_in_ci.sh bash "analyses/fusion-summary/run-new-analysis.sh"

### END MOLECULAR SUBTYPING ###
# - run:
# name: Immune deconvolution using immunedeconv, uses xCell by default
# command: ./scripts/run_in_ci.sh bash analyses/immune-deconv/run-immune-deconv.sh

- run:
name: Collapse RSEM
command: ./scripts/run_in_ci.sh bash analyses/collapse-rnaseq/run-collapse-rnaseq.sh
# - run:
# name: Fusion standardization and annotation for STARfusion and Arriba with polya and stranded data and creates recurrent fusion list
# command: ./scripts/run_in_ci.sh bash "analyses/fusion_filtering/run_fusion_merged.sh"

- run:
name: Fusion Summary
command: ./scripts/run_in_ci.sh bash "analyses/fusion-summary/run-new-analysis.sh"
# - run:
# name: Fusion standardization and annotation for STARFusio and Arriba for base subtyping
# command: OPENPBTA_BASE_SUBTYPING=1 ./scripts/run_in_ci.sh bash "analyses/fusion_filtering/run_fusion_merged.sh"

- run:
name: Immune deconvolution using immunedeconv, uses xCell by default
command: ./scripts/run_in_ci.sh bash analyses/immune-deconv/run-immune-deconv.sh

- run:
name: Fusion standardization and annotation for STARfusion and Arriba with polya and stranded data and creates recurrent fusion list
command: ./scripts/run_in_ci.sh bash "analyses/fusion_filtering/run_fusion_merged.sh"

- run:
name: Fusion standardization and annotation for STARFusio and Arriba for base subtyping
command: OPENPBTA_BASE_SUBTYPING=1 ./scripts/run_in_ci.sh bash "analyses/fusion_filtering/run_fusion_merged.sh"

- run:
name: Sex prediction from RNA-seq - Clean data-train elasticnet-evaluate model
command: OPENPBTA_PERCENT=0 ./scripts/run_in_ci.sh bash analyses/sex-prediction-from-RNASeq/run-sex-prediction-from-RNASeq.sh
# - run:
# name: Sex prediction from RNA-seq - Clean data-train elasticnet-evaluate model
# command: OPENPBTA_PERCENT=0 ./scripts/run_in_ci.sh bash analyses/sex-prediction-from-RNASeq/run-sex-prediction-from-RNASeq.sh

# Deprecated: this comparison is no longer needed after separating Poly-A and stranded.
# - run:
# name: Selection Strategy Comparison
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/selection-strategy-comparison/01-selection-strategies.rmd', params = list(neighbors = 2), clean = TRUE)"
# # Deprecated: this comparison is no longer needed after separating Poly-A and stranded.
# # - run:
# # name: Selection Strategy Comparison
# # command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/selection-strategy-comparison/01-selection-strategies.rmd', params = list(neighbors = 2), clean = TRUE)"

- run:
name: TP53 NF1 classifier run
command: OPENPBTA_POLYAPLOT=0 ./scripts/run_in_ci.sh bash "analyses/tp53_nf1_score/run_classifier.sh"
# - run:
# name: TP53 NF1 classifier run
# command: OPENPBTA_POLYAPLOT=0 ./scripts/run_in_ci.sh bash "analyses/tp53_nf1_score/run_classifier.sh"

# This is deprecated
# - run:
# name: ssGSEA Analysis
# command: OPENPBTA_ANOVAPVALUE=0.25 OPENPBTA_TUKEYPVALUE=0.50 OPENPBTA_PERCKEEP=0.50 ./scripts/run_in_ci.sh bash analyses/ssgsea-hallmark/run-ssgsea-hallmark.sh
# # This is deprecated
# # - run:
# # name: ssGSEA Analysis
# # command: OPENPBTA_ANOVAPVALUE=0.25 OPENPBTA_TUKEYPVALUE=0.50 OPENPBTA_PERCKEEP=0.50 ./scripts/run_in_ci.sh bash analyses/ssgsea-hallmark/run-ssgsea-hallmark.sh


# The second method - ControlFREEC - was not included as of v6, so the comparison can no longer be performed
# - run:
# name: CNV Caller Comparison
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/cnv-comparison/01-cnv-comparison-plotting.Rmd', clean = TRUE)"
# # The second method - ControlFREEC - was not included as of v6, so the comparison can no longer be performed
# # - run:
# # name: CNV Caller Comparison
# # command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/cnv-comparison/01-cnv-comparison-plotting.Rmd', clean = TRUE)"

- run:
name: Independent samples
command: ./scripts/run_in_ci.sh bash analyses/independent-samples/run-independent-samples.sh
# - run:
# name: Independent samples
# command: ./scripts/run_in_ci.sh bash analyses/independent-samples/run-independent-samples.sh

- run:
name: Independent sample for base subtyping
command: OPENPBTA_BASE_SUBTYPING=1 ./scripts/run_in_ci.sh bash analyses/independent-samples/run-independent-samples.sh
# - run:
# name: Independent sample for base subtyping
# command: OPENPBTA_BASE_SUBTYPING=1 ./scripts/run_in_ci.sh bash analyses/independent-samples/run-independent-samples.sh

- run:
name: Interaction plot
command: OPENPBTA_ALL=0 ./scripts/run_in_ci.sh bash analyses/interaction-plots/01-create-interaction-plots.sh
# - run:
# name: Interaction plot
# command: OPENPBTA_ALL=0 ./scripts/run_in_ci.sh bash analyses/interaction-plots/01-create-interaction-plots.sh

- run:
name: Mutational Signatures
command: OPENPBTA_QUICK_MUTSIGS=1 ./scripts/run_in_ci.sh bash analyses/mutational-signatures/run_mutational_signatures.sh
# - run:
# name: Mutational Signatures
# command: OPENPBTA_QUICK_MUTSIGS=1 ./scripts/run_in_ci.sh bash analyses/mutational-signatures/run_mutational_signatures.sh

# - run:
# name: Chromosomal instability breakpoints
# command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash analyses/chromosomal-instability/run_breakpoint_analysis.sh
# # - run:
# # name: Chromosomal instability breakpoints
# # command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash analyses/chromosomal-instability/run_breakpoint_analysis.sh

- run:
name: Copy number consensus
@@ -41,9 +41,7 @@ if(!dir.exists(output_dir)) {
```

```{r}
-# TODO: the consensus SEG file is not currently in the data download -- when it
-# gets included we will have to change the file path here
-consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results",
+consensus_seg_file <- file.path("..", "..", "data",
Member Author

@jharenza and @kgaonkar6 - is there any reason (that you are aware of) that prohibits me from making this change at this point?

Collaborator

I do not think so - the latest file is in the download.

Member Author

@jaclyn-taroni, Aug 10, 2021

We probably need to wait until all the v20 CNV changes go through, though? Or would it be sufficient to get #1123 through and then update the release (if that has not been done yet)?

Collaborator

I've updated the v20 release data with the latest consensus SEG file from #1123:

b9284650be04df3538e6c6dba29b8eb0 pbta-cnv-consensus.seg.gz
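
A minimal sketch of how a local copy could be checked against the value quoted above. It assumes the 32-character hex string is an MD5 digest (the thread does not say so explicitly) and that GNU coreutils `md5sum` is available. The function name `verify_md5` and the `data/` path in the usage comment are hypothetical.

```shell
# Compare a file's MD5 digest with an expected value.
# Assumption: GNU coreutils `md5sum` is installed.
verify_md5() {
  local file
  local expected
  local actual
  file="$1"
  expected="$2"
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(md5sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}

# Hypothetical usage; the location of the downloaded file is an assumption:
# verify_md5 data/pbta-cnv-consensus.seg.gz b9284650be04df3538e6c6dba29b8eb0
```

The function returns a non-zero status on a mismatch, so it can gate later steps in a script.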

I added the relative path so that we can run this code as part of the subtyping preprocessing steps, so maybe we can add logic to switch between the relative path and the data download path inside the `if (params$base_run == 0)` check:

```{r}
# TODO: the consensus SEG file is not currently in the data download -- when it
# gets included we will have to change the file path here
consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results",
                                "pbta-cnv-consensus.seg.gz")
if (params$base_run == 0) {
  histologies_file <- file.path("..", "..", "data", "pbta-histologies.tsv")
} else {
  histologies_file <- file.path("..", "..", "data", "pbta-histologies-base.tsv")
}
```

Member Author

In which scenario would you use the file in analyses/copy_number_consensus_call/results? When you use pbta-histologies-base.tsv?

Collaborator

Yes, we will rerun the consensus SEG file module first so that the latest consensus SEG file can be used as input for the focal-cn module.
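
The switch discussed in this thread could look roughly like the sketch below, written as a shell helper rather than as the R code inside the notebook. It is an illustration only: the function name `consensus_seg_path` is hypothetical, and reusing the `OPENPBTA_BASE_SUBTYPING` variable (which the CI config sets for other modules) for this module is an assumption, not something this PR does.

```shell
# Choose the consensus SEG input for the focal-cn module.
# $1: base-subtyping flag (1 = base run, anything else = regular run).
# Both paths are relative to the module directory, as in the notebook.
consensus_seg_path() {
  local base_run
  base_run="${1:-0}"
  if [ "$base_run" = "1" ]; then
    # Base run: use the file regenerated by the consensus module.
    echo "../copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz"
  else
    # Regular run: use the copy shipped in the data download.
    echo "../../data/pbta-cnv-consensus.seg.gz"
  fi
}

# Read the flag from the environment, defaulting to a regular run.
seg_file="$(consensus_seg_path "${OPENPBTA_BASE_SUBTYPING:-0}")"
echo "Using consensus SEG file: $seg_file"
```

Keeping the choice in one place means the base run and the regular run differ only in the flag, mirroring how the histologies file is already selected with `params$base_run`.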

"pbta-cnv-consensus.seg.gz")
histologies_file <- file.path("..", "..", "data", "pbta-histologies.tsv")
