Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Commit

Permalink
Reconfigure how SNV-callers uses BED files for TMB calculations (#671)
Browse files Browse the repository at this point in the history
* Copies of old stuff that I can take pieces from

* BED ranges set up

* It's mostly working!

* Few tinier edits before testing

* Change back to normal files

* Get rid of old scripts.

* Development remnant

* Run styler on it all!

* Change back the indexing change that apparently HGG was dependent on

* Retire old analyses from CircleCI

* Add back TCGA processing

* Incorporate @jashapiro doc and tidy suggestions

* Add TODOs to things we won't do just yet.

* genome_size -> region_size

* Write down 256GB to figures README

* Drop Panel samples in calculate TMB

* Use a better fix for the Panel problem

* Refresh plot
  • Loading branch information
cansavvy authored Apr 15, 2020
1 parent 7d4b1c6 commit 66bb67a
Show file tree
Hide file tree
Showing 12 changed files with 273 additions and 328 deletions.
42 changes: 21 additions & 21 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -192,35 +192,35 @@ jobs:
name: RNA-Seq composition
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/rna-seq-composition/rna-seq-composition.Rmd', clean = TRUE)"

# - run:
# name: TCGA SNV Caller Analysis
# command: ./scripts/run_in_ci.sh bash analyses/snv-callers/run_caller_consensus_analysis-tcga.sh
- run:
name: TCGA SNV Caller Analysis
command: ./scripts/run_in_ci.sh bash analyses/snv-callers/run_caller_consensus_analysis-tcga.sh

- run:
name: SNV Caller Analysis
command: OPENPBTA_VAF_CUTOFF=0.5 ./scripts/run_in_ci.sh bash analyses/snv-callers/run_caller_consensus_analysis-pbta.sh

# - run:
# name: Tumor mutation burden with TCGA
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/tmb-compare-tcga/compare-tmb.Rmd', clean = TRUE)"

# - run:
# name: PBTA vs TCGA explore
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/explore-tcga-pbta.Rmd', clean = TRUE)"
- run:
name: Tumor mutation burden with TCGA
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/tmb-compare-tcga/compare-tmb.Rmd', clean = TRUE)"

# This analysis was used to explore the TCGA PBTA data when the BED files used to calculate TCGA
# were incorrect https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/568
#- run:
# name: PBTA vs TCGA explore
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/explore-tcga-pbta.Rmd', clean = TRUE)"

# Retired analysis
# - run:
# name: Lancet WXS vs WGS test
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/lancet-paired-WXS-WGS.Rmd', clean = TRUE)"
# This analysis arose from 'PBTA vs TCGA explore' and was used to explore Lancet's ability to handle WXS data #- run:
# name: Lancet WXS vs WGS test
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/lancet-paired-WXS-WGS.Rmd', clean = TRUE)"

# Retired analysis
# - run:
# name: Lancet padded vs unpadded test
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/lancet-padded-vs-unpadded.Rmd', clean = TRUE)"

# This analysis arose from PBTA vs TCGA explore' and was used to explore Lancet's results with padded vs unpadded
#- run:
# name: Lancet padded vs unpadded test
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/lancet-wxs-tests/lancet-padded-vs-unpadded.Rmd', clean = TRUE)"

# This analysis was a side concept question and no longer needs to be run.
# - run:
# This analysis was a side concept question and no longer needs to be run.
# - run:
# name: SNV Caller VAF Cutoff Experiment
# command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/vaf_cutoff_experiment.Rmd', clean = TRUE)"

Expand Down
42 changes: 3 additions & 39 deletions analyses/snv-callers/run_caller_consensus_analysis-pbta.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,7 @@ consensus_file=analyses/snv-callers/results/consensus/pbta-snv-consensus-mutatio

# BED and GTF file paths
cds_file=scratch/gencode.v27.primary_assembly.annotation.bed
all_mut_wgs_bed=scratch/intersect_strelka_mutect_WGS.bed
all_mut_wxs_bed=data/WXS.hg38.100bp_padded.bed
coding_wgs_bed=scratch/intersect_cds_strelka_mutect_WGS.bed
coding_wxs_bed=scratch/intersect_cds_strelka_mutect_WXS.bed
wgs_bed=scratch/intersect_strelka_mutect_WGS.bed

# Set a default for the VAF filter if none is specified
vaf_cutoff=${OPENPBTA_VAF_CUTOFF:-0}
Expand Down Expand Up @@ -54,7 +51,7 @@ python3 analyses/snv-callers/scripts/01-setup_db.py \
bedtools intersect \
-a data/WGS.hg38.strelka2.unpadded.bed \
-b data/WGS.hg38.mutect2.vardict.unpadded.bed \
> $all_mut_wgs_bed
> $wgs_bed

#################### Make coding regions file
# Convert GTF to BED file for use in bedtools
Expand All @@ -65,46 +62,13 @@ gunzip -c data/gencode.v27.primary_assembly.annotation.gtf.gz \
| sort -k 1,1 -k 2,2n \
| bedtools merge \
> $cds_file

##################### Make WGS coding BED file
# Make WGS coding BED file for strelka
bedtools intersect \
-a data/WGS.hg38.strelka2.unpadded.bed \
-b $cds_file \
> scratch/wgs_coding_strelka.bed

# Make WGS coding BED file for mutect
bedtools intersect \
-a data/WGS.hg38.mutect2.vardict.unpadded.bed \
-b $cds_file \
> scratch/wgs_coding_mutect.bed

# Intersect the mutect and strelka coding beds into one
bedtools intersect \
-a scratch/wgs_coding_mutect.bed \
-b scratch/wgs_coding_strelka.bed \
| sort -k 1,1 -k 2,2n \
| bedtools merge \
> $coding_wgs_bed

##################### Make WXS coding BED file
# Intersect coding and WXS ranges, sort and merge
bedtools intersect \
-a data/WXS.hg38.100bp_padded.bed \
-b $cds_file \
| sort -k 1,1 -k 2,2n \
| bedtools merge \
> $coding_wxs_bed

######################### Calculate consensus TMB ##############################
Rscript analyses/snv-callers/scripts/03-calculate_tmb.R \
--db_file $dbfile \
--output analyses/snv-callers/results/consensus \
--metadata data/pbta-histologies.tsv \
--all_bed_wgs $all_mut_wgs_bed \
--all_bed_wxs $all_mut_wxs_bed \
--coding_bed_wgs $coding_wgs_bed \
--coding_bed_wxs $coding_wxs_bed \
--coding_regions $cds_file \
--overwrite

########################## Compress consensus file #############################
Expand Down
21 changes: 1 addition & 20 deletions analyses/snv-callers/run_caller_consensus_analysis-tcga.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,6 @@ consensus_file=analyses/snv-callers/results/consensus/tcga-snv-consensus-snv.maf
# BED and GTF file paths
cds_file=scratch/gencode.v27.primary_assembly.annotation.bed

# The WGS files are really just place holders and aren't used since the TCGA data is all WGS
all_mut_wgs_bed=analyses/snv-callers/ref_files/gencode.v19.basic.exome.hg38liftover.bed
coding_wgs_bed=analyses/snv-callers/ref_files/gencode.v19.basic.exome.hg38liftover.bed

all_mut_wxs_bed=analyses/snv-callers/ref_files/gencode.v19.basic.exome.hg38liftover.bed
coding_wxs_bed=scratch/intersect_cds_gencode_liftover_WXS.bed

# Set a default for the VAF filter if none is specified
vaf_cutoff=${OPENPBTA_VAF_CUTOFF:-0}

Expand Down Expand Up @@ -61,24 +54,12 @@ gunzip -c data/gencode.v27.primary_assembly.annotation.gtf.gz \
| bedtools merge \
> $cds_file

# Make WXS coding BED file
# TODO: Update this BED when we get the target BEDs updated in v15/v16
bedtools intersect \
-a analyses/snv-callers/ref_files/gencode.v19.basic.exome.hg38liftover.bed \
-b $cds_file \
| sort -k 1,1 -k 2,2n \
| bedtools merge \
> $coding_wxs_bed

######################### Calculate consensus TMB ##############################
Rscript analyses/snv-callers/scripts/03-calculate_tmb.R \
--db_file $dbfile \
--output analyses/snv-callers/results/consensus \
--metadata data/pbta-tcga-manifest.tsv \
--all_bed_wgs $all_mut_wgs_bed \
--all_bed_wxs $all_mut_wxs_bed \
--coding_bed_wgs $coding_wgs_bed \
--coding_bed_wxs $coding_wxs_bed \
--coding_regions $cds_file \
--overwrite \
--tcga

Expand Down
1 change: 0 additions & 1 deletion analyses/snv-callers/scripts/02-merge_callers.R
Original file line number Diff line number Diff line change
Expand Up @@ -165,4 +165,3 @@ consensus_df %>%
as.data.frame() %>%
# Write to a TSV file, change NAs back to "."
readr::write_tsv(opt$output_file, na = ".")

Loading

0 comments on commit 66bb67a

Please sign in to comment.