This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Add cytoband to copy number files using bedtools intersect #617
Merged: jaclyn-taroni merged 31 commits into AlexsLemonade:master from cbethell:add-cytoband-status-with-bedtools on Mar 23, 2020.
Changes from 3 commits

31 commits:
7dffdfc Add cytoband data using bedtools intersect with UCSC cytoband file (cbethell)
e745ad9 Update comments (cbethell)
1c4eea5 @cansavvy and @jashapiro suggested changes (cbethell)
bba0864 sort before filtering out losses and gains (cbethell)
f573daa Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
337a264 Add notebook to join and wrangle the cytoband bed files (cbethell)
3c827b1 Add chromosome arm field and GISTIC arm status data (cbethell)
ee4ff57 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
bf47620 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
721ab75 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
9fa3b85 implement @jashapiro's suggested changes (cbethell)
d215dcb add steps for loss and gain bed files to `run-bedtools.sh` (cbethell)
5a2239d Propagate changes to bed files to `03` nb (cbethell)
c31745d change logic to uncompress the cytoband file once (cbethell)
05bd93a Substitute snakemake for shell in bedtools script (jashapiro)
b60cf95 Merge remote-tracking branch 'cbethell/add-cytoband-status-with-bedto… (jashapiro)
2af767a rerun `03` nb with updated coverage bed files (cbethell)
60147fc Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
69a3ee5 Update module README and start defining most focal units (cbethell)
8247e53 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
0c89687 Reformat final output table and rename output file (cbethell)
bcbbb09 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
2b6c1b2 Apply suggestions from @jashapiro code review (cbethell)
01f385b Move addition of `chromosome_arm` step after joining original UCSC data (cbethell)
d64159b update README to reflect addition of `band_length` column (cbethell)
807a7ad Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
b26cd42 Merge branch 'master' into add-cytoband-status-with-bedtools (cbethell)
e9bf7a7 remove redundant joining of original ucsc cytoband data section (cbethell)
126ea71 update usage comment and re-render html output (cbethell)
53b906d @jashapiro's commit suggestion to include sex chromosome data (cbethell)
1b761ee rerun module (now that UCSC file download includes sex chromosomes) (cbethell)
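
Commit c31745d changes the logic so the cytoband file is uncompressed once. In the diff below, `wget` saves the gzipped `cytoBand.txt.gz` download under a `.bed` name, so the file on disk is still compressed. A minimal sketch of a decompress-once step, assuming the UCSC hg38 `cytoBand` table's tab-separated columns (chrom, chromStart, chromEnd, name, gieStain) and assuming the first four are all the later `bedtools coverage` calls need; `prepare_cytoband_bed` is a hypothetical helper, not the module's actual code:

```shell
#!/usr/bin/env bash
# Sketch: decompress the UCSC cytoband download once into a plain BED file.
# Assumption: input is the gzipped UCSC table with tab-separated columns
#   chrom, chromStart, chromEnd, name, gieStain
set -euo pipefail

# Hypothetical helper: keeps chrom, start, end, and band name (BED4).
prepare_cytoband_bed() {
  local gz_file=$1 bed_file=$2
  gunzip -c "$gz_file" | cut -f 1-4 > "$bed_file"
}

# Example call (hypothetical paths, not run here):
#   prepare_cytoband_bed scratch/cytoBand.txt.gz scratch/ucsc_cytoband.bed
```

Decompressing to a plain file once means the loss, gain, and callable `bedtools coverage` calls can all read the same uncompressed BED.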
analyses/focal-cn-file-preparation/02-add-ploidy-consensus.nb.html: 49 changes (43 additions, 6 deletions). Large diffs are not rendered by default.
@@ -18,74 +18,105 @@ cd "$script_directory" || exit

scratch_dir=../../scratch
data_dir=../../data
results_dir=../../analyses/focal-cn-file-preparation/results
histologies_file=${data_dir}/pbta-histologies.tsv
gtf_file=${data_dir}/gencode.v27.primary_assembly.annotation.gtf.gz
goi_file=../../analyses/oncoprint-landscape/driver-lists/brain-goi-list-long.txt
independent_specimens_file=${data_dir}/independent-specimens.wgswxs.primary.tsv
ucsc_bed_file=${results_dir}/ucsc_cytoband.bed
consensus_bed_file=${scratch_dir}/consensus_seg_with_status.tsv
loss_intersect_with_cytoband_file=${scratch_dir}/intersect_with_cytoband_losses.tsv
gain_intersect_with_cytoband_file=${scratch_dir}/intersect_with_cytoband_gains.tsv
callable_intersect_with_cytoband_file=${scratch_dir}/intersect_with_cytoband_callable.tsv
I think these file names were changed elsewhere, so I am noting that here.
Suggested change

# Prep the consensus SEG file data
Rscript --vanilla -e "rmarkdown::render('02-add-ploidy-consensus.Rmd', clean = TRUE)"

# Run annotation step for consensus file
Rscript --vanilla 03-prepare-cn-file.R \
--cnv_file ${scratch_dir}/consensus_seg_with_status.tsv \
--gtf_file $gtf_file \
--metadata $histologies_file \
--filename_lead "consensus_seg_annotated_cn" \
--seg
# Download and save UCSC cytoband file as bed file
wget -O ${scratch_dir}/ucsc_cytoband.bed http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/cytoBand.txt.gz

# Use bedtools intersect to find the intersection between the UCSC file with
# cytoband data and the `scratch/consensus_with_status.tsv` file prepared in
# `02-add-ploidy-consensus.Rmd`

libraryStrategies=("polya" "stranded")
chromosomesType=("autosomes" "x_and_y")
for strategy in ${libraryStrategies[@]}; do
bedtools coverage \
-a ${scratch_dir}/ucsc_cytoband.bed \
-b ${scratch_dir}/consensus_seg_with_status_losses.bed \
-f 0.75 \
If you have sorted the … The same thing applies to the two …
Suggested change

> $loss_intersect_with_cytoband_file

for chromosome_type in ${chromosomesType[@]}; do
bedtools coverage \
-a ${scratch_dir}/ucsc_cytoband.bed \
-b ${scratch_dir}/consensus_seg_with_status_gains.bed \
-f 0.75 \
> $gain_intersect_with_cytoband_file

Rscript --vanilla rna-expression-validation.R \
--annotated_cnv_file results/consensus_seg_annotated_cn_${chromosome_type}.tsv.gz \
--expression_file ${data_dir}/pbta-gene-expression-rsem-fpkm-collapsed.${strategy}.rds \
--independent_specimens_file $independent_specimens_file \
--metadata $histologies_file \
--goi_list $goi_file \
--filename_lead "consensus_seg_annotated_cn"_${chromosome_type}_${strategy}
done
done
bedtools coverage \
-a ${scratch_dir}/ucsc_cytoband.bed \
-b ${scratch_dir}/consensus_seg_with_status.bed \
-f 0.75 \
> $callable_intersect_with_cytoband_file

# if we want to process the CNV data from the original callers
# (e.g., CNVkit, ControlFreeC)
if [ "$RUN_ORIGINAL" -gt "0" ]; then

# Prep the CNVkit data
Rscript --vanilla -e "rmarkdown::render('01-add-ploidy-cnvkit.Rmd', clean = TRUE)"

# Run annotation step for CNVkit
Rscript --vanilla 03-prepare-cn-file.R \
--cnv_file ${scratch_dir}/cnvkit_with_status.tsv \
--gtf_file $gtf_file \
--metadata $histologies_file \
--filename_lead "cnvkit_annotated_cn" \
--seg

# Run annotation step for ControlFreeC
Rscript --vanilla 03-prepare-cn-file.R \
--cnv_file ${data_dir}/pbta-cnv-controlfreec.tsv.gz \
--gtf_file $gtf_file \
--metadata $histologies_file \
--filename_lead "controlfreec_annotated_cn" \
--controlfreec

filenameLead=("cnvkit_annotated_cn" "controlfreec_annotated_cn")
for filename in ${filenameLead[@]}; do
for strategy in ${libraryStrategies[@]}; do
for chromosome_type in ${chromosomesType[@]}; do
Rscript --vanilla rna-expression-validation.R \
--annotated_cnv_file results/${filename}_${chromosome_type}.tsv.gz \
--expression_file ${data_dir}/pbta-gene-expression-rsem-fpkm-collapsed.${strategy}.rds \
--independent_specimens_file $independent_specimens_file \
--metadata $histologies_file \
--goi_list $goi_file \
--filename_lead ${filename}_${chromosome_type}_${strategy}
done
done
done

fi
# # Run annotation step for consensus file
# Rscript --vanilla 03-prepare-cn-file.R \
# --cnv_file ${scratch_dir}/consensus_seg_with_status.tsv \
# --gtf_file $gtf_file \
# --metadata $histologies_file \
# --filename_lead "consensus_seg_annotated_cn" \
# --seg
#
# libraryStrategies=("polya" "stranded")
# chromosomesType=("autosomes" "x_and_y")
# for strategy in ${libraryStrategies[@]}; do
#
# for chromosome_type in ${chromosomesType[@]}; do
#
# Rscript --vanilla rna-expression-validation.R \
# --annotated_cnv_file results/consensus_seg_annotated_cn_${chromosome_type}.tsv.gz \
# --expression_file ${data_dir}/pbta-gene-expression-rsem-fpkm-collapsed.${strategy}.rds \
# --independent_specimens_file $independent_specimens_file \
# --metadata $histologies_file \
# --goi_list $goi_file \
# --filename_lead "consensus_seg_annotated_cn"_${chromosome_type}_${strategy}
# done
# done
#
# # if we want to process the CNV data from the original callers
# # (e.g., CNVkit, ControlFreeC)
# if [ "$RUN_ORIGINAL" -gt "0" ]; then
#
# # Prep the CNVkit data
# Rscript --vanilla -e "rmarkdown::render('01-add-ploidy-cnvkit.Rmd', clean = TRUE)"
#
# # Run annotation step for CNVkit
# Rscript --vanilla 03-prepare-cn-file.R \
# --cnv_file ${scratch_dir}/cnvkit_with_status.tsv \
# --gtf_file $gtf_file \
# --metadata $histologies_file \
# --filename_lead "cnvkit_annotated_cn" \
# --seg
#
# # Run annotation step for ControlFreeC
# Rscript --vanilla 03-prepare-cn-file.R \
# --cnv_file ${data_dir}/pbta-cnv-controlfreec.tsv.gz \
# --gtf_file $gtf_file \
# --metadata $histologies_file \
# --filename_lead "controlfreec_annotated_cn" \
# --controlfreec
#
# filenameLead=("cnvkit_annotated_cn" "controlfreec_annotated_cn")
# for filename in ${filenameLead[@]}; do
# for strategy in ${libraryStrategies[@]}; do
# for chromosome_type in ${chromosomesType[@]}; do
# Rscript --vanilla rna-expression-validation.R \
# --annotated_cnv_file results/${filename}_${chromosome_type}.tsv.gz \
# --expression_file ${data_dir}/pbta-gene-expression-rsem-fpkm-collapsed.${strategy}.rds \
# --independent_specimens_file $independent_specimens_file \
# --metadata $histologies_file \
# --goi_list $goi_file \
# --filename_lead ${filename}_${chromosome_type}_${strategy}
# done
# done
# done
#
# fi
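
Each `bedtools coverage` call above appends four columns to every cytoband in `-a`: the number of `-b` features overlapping it, the number of bases covered, the band length, and the fraction of the band covered. As we read it, `-f 0.75` only counts a segment when it overlaps at least 75% of the band. To make those columns concrete, here is a simplified awk re-implementation, not bedtools itself. It assumes plain BED input, treats the threshold as inclusive, and assumes the `-b` segments do not overlap one another (bedtools merges overlapping coverage; this sketch simply sums overlaps). `approx_coverage` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the four columns `bedtools coverage -a A -b B -f F`
# appends to each A interval (overlap count, bases covered, A length, fraction).
# Assumptions: plain BED input, an inclusive -f threshold, and B segments that
# do not overlap one another (so overlaps are summed rather than merged).
set -euo pipefail

# Hypothetical helper; prints A rows with the four extra columns to stdout.
approx_coverage() {
  local a_bed=$1 b_bed=$2 min_frac=$3
  awk -v min_frac="$min_frac" '
    BEGIN { FS = OFS = "\t" }
    # First file (B): store each segment start and end, grouped by chromosome.
    FILENAME == ARGV[1] {
      k = ++n[$1]; bs[$1, k] = $2 + 0; be[$1, k] = $3 + 0; next
    }
    # Second file (A): score each band against the segments on its chromosome.
    {
      a_start = $2 + 0; a_end = $3 + 0; len = a_end - a_start
      count = 0; covered = 0; m = n[$1] + 0
      for (i = 1; i <= m; i++) {
        s = (bs[$1, i] > a_start) ? bs[$1, i] : a_start
        e = (be[$1, i] < a_end) ? be[$1, i] : a_end
        ov = e - s
        # Keep a segment only if it overlaps at least min_frac of the band.
        if (ov > 0 && ov >= min_frac * len) { count++; covered += ov }
      }
      frac = (len > 0) ? covered / len : 0
      printf "%s\t%d\t%d\t%d\t%.7f\n", $0, count, covered, len, frac
    }' "$b_bed" "$a_bed"
}

# Example call mirroring the loss step (hypothetical paths, not run here):
#   approx_coverage scratch/ucsc_cytoband.bed \
#     scratch/consensus_seg_with_status_losses.bed 0.75
```

The `-f` filter matters for the loss, gain, and callable outputs: a segment that touches only a small part of a band does not count toward that band at all, rather than contributing a partial fraction.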
This makes sure the bed tables are sorted before output.
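
A minimal sketch of that sorting step, using the `sort -k1,1 -k2,2n` recipe given in the bedtools documentation; `sort_bed` is a hypothetical helper, not the module's code. Per our reading of the bedtools docs, sorted input also enables `bedtools coverage -sorted`, a lower-memory sweep that requires both `-a` and `-b` to be sorted the same way.

```shell
#!/usr/bin/env bash
# Sketch: sort a BED file by chromosome (lexicographic) and start (numeric),
# following the `sort -k1,1 -k2,2n` recipe from the bedtools docs.
set -euo pipefail

# Hypothetical helper: writes a sorted copy of in_bed to out_bed.
sort_bed() {
  local in_bed=$1 out_bed=$2
  # LC_ALL=C keeps chromosome ordering byte-wise and locale-independent.
  LC_ALL=C sort -k1,1 -k2,2n "$in_bed" > "$out_bed"
}

# Hypothetical usage with file names from the diff (not run here):
#   sort_bed scratch/ucsc_cytoband.bed scratch/ucsc_cytoband.sorted.bed
#   bedtools coverage -sorted -a scratch/ucsc_cytoband.sorted.bed \
#     -b scratch/consensus_seg_with_status_losses.bed -f 0.75
```

Note that this order puts `chr10` before `chr2`; that is fine for `-sorted` as long as every input uses the same order.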