This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update manta FILTER=='PASS' Part1 : consensus cnv file generation #1114

Merged 3 commits on Jul 21, 2021
11 changes: 6 additions & 5 deletions analyses/copy_number_consensus_call/README.md
@@ -52,11 +52,12 @@ The per-sample pipeline revolves around the use of Snakemake to run analysis for
3) Create a `config_snakemake.yaml` that contains all of the samples names to run the Snakemake pipeline
4) Run the Snakemake pipeline to perform analysis **per sample**.
5) Filter for any CNVs that are over a certain **SIZE_CUTOFF** (default 3000 bp)
- 6) Filter for any **significant** CNVs called by Freec (default pval = 0.01)
- 7) Filter out any CNVs that overlap 50% or more with **Immunoglobulin, telomeric, centromeric, seg_dup regions** as found in the file `ref/cnv_excluded.bed`
- 8) Merge any CNVs of the same sample and call method if they **overlap or within 10,000 bp** (We consider CNV calls within 10,000 bp the same CNV)
- 9) Reformat the columns of the files (So the info are easier to read)
- 10) **Call consensus** by comparing CNVs from 2 call methods at a time.
+ 6) Filter for any **significant** CNVs called by Freec (default pval = 0.01)
+ 7) Filter to keep manta calls that **PASS** all filters
+ 8) Filter out any CNVs that overlap 50% or more with **Immunoglobulin, telomeric, centromeric, seg_dup regions** as found in the file `ref/cnv_excluded.bed`
+ 9) Merge any CNVs of the same sample and call method if they **overlap or within 10,000 bp** (We consider CNV calls within 10,000 bp the same CNV)
+ 10) Reformat the columns of the files (So the info are easier to read)
+ 11) **Call consensus** by comparing CNVs from 2 call methods at a time.

Since there are 3 callers, there were 3 comparisons: `manta-cnvkit`, `manta-freec`, and `cnvkit-freec`. If a CNV from 1 caller **overlaps 50% or more** with at least 1 CNV from another caller, the common region of the overlapping CNV would be the new CONSENSUS CNV.
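
The pairwise consensus rule described in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual implementation: the function names, the half-open `(start, end)` interval representation, and the choice to measure the 50% threshold against the shorter of the two CNVs are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: half-open (start, end) on a single chromosome.
Interval = Tuple[int, int]


def common_region(a: Interval, b: Interval, min_frac: float = 0.5) -> Optional[Interval]:
    """Return the region shared by a and b, or None if the overlap is too small."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    overlap = end - start
    if overlap <= 0:
        return None  # the two calls do not overlap at all
    # Assumption: the overlap must cover at least min_frac of either CNV,
    # which is the same as covering min_frac of the shorter one.
    shorter = min(a[1] - a[0], b[1] - b[0])
    if overlap >= min_frac * shorter:
        return (start, end)
    return None


def pairwise_consensus(calls_a: List[Interval], calls_b: List[Interval],
                       min_frac: float = 0.5) -> List[Interval]:
    """Compare every call of one caller with every call of another caller."""
    regions = set()
    for a in calls_a:
        for b in calls_b:
            region = common_region(a, b, min_frac)
            if region is not None:
                regions.add(region)
    return sorted(regions)


# Example: one manta call compared against two cnvkit calls.
manta = [(1000, 5000)]
cnvkit = [(3000, 9000), (20000, 30000)]
consensus = pairwise_consensus(manta, cnvkit)
```

Running the same function on each of the three caller pairs would give the three comparisons named above; how the pipeline combines those three result sets is not covered by this sketch.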

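The merge step in the list above (CNVs of the same sample and call method that overlap or lie within 10,000 bp are treated as one CNV) could look roughly like the sketch below. It assumes the calls have already been restricted to one sample, one caller, and one chromosome, and that a gap of exactly 10,000 bp still counts as "within"; the function name `merge_nearby` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def merge_nearby(calls: List[Interval], max_gap: int = 10_000) -> List[Interval]:
    """Merge calls that overlap or are separated by at most max_gap bp."""
    merged: List[Interval] = []
    for start, end in sorted(calls):
        # A negative or zero gap means the call overlaps or touches the previous one.
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Example: the first two calls are 5,000 bp apart, the third is 40,000 bp away.
calls = [(100_000, 150_000), (155_000, 160_000), (200_000, 210_000)]
merged_calls = merge_nearby(calls)
```

Sorting first means each call only has to be compared with the most recently merged interval, so the merge is a single pass over the sorted list.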
6 changes: 3 additions & 3 deletions analyses/copy_number_consensus_call/Snakefile
@@ -98,10 +98,10 @@ rule manta_filter:
## the first awk also filters out for CNV length
## The sort command sorts the first digit of chromosome number numerically
## The last pipe is to introduce tab into the file and output file name.
- """awk '$6~/DEL/ {{if ($5 > {params.SIZE_CUTOFF}) {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
+ """awk '$6~/DEL/ {{if ($5 > {params.SIZE_CUTOFF} && $11 == "PASS") {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
""" | sort -k1,1 -k2,2n """
""" | tr [:blank:] '\t' > {output.manta_del} && """
- """awk '$6~/DUP/ {{if ($5 > {params.SIZE_CUTOFF}) {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
+ """awk '$6~/DUP/ {{if ($5 > {params.SIZE_CUTOFF} && $11 == "PASS") {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
""" | sort -k1,1 -k2,2n """
""" | tr [:blank:] '\t' > {output.manta_dup}"""

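For readers less familiar with awk, the filter in this hunk can be approximated in Python as below. The column positions (`$2` to `$6` and `$11`) are taken from the awk command; everything else, including the function name `filter_manta` and the skipping of rows whose length field is not numeric, is an assumption of this sketch. One quoting detail matters in the shell version: awk string literals use double quotes, and a single quote inside a single-quoted awk program ends the shell quoting, so `'PASS'` would reach awk as an unset variable rather than a string.

```python
from typing import Iterable, List


def filter_manta(lines: Iterable[str], sv_type: str, size_cutoff: int = 3000) -> List[str]:
    """Keep calls of one type that exceed the size cutoff and have FILTER == PASS."""
    rows = []
    for line in lines:
        f = line.split()  # awk-style whitespace splitting; f[0] is awk's $1
        if len(f) < 11:
            continue  # not enough columns to hold the filter field ($11)
        if sv_type not in f[5]:  # awk: $6 ~ /DEL/ or $6 ~ /DUP/
            continue
        try:
            length = float(f[4])  # awk: $5
            start = int(f[2])     # awk: $3
        except ValueError:
            continue  # assumption: header or malformed rows are skipped
        if length > size_cutoff and f[10] == "PASS":
            rows.append(("chr" + f[1], start, f[3], f[4], f[5]))
    # Approximates `sort -k1,1 -k2,2n`: chromosome as text, then start numerically.
    rows.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join([c, str(s), e, l, "NA", "NA", "NA", t]) for c, s, e, l, t in rows]


# Made-up example rows with 11 whitespace-separated columns.
example = [
    "id 1 1000 9000 8000 DEL x x x x PASS",
    "id 1 500 2000 1500 DEL x x x x PASS",
    "id 2 100 9100 9000 DEL x x x x MinQUAL",
    "id 1 200 8200 8000 DUP x x x x PASS",
]
deletions = filter_manta(example, "DEL")
duplications = filter_manta(example, "DUP")
```

Only the first example row survives the deletion filter: the second is below the size cutoff and the third does not pass manta's filters.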
@@ -330,4 +330,4 @@ rule make_segfile:
" -i {input.consensus}"
" -n {input.neutral}"
" -u {input.uncalled}"
- " -o {output}"
+ " -o {output}"