Generate CNV exclusion list #467

jashapiro · 2020-01-22T19:08:35Z

Purpose/implementation Section

What scientific question is your analysis addressing?

The analysis of CNV consensus files takes advantage of a file of regions that are to be excluded due to expected (and previously observed) high levels of false positives. These include regions such as telomeres, centromeres, and known segmental duplications.
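As a rough illustration of how an exclusion list like this can be applied, here is a minimal Python sketch that trims excluded bases out of a single CNV segment, similar in spirit to `bedtools subtract`. It assumes 0-based, half-open BED coordinates. The function name `subtract_excluded` is hypothetical, and the PR does not specify exactly how downstream steps apply the list (for example, trimming segments versus dropping them entirely).

```python
from typing import List, Tuple

# BED-style interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def subtract_excluded(segment: Interval, excluded: List[Interval]) -> List[Interval]:
    """Return the pieces of `segment` that do not overlap any excluded interval.

    `excluded` may be unsorted and its intervals may overlap each other.
    """
    chrom, start, end = segment
    # Keep only excluded intervals on the same chromosome that overlap the segment.
    hits = sorted(
        (s, e) for c, s, e in excluded if c == chrom and s < end and e > start
    )
    pieces: List[Interval] = []
    cursor = start  # first base not yet assigned to a kept piece or an exclusion
    for s, e in hits:
        if s > cursor:
            pieces.append((chrom, cursor, s))
        cursor = max(cursor, e)
    if cursor < end:
        pieces.append((chrom, cursor, end))
    return pieces
```

For example, `subtract_excluded(("chr1", 0, 100), [("chr1", 10, 20), ("chr1", 90, 120)])` returns `[("chr1", 0, 10), ("chr1", 20, 90)]`.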

What was your approach?

I split up the various categories of excluded regions into separate files for ease of identification and modification. These are:

ref/centromeres.bed
ref/heterochromatin.bed
ref/immunoglobulin_regions.bed
ref/segmental_dups.bed
ref/telomeres.bed

The origins of these files, with code where appropriate, are documented in scripts/prepare_blacklist_files.sh.

The files are then merged with a new rule in the Snakefile to generate ref/cnv_excluded.bed, which can be used downstream.
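The Snakefile rule itself is not reproduced here. As a hedged sketch of the kind of operation it performs, the Python below concatenates the per-category BED files, sorts them, and merges overlapping or book-ended intervals, which mirrors the default behavior of `bedtools merge` on sorted input. The helper names (`read_bed`, `merge_intervals`, `build_exclusion_list`) are hypothetical, and only the first three BED columns are kept.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# BED-style interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def read_bed(path: Path) -> List[Interval]:
    """Read the first three columns of a BED file, skipping headers and blank lines."""
    intervals: List[Interval] = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort by (chrom, start) and merge intervals that overlap or touch."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_exclusion_list(bed_paths: Iterable[Path], out_path: Path) -> List[Interval]:
    """Combine per-category BED files into one sorted, merged BED file."""
    combined: List[Interval] = []
    for path in bed_paths:
        combined.extend(read_bed(path))
    merged = merge_intervals(combined)
    with open(out_path, "w") as out:
        for chrom, start, end in merged:
            out.write(f"{chrom}\t{start}\t{end}\n")
    return merged
```

Assuming the five category files listed above exist, a call such as `build_exclusion_list([Path("ref") / f for f in ("centromeres.bed", "heterochromatin.bed", "immunoglobulin_regions.bed", "segmental_dups.bed", "telomeres.bed")], Path("ref/cnv_excluded.bed"))` would write a single merged file. Chromosome names are sorted lexicographically, roughly like `sort -k1,1 -k2,2n`, not in natural chromosome order.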

What GitHub issue does your pull request address?

closes #438

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

This file differs slightly from the previously included file provided by @fingerfen, mostly, I think, in the handling of segmental duplication regions. The previous file seems to exclude some broader regions, but I could not find references explaining why those had been excluded.

Note, however, that there do not appear to be major changes in the final CNV regions, though there are some effects at the margins.

Is there anything that you want to discuss further?

Do we need to programmatically generate every region, or is it okay that the telomeres and IG regions are simply included as their own files?

How concerned should we be about changes in the final set of calls resulting from this change? Can we adjust the generation to better match the previous set (included here as ref/bad_chromosomal_seg_merged.bed)?
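One way to put a number on how much the lists differ is to compare how many base pairs each exclusion list covers and how much they share. The sketch below is not part of the PR's code; it assumes both files (for example, ref/cnv_excluded.bed and ref/bad_chromosomal_seg_merged.bed) have already been parsed into `(chrom, start, end)` tuples, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# BED-style interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def by_chromosome(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group merged intervals by chromosome; each list stays sorted by start."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in merge_intervals(intervals):
        groups[chrom].append((start, end))
    return groups


def covered_bp(intervals: Iterable[Interval]) -> int:
    """Total base pairs covered, counting overlapping regions only once."""
    return sum(end - start for _, start, end in merge_intervals(intervals))


def shared_bp(a: Iterable[Interval], b: Iterable[Interval]) -> int:
    """Base pairs covered by both sets, via a two-pointer sweep per chromosome."""
    groups_a, groups_b = by_chromosome(a), by_chromosome(b)
    total = 0
    for chrom in groups_a.keys() & groups_b.keys():
        x, y = groups_a[chrom], groups_b[chrom]
        i = j = 0
        while i < len(x) and j < len(y):
            overlap_start = max(x[i][0], y[j][0])
            overlap_end = min(x[i][1], y[j][1])
            if overlap_start < overlap_end:
                total += overlap_end - overlap_start
            # Advance whichever interval ends first.
            if x[i][1] < y[j][1]:
                i += 1
            else:
                j += 1
    return total


def compare_exclusion_lists(new: List[Interval], old: List[Interval]) -> Dict[str, int]:
    """Summarize base-pair coverage of two exclusion lists and their overlap."""
    new_total, old_total = covered_bp(new), covered_bp(old)
    shared = shared_bp(new, old)
    return {
        "new_bp": new_total,
        "old_bp": old_total,
        "shared_bp": shared,
        "new_only_bp": new_total - shared,
        "old_only_bp": old_total - shared,
    }
```

For instance, with `new = [("chr1", 0, 100)]` and `old = [("chr1", 50, 150)]`, the summary reports 50 shared bp, 50 bp only in the new list, and 50 bp only in the old one.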

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

Note that the ordering has changed, but otherwise the differences between these files should be relatively small. The cnv_consensus.tsv file does change: segments that are not contained within the defined CNV are now discarded, whereas they might have been retained before.

jaclyn-taroni · 2020-01-22T21:29:38Z

> Do we need to programmatically generate every region, or is it okay that the telomeres and IG regions are simply included as their own files?

It took me a while to figure out that the origin of these files was described in comments in the shell script. So my vote would be yes, let's programmatically generate these.

jashapiro · 2020-01-22T22:41:49Z

> > Do we need to programmatically generate every region, or is it okay that the telomeres and IG regions are simply included as their own files?
>
> It took me a while to figure out that the origin of these files was described in comments in the shell script. So my vote would be yes, let's programmatically generate these.

I can do the telomeres... I have no idea how to get the IG regions, unfortunately.
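If the telomere file were generated programmatically, one simple approach would be to take a fixed-width window at each end of every chromosome, using a chromosome sizes file. This is a hypothetical sketch: the 10 kb width, the function names, and the two-column `chrom<TAB>length` input format (as in UCSC `.chrom.sizes` files) are assumptions, not necessarily how the committed telomere definition file was produced.

```python
from typing import Dict, List, Tuple

# BED-style interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]

# Hypothetical telomere width, chosen for illustration only.
TELOMERE_WIDTH = 10_000


def read_chrom_sizes(path: str) -> Dict[str, int]:
    """Parse a two-column 'chrom<TAB>length' file into a dict."""
    sizes: Dict[str, int] = {}
    with open(path) as handle:
        for line in handle:
            if line.strip():
                chrom, length = line.split()[:2]
                sizes[chrom] = int(length)
    return sizes


def telomere_regions(chrom_sizes: Dict[str, int], width: int = TELOMERE_WIDTH) -> List[Interval]:
    """Return a window of `width` bp at both ends of each chromosome."""
    regions: List[Interval] = []
    for chrom, length in chrom_sizes.items():
        w = min(width, length)  # never extend past the end of a short contig
        regions.append((chrom, 0, w))
        regions.append((chrom, length - w, length))
    return regions
```

On very short contigs the two windows can overlap; merging the combined exclusion list afterwards would collapse them into one interval.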

jashapiro · 2020-01-24T16:49:00Z

In my work on #476, I have substantially updated the README to include more information on the creation of the exclusion regions, which should make that information easier to find. If those updates are a good start, would it make sense to make any further changes in that branch?

jaclyn-taroni

LGTM - I agree that the documentation changes on #476 look good and any changes can be continued over there.

Duong and others added 30 commits December 18, 2019 03:51

add to Snakefile

3f55855

resolve conflict

9a923a3

Merge remote-tracking branch 'upstream/master'

d38289c

updating fork

3a20aa0

Merge remote-tracking branch 'upstream/master'

d3d6431

changed output path and name

305bbbf

update Snakefile to master

eec2ffc

implement segmean

5c98fda

implement segmean

cc4eff6

add result file

fa23995

resolve

6fc6b7a

add result files

902abbb

add trailing line

8bf98d3

fix .py

264798c

change Snakefile comment

96490f3

change README.md

b242642

change README.md

9df166b

Updates to file organization

5d2fd04

Removing `src` directory to unnest `scripts` and adding `ref` directory for genomic info files.

Merge branch 'jashapiro/reorg-cnv-consensus' into jashapiro/generate-cnv-blacklist

d5d2a72

add alternative segdup generation

e4c66b4

Link and script to process downloaded file for segmental duplications.

Updates to blacklist generation

44eb15f

Add IG regions

253ff4b

These regions are the ones defined by @hongboxie here: AlexsLemonade#438 (comment) Converted from hg18 to hg38

Add step to potentially fix overlapping dup del segments.

88385f5

Notebook to look at consensus calls for overlaps

92a08e8

Add overlap pruning

1bb834c

update readme

86072fe

Merge branch 'master' into jashapiro/fix_cnv_overlaps

3047346

Add telomere definition file

4f3f1ef

Update blacklist generation script

acee89e

jashapiro and others added 11 commits January 22, 2020 09:50

Remove accidentally included notebook

3cd5d56

Tried to clarify complicated bedtools step.

9a06e5c

Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py

9acf2c5

Co-Authored-By: Candace Savonen <cansav09@gmail.com>

Update analyses/copy_number_consensus_call/scripts/remove_dup_NULL_overlap_entries.py

9a38ace

Co-Authored-By: Candace Savonen <cansav09@gmail.com>

Add more clarifying comments

19e30da

Merge jashapiro/fix_cnv_overlaps

4030c83

Merge remote-tracking branch 'upstream/master' into jashapiro/generate-cnv-blacklist

8eb447e

Add full exclusion list and remove outdated files

8c71b0b

Update readmes

e7a350e

Updated output files.

82b869e

Re-add previous blacklist

673947f

jashapiro marked this pull request as ready for review January 22, 2020 19:35

jashapiro and others added 3 commits January 22, 2020 15:17

More descriptive excluded file name

d433218

Update filename

1a080b3

Merge branch 'master' into jashapiro/generate-cnv-blacklist

9d257bb

jaclyn-taroni self-requested a review January 23, 2020 00:48

jashapiro mentioned this pull request Jan 24, 2020

Include neutral changes in cnv consensus .seg file #476

Merged

Merge branch 'master' into jashapiro/generate-cnv-blacklist

fdbc60c

Merge branch 'master' into jashapiro/generate-cnv-blacklist

ce1086e

jaclyn-taroni approved these changes Jan 25, 2020

jaclyn-taroni merged commit 844a9e4 into AlexsLemonade:master Jan 25, 2020

jashapiro mentioned this pull request Jan 28, 2020

Update consensus results to v14 data #480

Merged

jaclyn-taroni mentioned this pull request Jan 29, 2020

Discussion: possible improvements for copy number processing / analyses #486

Open

cansavvy mentioned this pull request Jan 30, 2020

Updated analysis: Chromosomal Instability Tweaks #487

Closed

jashapiro deleted the jashapiro/generate-cnv-blacklist branch April 11, 2021 18:53
