Proposed Analysis: Copy number consensus calls #128

jharenza · 2019-09-25T13:08:20Z

Scientific goals

What are the scientific goals of the analysis?
Create consensus calls from ControlFreeC and CNVkit

Proposed methods

What methods do you plan to use to accomplish the scientific goals?
Breakpoints will not overlap perfectly between algorithms, so the analyst will likely have to define a window of overlap within which copy number alterations from different callers are deemed consensus calls.
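One way to define such a window is to treat two calls as the same event when both their start and end coordinates fall within a fixed distance of each other. Below is a minimal Python sketch of that idea; the `Segment` type, the `breakpoints_match` name, 0-based half-open coordinates, and the 2 kb window in the example are illustrative assumptions, not choices made in this issue.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, half-open [start, end)
    end: int


def breakpoints_match(a: Segment, b: Segment, window: int) -> bool:
    """Hypothetical rule: same chromosome and both breakpoints within `window` bp."""
    return (
        a.chrom == b.chrom
        and abs(a.start - b.start) <= window
        and abs(a.end - b.end) <= window
    )


# Example with an arbitrary 2 kb window.
print(breakpoints_match(Segment("chr7", 1_000, 50_000),
                        Segment("chr7", 1_800, 49_000), window=2_000))
```

A fixed window treats a 10 kb event and a 10 Mb event the same way, which is one reason to consider a size-relative criterion such as reciprocal overlap instead.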

Required input data

What input data will you use for this analysis?
SEG files

Proposed timeline

What is the timeline for the analysis?
2 weeks

Relevant literature

If there is relevant scientific literature, put links to those items here.

xhb1991 · 2019-09-25T13:14:26Z

We will use additional methods to get CNV calls with a balanced sensitivity and specificity for the cohort.

jaclyn-taroni · 2019-09-25T13:16:42Z

Are you planning on tackling this @jharenza and @xiehongbo? If so, I will mark as in progress.

fingerfen · 2019-09-25T14:53:34Z

We have a pipeline, count me in as well!

xhb1991 · 2019-09-25T14:53:41Z

Yeah, we are tackling this.

jharenza · 2019-09-25T15:01:00Z

> We have a pipeline, count me in as well!

@fingerfen - great! What is the pipeline and what inputs do you need? We may have to set you up on CAVATICA to run this.

xhb1991 · 2019-09-25T15:27:39Z

Let's talk about it today during our meeting.

jaclyn-taroni · 2019-11-01T20:37:53Z

Hi @jharenza @xiehongbo @fingerfen,

Do you have an idea of when we should expect the first pull request for this issue? I am also wondering if we know what the format of the output of this analysis will be now that the two callers have different file formats. This information will help us in development for issues like #6 and #186.

jharenza · 2019-11-01T23:32:05Z

Hi @fingerfen and @xiehongbo - we were able to finish the data releases to include the new CNVkit and ControlFreeC files, so now you are able to submit a pull request with your analysis. Is it possible to do this next week? @fingerfen can you also list here the columns you will have in your final file for @jaclyn-taroni ? Thanks!

hongboxie · 2019-11-07T17:45:27Z

We assume that sample QC has already been done based on each sample's noise level.
Here is what we did to summarize consensus CNVs from three somatic copy number callers:

We define two CNVs as the same event if they reciprocally overlap each other by >50%. We typically use a much higher threshold for germline CNVs (60-80%).
We took any CNV identified by >=2 approaches (CNVkit, FreeC, etc.).
Currently we only summarize deletions (CN=0,1) and amplifications (CN>2).
We list each consensus CNV with its chromosome, start position, end position, and the original CNV identified by each detection method.
Currently we take the average of the breakpoints from the different methods as the consensus breakpoints.
We filter out any CNV whose content mostly (>50%) overlaps centromeres, telomeres, IGLL regions, or segmental duplications. (*Note: we do not actually remove these CNVs; instead we mark the fraction of each CNV that overlaps these blacklist regions.)
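The consensus rules in this comment can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: the `Call` record, the function names, and 0-based half-open coordinates are assumptions, while the >50% reciprocal-overlap threshold, the two-caller requirement, breakpoint averaging, and reporting (rather than removing) the blacklist fraction follow the rules listed above.

```python
from typing import NamedTuple


class Call(NamedTuple):
    caller: str  # e.g. "cnvkit", "freec", "manta"
    chrom: str
    start: int   # assumed 0-based, half-open [start, end)
    end: int
    kind: str    # "DEL" or "AMP"


def reciprocal_overlap(a: Call, b: Call, min_frac: float = 0.5) -> bool:
    """True if the shared span covers more than `min_frac` of BOTH calls."""
    if a.chrom != b.chrom:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap / (a.end - a.start) > min_frac
            and overlap / (b.end - b.start) > min_frac)


def consensus_calls(calls, min_frac=0.5, min_callers=2):
    """Keep events supported by >= `min_callers` distinct callers and
    average the breakpoints of the supporting calls."""
    seen, out = set(), []
    for a in calls:
        # Same-type calls from other callers that reciprocally overlap `a`.
        group = [a] + [b for b in calls
                       if b.caller != a.caller and b.kind == a.kind
                       and reciprocal_overlap(a, b, min_frac)]
        callers = sorted({c.caller for c in group})
        key = frozenset(group)
        if len(callers) < min_callers or key in seen:
            continue  # not enough support, or this event was already reported
        seen.add(key)
        start = round(sum(c.start for c in group) / len(group))
        end = round(sum(c.end for c in group) / len(group))
        out.append((a.chrom, start, end, a.kind, callers))
    return out


def blacklist_fraction(chrom, start, end, blacklist):
    """Fraction of [start, end) covered by blacklist intervals
    (assumed already merged so they do not overlap each other)."""
    covered = sum(max(0, min(end, b_end) - max(start, b_start))
                  for b_chrom, b_start, b_end in blacklist if b_chrom == chrom)
    return covered / (end - start)


calls = [
    Call("cnvkit", "chr7", 1_000, 11_000, "AMP"),
    Call("freec", "chr7", 2_000, 12_000, "AMP"),
    Call("freec", "chr9", 0, 5_000, "DEL"),  # single-caller call, dropped
]
for chrom, start, end, kind, callers in consensus_calls(calls):
    frac = blacklist_fraction(chrom, start, end, [("chr7", 0, 3_500)])
    print(chrom, start, end, kind, callers, f"blacklist_frac={frac:.2f}")
```

This anchor-based grouping is deliberately simple; when three callers overlap unevenly it can report near-duplicate events, which a real pipeline would need to collapse.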

jashapiro · 2019-11-08T17:20:13Z

> We assume, that sample QC has been done by the sample noisy level.

I do not understand what this is referring to. Looking at the manuscript, I am not clear on what QC steps were performed on CNV calls, if any. Are there standard QC steps that should be added to the CNV results, and/or documented in the manuscript? Perhaps @jharenza or @yuankunzhu can provide some insight here?

hongboxie · 2019-11-08T19:49:57Z

@jharenza I think we have had this discussion before. Do you want to remove samples with extremely high noise levels, or keep every sample regardless? If we do want to remove noisy samples (here, a "noisy sample" is one with a high SD of depth of coverage), we can do so. Otherwise, we can report CNVs from ALL samples. It's up to you; I am fine either way.

hongboxie · 2019-11-08T19:51:21Z

@jashapiro when you call CNVs from a given set of samples, do you remove "noisy" samples? If so, what is your practice for doing so?

jashapiro · 2019-11-08T20:00:46Z

@hongboxie That makes sense. As I do not have the raw data, I can't see the raw coverage metrics, but after you brought up such QC, I went to look in the manuscript for details, and didn't find them mentioned. This is not my area of expertise, so I do not know what the standard practices are; I am just trying to understand the data as we work on some of the downstream analysis.

hongboxie · 2019-11-08T21:10:47Z

@jashapiro no worries! I am learning this topic from everyone as well. I am open to any suggestions.

jharenza · 2019-11-08T22:17:01Z

@hongboxie sorry, just reading this now... We discussed with @fingerfen removing any samples that showed whole-genome gain, since that was an indicator of inaccurate CN calling. I think we also chose a cutoff of >2500 segments for noise, following what was used for the arrays. However, I also recall that ControlFreeC might not have smoothed its segments as CNVkit did (i.e., collapsed multiple segments into one), so ControlFreeC samples may have more segments than expected and this cutoff may not be appropriate. @yuankunzhu do you remember whether the ControlFreeC TSV file we redid is smoothed?

No samples were removed when we provided the data; this was to be done via #128. When I checked the CNVkit SEG file in IGV, the sample call quality all looked reasonable, so this may only apply to ControlFreeC.
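The segment-count cutoff mentioned above could be applied with a few lines of Python. This sketch assumes a tab-separated SEG file with a header row and the sample ID in the first column; the function name is hypothetical, the 2500 default comes from the comment above, and the whole-genome-gain criterion is not implemented. Given the smoothing caveat for ControlFreeC, the threshold may need to differ by caller.

```python
import csv
import io
from collections import Counter


def noisy_samples(seg_text: str, max_segments: int = 2500) -> list:
    """Return sample IDs with more than `max_segments` segments.

    Assumes tab-separated SEG content with a header row and the
    sample ID in the first column.
    """
    reader = csv.reader(io.StringIO(seg_text), delimiter="\t")
    next(reader)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return sorted(sample for sample, n in counts.items() if n > max_segments)


# Toy input: S1 has 3 segments, S2 has 1; a tiny cutoff is used for illustration.
header = "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
body = "".join(f"S1\t1\t{i * 10}\t{i * 10 + 9}\t5\t0.1\n" for i in range(3))
body += "S2\t1\t0\t100\t20\t0.0\n"
print(noisy_samples(header + body, max_segments=2))
```

For a real file, the text could be read with `open(path).read()` before calling the function.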

hongboxie · 2019-11-11T14:29:17Z

We are perplexed by the Manta output: there are multiple CNVs overlapping the same region. For now, we have decided to merge all of these CNVs into a single consensus event. Something seems strange about Manta's somatic CNV output.

hongboxie · 2019-11-11T14:31:51Z

We haven't had a chance to dig into the root of the problem.

hongboxie · 2019-11-19T13:57:18Z

About Manta:

1) We found many CNVs overlapping the same region in Manta's output.
2) It seems these come from paired-end (PE) based CNV detection (I think).
3) Currently our consensus is to merge those CNVs into one CNV segment, in the hope that the other methods (CNVkit/FreeC) will help eliminate any false positives.
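Merging overlapping calls into one segment is a standard interval merge, the same idea as `bedtools merge`. A minimal Python sketch, assuming the input holds calls of a single type (for example, deletions) from one sample as (chrom, start, end) tuples with half-open coordinates; the function name is hypothetical.

```python
def merge_overlapping(intervals):
    """Collapse overlapping or book-ended (chrom, start, end) intervals
    into single segments, one pass over the sorted input."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous segment: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


manta_dels = [("chr7", 100, 500), ("chr7", 400, 900),
              ("chr7", 2_000, 3_000), ("chr1", 10, 20)]
print(merge_overlapping(manta_dels))
```

Like `bedtools merge` with its default settings, this also joins book-ended intervals, where one ends exactly where the next begins.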

hongboxie · 2019-11-19T14:03:40Z

About consensus CNV:

I think ploidy detection with current approaches is subjective. For instance, it is very hard to reach a consensus when one predictor calls 30 copies of a segment and another predictor reports a different copy number.
We will report two types of CNVs: amplification (ploidy > 2) and deletion (ploidy < 2). We created 3 additional columns that display, as evidence, the original output of each method overlapping the region. Each predictor's original output will carry its ploidy information, if it has any.
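The reporting scheme described above, a type label plus one evidence column per caller, might look like this in Python. The column names, the `classify` and `consensus_row` helpers, the diploid baseline of 2, and the evidence strings in the example are assumptions for illustration.

```python
from typing import Optional

CALLERS = ("cnvkit", "freec", "manta")  # assumed evidence columns, one per caller


def classify(copy_number: float, ploidy: int = 2) -> Optional[str]:
    """Label a call relative to an assumed diploid baseline; None if neutral."""
    if copy_number > ploidy:
        return "amplification"
    if copy_number < ploidy:
        return "deletion"
    return None


def consensus_row(chrom: str, start: int, end: int, status: str, evidence: dict) -> dict:
    """One output row: the consensus type plus each caller's original call,
    which keeps that caller's own copy number if it reported one."""
    row = {"chrom": chrom, "start": start, "end": end, "status": status}
    for caller in CALLERS:
        row[caller] = evidence.get(caller)  # None when the caller has no overlapping call
    return row


print(classify(30), classify(1), classify(2))
# The callers disagree on copy number (30 vs 5) but agree on the type.
print(consensus_row("chr7", 1_500, 11_500, classify(30),
                    {"cnvkit": "chr7:1000-11000:CN=30",   # toy evidence strings
                     "freec": "chr7:2000-12000:CN=5"}))
```

Reporting only the type at the consensus level sidesteps the disagreement on exact copy number while keeping each caller's estimate available.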

fingerfen · 2019-11-19T18:25:45Z

Pipeline_Visual_Example_on_chr7.pptx

@jharenza Attached is the ppt from Wednesday's presentation. Sorry for the delay.

jaclyn-taroni · 2019-11-22T21:39:01Z

Documenting the outcomes of the in person meeting this afternoon (@jaclyn-taroni @jashapiro @hongboxie @fingerfen):

The first pull request will consist of the python script that parses the different callers' files into files for individual biospecimens and sets up the snakemake file. A step running this script will also get added to .circleci/config.yml
A second step running the snakemake pipeline will need to be added to .circleci/config.yml, OR a shell script that (1) runs merged_to_individual_files.py and (2) runs the snakemake pipeline will need to be added as the single entry in .circleci/config.yml for this analysis.
Subsequent pull requests will contain a single python script and the updates to the snakemake file that run that python script. That allows us to test each script in CI. It's also fine to include some of the bedtools steps alongside the python script additions in these pull requests, as long as the pull requests do not run to many (400+) lines of code.
To make the final file that will be included in the data download, we probably want a single file that contains both DEL and DUP events for all biospecimens.
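The split-then-recombine flow described in these notes can be sketched in plain Python. This is not `merged_to_individual_files.py`; the helper names, the `biospecimen`/`chrom`/`start`/`end`/`type` column names, and the sort order are hypothetical, and a real pipeline would write one file per biospecimen to disk rather than keep rows in memory.

```python
import csv
import io
from collections import defaultdict


def split_by_biospecimen(rows):
    """Group call rows (dicts) by their 'biospecimen' value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["biospecimen"]].append(row)
    return dict(groups)


def combine_calls(per_sample, fieldnames):
    """Combine DEL and DUP rows for all biospecimens into one TSV string,
    sorted by biospecimen, chromosome (as text), and start."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    all_rows = [row for rows in per_sample.values() for row in rows]
    for row in sorted(all_rows, key=lambda r: (r["biospecimen"], r["chrom"], r["start"])):
        writer.writerow(row)
    return out.getvalue()


rows = [
    {"biospecimen": "BS_B", "chrom": "chr1", "start": 10, "end": 20, "type": "DEL"},
    {"biospecimen": "BS_A", "chrom": "chr7", "start": 5, "end": 50, "type": "DUP"},
    {"biospecimen": "BS_A", "chrom": "chr7", "start": 1, "end": 3, "type": "DEL"},
]
per_sample = split_by_biospecimen(rows)  # one group per biospecimen
print(combine_calls(per_sample, ["biospecimen", "chrom", "start", "end", "type"]))
```

Sorting chromosome names as text puts "chr10" before "chr2"; a natural-order key would be needed if the final file should follow karyotype order.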

jharenza · 2020-01-13T13:55:54Z

closed via #357 #349 #328 #313 #403 #288 #416

jharenza added the proposed analysis label Sep 25, 2019

jaclyn-taroni mentioned this issue Sep 25, 2019

Planned Analysis: Copy number plot showing recurrently amplified/deleted regions in different PBTA cancers #8

Closed

jaclyn-taroni added the in progress label Sep 25, 2019

jharenza mentioned this issue Oct 2, 2019

Planned Analysis: Oncoprint showing landscape of genetic lesions across PBTA. #6

Closed

jharenza mentioned this issue Oct 18, 2019

Addition of R script to compare CNV caller output #142

Merged

This was referenced Oct 25, 2019

Planned Data Release: v6 #146

Closed

Addition of script to produce oncoprint #176

Merged

jaclyn-taroni added the cnv label Oct 26, 2019

jaclyn-taroni mentioned this issue Oct 30, 2019

Proposed Analysis: map from SEG file to genes (and broader segments) #186

Closed

jaclyn-taroni mentioned this issue Nov 3, 2019

Retire cnv-comparison #215

Merged

jashapiro mentioned this issue Nov 6, 2019

Planned Analysis: Co-Occurence / Mutual Exclusivity #13

Closed

jaclyn-taroni mentioned this issue Nov 8, 2019

Planned Analysis: Integrated CNV and SV analyses and chromothripsis #27

Closed

jharenza mentioned this issue Nov 8, 2019

SMARCB1 deletions in ATRT with current SEG to gene mapping #217

Closed

fingerfen mentioned this issue Nov 21, 2019

Add snakemake to Docker #285

Closed

fingerfen mentioned this issue Nov 22, 2019

CNV consensus (1 of n): Split large files into small sample files #288

Merged

jaclyn-taroni mentioned this issue Nov 25, 2019

PR 1 of 2: Molecular Subtyping - ATRT (Data Prep) #284

Merged

fingerfen mentioned this issue Dec 5, 2019

CNV consensus (2 of n): Filter raw data of CNV call methods #313

Merged

This was referenced Dec 5, 2019

Proposed Analysis: Molecularly subtype chordomas #250

Closed

Documentation: focal-cn-file-preparation README #319

Closed

This was referenced Dec 11, 2019

CNV consensus (3 of n): Filter bad segments #328

Merged

CNV consensus (4 of 6): Restructure column #349

Merged

fingerfen mentioned this issue Dec 19, 2019

CNV consensus (5 of 6):Consensus call #357

Merged

jharenza mentioned this issue Dec 23, 2019

Planned data release: V13 #373

Closed

This was referenced Jan 4, 2020

CNV consensus (6 of 6): Merge consensus files and name columns #403

Merged

CNV consensus (7 of 6): Remove duplicated coordinates and NULLs #416

Merged

jharenza mentioned this issue Jan 9, 2020

Evaluation step TP53 classifier #385

Merged

jaclyn-taroni mentioned this issue Jan 10, 2020

Proposed Analysis: apply TP53 & NF1 classifiers to PBTA data #165

Closed

jharenza closed this as completed Jan 13, 2020

fingerfen mentioned this issue Jan 13, 2020

CNV consensus (8 of 6): Changing Step3 to use Bedtools Subtract #430

Merged

jashapiro mentioned this issue Jan 15, 2020

Reproducing copy number excluded regions #438

Closed

jaclyn-taroni mentioned this issue Jan 18, 2020

Updated analysis: rerun GISTIC with consensus SEG file #453

Closed
