add cnv interpretation #216

jharenza · 2019-11-03T19:12:41Z

Purpose/implementation

What scientific question is your analysis addressing?
What was your approach?
If this is not adding an analysis, describe your changes in this section.
CNVkit and ControlFreeC copy number outputs are not directly comparable. Updating this in the README.

Issue

What GitHub issue does your pull request address?
#182

Directions for reviewers

Tell potential reviewers what kind of feedback you are soliciting.
Are there particular areas that need a closer look?
Is there something you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Is this clear or does this need more explanation?

Results

If your pull request includes code that produces scientific results, please summarize the results here.
This can help facilitate discussion around interpretation.
Please state what kinds of results are included (e.g., table, figure).

CNVkit and ControlFreeC copy number outputs are not directly comparable. Updating this in the README.

jharenza · 2019-11-03T19:21:22Z

Hmm, not sure if that Markdown table is showing up properly.

jaclyn-taroni · 2019-11-03T19:22:22Z

I will give formatting it a shot right now.

jaclyn-taroni · 2019-11-03T19:24:47Z

It needed newlines on both sides.

jharenza · 2019-11-03T19:25:15Z

ahh ok good to know, thanks!

jaclyn-taroni · 2019-11-03T19:32:39Z

README.md

@@ -92,6 +92,32 @@ The release notes for each release are provided in the `release-notes.md` file t
 * Somatic Copy Number Variant (CNV) data are provided in a modified [SEG format](https://software.broadinstitute.org/software/igv/SEG) for each of the [applied software packages](https://alexslemonade.github.io/OpenPBTA-manuscript/#somatic-copy-number-variant-calling).
  * The CNVkit SEG file has an additional column `copy.num` to denote copy number of each segment, derived from the CNS file output of the algorithm described [here](https://cnvkit.readthedocs.io/en/stable/fileformats.html).
  * The ControlFreeC TSV file is a merge of `*_CNVs` files produced from the algorithm, and columns are described [here](http://boevalab.inf.ethz.ch/FREEC/tutorial.html#OUTPUT).
+  * NOTE: The _copy number_ annotated in the CNVkit SEG file is annotated with respect to ploidy 2, however, the _copy number_ annotated in the ControlFreeC TSV file is annotated with respect to inferred ploidy from the algorithm, which is recorded in the `pbta_histologies.tsv` file. See the table below for examples of possible interpretations.
+
+| Ploidy | Copy Number | Gain/Loss Interpretation     |


One is supposed to look at the `tumor_ploidy` column in the `pbta_histologies.tsv` file and the copy number column in the ControlFreeC TSV, is that correct? Two thoughts on this:

1. Why not include the ploidy information in the ControlFreeC file? We sometimes add columns to files (e.g., `copy.num` in the CNVkit SEG), so an analyst would have everything in one spot.

2. If we can't or don't want to put ploidy in the ControlFreeC file, can we change the column names here to: `tumor_ploidy` in `pbta-histologies.tsv`, copy number in `pbta-cnv-controlfreec.tsv.gz`, Gain/Loss Interpretation?
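As a rough illustration of the first suggestion, here is a minimal sketch of attaching `tumor_ploidy` to each ControlFreeC row so that both values sit in one place. The sample identifier column (`Kids_First_Biospecimen_ID`), the `copy number` column name, and the tiny inline tables are assumptions made up for this example; they are not taken from the actual files.

```python
import csv
import io


def add_tumor_ploidy(cnv_rows, histology_rows, id_column="Kids_First_Biospecimen_ID"):
    """Return copies of cnv_rows with a tumor_ploidy value looked up per sample.

    id_column is a hypothetical shared sample identifier column.
    """
    ploidy_by_sample = {row[id_column]: row["tumor_ploidy"] for row in histology_rows}
    annotated = []
    for row in cnv_rows:
        new_row = dict(row)
        # Samples without a histology entry get an empty value rather than a guess.
        new_row["tumor_ploidy"] = ploidy_by_sample.get(row[id_column], "")
        annotated.append(new_row)
    return annotated


# Made-up inputs standing in for the histologies file and the ControlFreeC TSV.
histologies_tsv = (
    "Kids_First_Biospecimen_ID\ttumor_ploidy\n"
    "BS_EXAMPLE1\t3\n"
    "BS_EXAMPLE2\t2\n"
)
cnv_tsv = (
    "Kids_First_Biospecimen_ID\tcopy number\n"
    "BS_EXAMPLE1\t2\n"
    "BS_EXAMPLE2\t3\n"
    "BS_EXAMPLE3\t1\n"
)

histology_rows = list(csv.DictReader(io.StringIO(histologies_tsv), delimiter="\t"))
cnv_rows = list(csv.DictReader(io.StringIO(cnv_tsv), delimiter="\t"))
annotated_rows = add_tumor_ploidy(cnv_rows, histology_rows)
```

Note that the value attached this way is the overall tumor ploidy for the sample, repeated on every segment row for that sample.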

jharenza · 2019-11-03

Yes, I was thinking it would make more sense to add it to the ControlFreeC file, but it may also be confusing because, in many cases, it would contradict the `genotype` column. The ploidy in the clinical file is the overall tumor ploidy, not the segment ploidy. We can add it if you think it is useful, but then maybe we rename `genotype` to `segment_genotype`.

The table was also meant to be inclusive of CNVkit, which is why I didn't label the columns specifically.

On another note, it looks like we have gain/loss info in the ControlFreeC TSV file, which should also help the user not rely solely on copy number. The challenging part is what then defines a homozygous loss: for ploidy 3, does homozygous loss mean all 3 copies are lost? (I may have to dig more into my genetics for this answer.) I didn't realize this was the case when writing up the info for #182.

What would be easiest for the user? Maybe we default to using loss broadly, rather than categorizing it as homo/hemi, and confirm total copy loss with lack of RNA expression; use gain broadly; and set a cutoff for amplification. That way we rely on the gain/loss calls from ControlFreeC rather than on copy number values, which require ploidy interpretation.
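To make the ploidy dependence concrete, here is a minimal sketch of reading a copy number relative to a ploidy, using only broad gain/loss labels. The function name, the label strings, the "total loss" label for a copy number of 0, and the amplification cutoff (`amplification_factor` times the ploidy) are all hypothetical choices for illustration; the thread does not settle on a cutoff, and this is not how ControlFreeC makes its own gain/loss calls.

```python
def interpret_segment(copy_number, ploidy=2, amplification_factor=2):
    """Label a segment's copy number relative to the given ploidy.

    amplification_factor is a placeholder cutoff, not an agreed-upon value.
    """
    if copy_number < 0 or ploidy <= 0:
        raise ValueError("copy_number must be >= 0 and ploidy must be > 0")
    if copy_number == 0:
        return "total loss"  # no copies remain, whatever the ploidy
    if copy_number < ploidy:
        return "loss"  # broad loss, not split into homo/hemi
    if copy_number == ploidy:
        return "neutral"
    if copy_number >= amplification_factor * ploidy:
        return "amplification"
    return "gain"


# The same copy number reads differently depending on the reference ploidy.
print(interpret_segment(3, ploidy=2))  # gain
print(interpret_segment(3, ploidy=3))  # neutral
print(interpret_segment(2, ploidy=3))  # loss
```

With the default `ploidy=2` this mirrors how the CNVkit `copy.num` column is described in the README note, while passing the inferred ploidy corresponds to the ControlFreeC case.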

jashapiro · 2019-11-03

I think we should add `tumor_ploidy` to the TSV file and change `genotype` to `segment_genotype`.

I would lean toward using loss broadly, but I don't think that we should rely on RNA expression as the only indicator of total loss; we should indicate total loss based on the seg calls as well.

The reason I am hesitant to rely on RNA expression is that it is quite possible to have a complete loss of some exons while others remain present. This could result in total loss of functionality even though the RNA level indicates expression of the gene, because some exons are still being expressed. I'd expect this to be rare, but I feel it is worth highlighting discrepancies between RNA and genomic data.

jharenza · 2019-11-03

This is definitely the case for ATRX, and we treat it a bit differently: we had been doing a coverage-based estimation of exons lost and reporting those as deletions, and/or assessing SVs in this gene as a complement. However, the RNA-level check is something we did in the past to confirm loss of a gene's expression, especially in cases in which we know there should be complete loss (e.g., SMARCB1 in ATRT and some types of chordomas; CDKN2A/B in leukemias, not relevant here), and/or to ensure CN calls were generally lining up with expectations. I agree that it does get a bit complicated, trying to be broad yet somehow cover all bases. In cases of hemizygous loss, I did a lot of manual inspection to ensure CN calls were accurate (which is not always the case).

jharenza · 2019-11-03T20:25:15Z

> It needed newlines on both sides.

Dumb question, and googling isn't super helpful: how did you do this? I tried `<br>`, `<br/>`, and `\`, but they did not work.

jaclyn-taroni · 2019-11-03T20:48:16Z

Ah, as in needs a blank line before and after the table: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/216/files#diff-04c6e90faac2675aa89e2176d2eec7d8R96 and https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/216/files#diff-04c6e90faac2675aa89e2176d2eec7d8R120

add cnv interpretation

caf1419

CNVkit and ControlFreeC copy number outputs are not directly comparable. Updating this in the README.

jharenza requested a review from jaclyn-taroni November 3, 2019 19:12

Jo Lynne Rokita added 2 commits November 3, 2019 14:19

fix table

5b33b79

remove extra line spaces

b3ab629

Spacing around CNV table

f167b16

jaclyn-taroni reviewed Nov 3, 2019

jharenza mentioned this pull request Nov 4, 2019

Planned release: v8 #219

Closed

Jo Lynne and others added 3 commits November 4, 2019 07:46

Update README.md

cfa1875

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

Merge branch 'master' into cnv-ploidy-explanation

021ade7

Merge branch 'master' into cnv-ploidy-explanation

7acff9f

jaclyn-taroni approved these changes Nov 4, 2019

jaclyn-taroni merged commit 262d15a into master Nov 4, 2019

jaclyn-taroni deleted the cnv-ploidy-explanation branch November 4, 2019 14:17

yuankunzhu mentioned this pull request Nov 4, 2019

V8 Release #223

Merged
