Planned Analysis: Integrated CNV and SV analyses and chromothripsis #27

jharenza · 2019-07-14T14:25:51Z

We have generated CNV output from ControlFreeC and CNVKit, but are seeking individuals to determine consensus focal calls and/or identify additional algorithms we can run to instill high confidence in focal CNV calls from the WGS dataset.

cgreene · 2019-07-14T14:43:27Z

After AlexsLemonade/OpenPBTA-manuscript#15 is approved and merged, can you write up the CNV methods and file a PR into that subsection so that we can link folks to the current version of the processing code?

It may change in the future, but then we'll have an accurate manuscript-ready description of what was done.

jharenza · 2019-07-24T13:03:47Z

This machine learning publication may help us with CN true positives:

jharenza · 2019-07-24T13:04:08Z

@cgreene Yes - will work on getting the CNV methods write-up filled in by the harmonization team.

gonzolgarcia · 2019-07-30T15:31:23Z

Integrated CNV and SV analyses and chromothripsis.

The proposed analyses broadly address the prevalence and functional impact of structural variation across brain tumors. It is important to note that copy number variations are essentially a subset of structural variants; as such, CNV and SV calls overlap heavily, complement each other, and should be studied together. I am effectively proposing to merge issues #27 and #28.

To integrate CNV calls and SV calls, we focus on breakpoint co-localization; more details are in the manuscript: https://www.biorxiv.org/content/10.1101/572248v3

Chromothripsis is a catastrophic one-time event involving multiple breakpoints and rearrangements of localized regions of the genome, as opposed to chromoplexy, which involves gradually acquired structural variations. Chromothripsis can be identified by a pattern of oscillating copy number states and concomitant structural variants that allow walking through the newly formed chromosome. In practical terms, it can be identified as regions with an abnormally high number of CNVs and SVs.
Several methods are available, all of which have limitations:
- ShatterSeek (https://github.com/parklab/ShatterSeek)
- Shatterproof (https://metacpan.org/release/SGOVIND/Shatterproof-0.13)
- No-Name (https://www.biorxiv.org/content/10.1101/572248v3), which focuses on regions where SV density is more than 2 standard deviations above the average of each sample
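
The density criterion of the last method can be sketched as follows. This is only a minimal illustration, not the implementation from the manuscript: the 1 Mb window size, the function name `sv_density_outliers`, and the input layout (a list of breakpoints for one sample plus a dictionary of chromosome lengths) are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def sv_density_outliers(breakpoints, chrom_lengths, window=1_000_000, n_sd=2.0):
    """Flag windows whose breakpoint count exceeds mean + n_sd * SD for one sample.

    breakpoints: iterable of (chromosome, position) tuples for a single sample.
    chrom_lengths: dict mapping chromosome -> length in bp (hypothetical input,
        needed so that empty windows count towards the per-sample average).
    """
    counts = Counter((chrom, pos // window) for chrom, pos in breakpoints)
    # Enumerate every window, including those without any breakpoint.
    all_bins = [
        (chrom, index)
        for chrom, length in chrom_lengths.items()
        for index in range((length + window - 1) // window)
    ]
    values = [counts.get(bin_key, 0) for bin_key in all_bins]
    threshold = mean(values) + n_sd * pstdev(values)
    return [bin_key for bin_key in all_bins if counts.get(bin_key, 0) > threshold]
```

Windows returned by this function are only candidates; a dedicated tool such as ShatterSeek applies additional criteria (for example, oscillating copy number states) before calling chromothripsis.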

The input formats for developing downstream analyses are:

CNV segmentation data:
SampleId,
chromosome,
Start,
End,
num_probes (deprecated, from SNP array format),
Segment_Mean (log T/N)

Allele specific CNV (optional; defining regions of LoH and allelic imbalance)
SampleId,
chromosome,
Start,
End,
BAF_mean,
Call (LOH or AI)

SV calls file content (already filtered by somatic score; annotation is not required):
SampleId,
Chromosome-origin,
Start-origin,
End-origin,
Chromosome-destination,
Start-destination,
End-destination,
sv_type: DEL, DUP, TRA and INV (often divided into head-to-head and tail-to-tail)
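
A quick header check along these lines could catch mismatched input files early. This is a sketch under assumptions: the column names are copied from the lists above, while the comma-delimited layout, the dictionary keys, and the helper name `missing_columns` are hypothetical.

```python
import csv
import io

# Expected columns per input type, copied from the lists above.
EXPECTED_COLUMNS = {
    "cnv_segments": [
        "SampleId", "chromosome", "Start", "End", "num_probes", "Segment_Mean",
    ],
    "allele_specific_cnv": [
        "SampleId", "chromosome", "Start", "End", "BAF_mean", "Call",
    ],
    "sv_calls": [
        "SampleId", "Chromosome-origin", "Start-origin", "End-origin",
        "Chromosome-destination", "Start-destination", "End-destination",
        "sv_type",
    ],
}


def missing_columns(handle, kind, delimiter=","):
    """Return the expected columns that are absent from the file header."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS[kind] if name not in present]


# Example: a segmentation file without the deprecated num_probes column.
example = io.StringIO("SampleId,chromosome,Start,End,Segment_Mean\n")
print(missing_columns(example, "cnv_segments"))
```

Since num_probes is deprecated, a caller might choose to treat its absence as a warning rather than an error.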

Some proposed readouts and output analyses

Structural variation:
1) A measure of chromosomal instability (CIN) burden (density of breakpoints per Mb, similar to tumor mutational burden, TMB) and a plot by tumor type representing CIN burden (this could be compared to TMB).
2) Recurrently altered genes (perhaps integrated in an OncoPrint with SNVs?). For the OncoPrint categories:
- Amplification/tandem duplication
- Deep deletion/deletion
- Other structural variation: inversions, translocations
3) Focus on novel findings: if a newly recurrently altered gene arises, we will analyze it in depth.
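
The CIN burden readout (breakpoints per Mb) could be computed roughly as below. This is a sketch, not an agreed definition: counting each distinct SV end once, the tuple layout of `sv_calls`, and the default genome size of 3000 Mb are assumptions for illustration only.

```python
from collections import defaultdict


def cin_burden_by_sample(sv_calls, genome_size_mb=3000.0):
    """Return breakpoints per Mb for each sample.

    sv_calls: iterable of (sample, chrom_origin, pos_origin,
        chrom_destination, pos_destination) tuples (hypothetical layout).
    genome_size_mb: size of the callable genome in Mb (assumed value).
    """
    breakpoints = defaultdict(set)
    for sample, chrom_a, pos_a, chrom_b, pos_b in sv_calls:
        # Each SV contributes two breakpoints; a set avoids double counting
        # positions that are shared between several SVs.
        breakpoints[sample].add((chrom_a, pos_a))
        breakpoints[sample].add((chrom_b, pos_b))
    return {
        sample: len(points) / genome_size_mb
        for sample, points in breakpoints.items()
    }
```

CNV segment boundaries could be added to the same per-sample sets if the burden is meant to combine both data types.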

Chromothripsis:
4) A barplot of chromothripsis prevalence by tumor subtype
5) A few Circos plots with examples of chromothripsis
6) Association of chromothripsis with other somatic alterations (e.g., TP53 status)

Survival analyses (probably addressed in issue #18)
7) Multivariate analyses including clinical variables as well as overall TMB, chromosomal instability burden, and chromothripsis.

jharenza · 2019-07-30T15:49:11Z

merged #27 and #28 here per @gonzolgarcia's request

gonzolgarcia · 2019-09-19T18:22:40Z

Issue with lumpy data

While trying to filter somatic SVs from the table, I realized that the evidence columns "Tumor" and "Normal" are switched.

In addition, there is no somatic score, and I haven't found many guidelines for somatic filtering of tumor/normal LUMPY results. I will be considering this: arq5x/lumpy-sv#268

jharenza · 2019-09-20T19:45:25Z

Thanks, @gonzolgarcia! You are right, the T/N columns are swapped - we will fix this in the V5 release coming next week.

guru-yang · 2019-10-07T15:30:08Z

The Yang Lab will perform analysis on chromothripsis.

gonzolgarcia · 2019-10-07T17:48:00Z

Note that there are two callers each for CNV (CNVkit & ControlFreeC) and SV (Manta & LUMPY).
These datasets still require some further processing and filtering.

jaclyn-taroni · 2019-10-07T18:18:24Z

@gonzolgarcia are you planning to generate SV consensus calls?

gonzolgarcia · 2019-10-07T18:46:36Z

Before getting a consensus, LUMPY requires somatic filtering. It would be nice to have this added to the next release.

jharenza · 2019-10-07T19:47:08Z

@guru-yang - do you have any experience with somatic filtering of LUMPY SVs? The comment referred to here suggests the following:

1. Run SVTyper (docker)
2. Filter for somatic calls:
   a) keep non-reference SVs in the tumor;
   b) keep SVs which have no alternate depth (AO==0) in normal;
   c) keep SVs with sufficient depth in the normal (RO>~7)
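
The three filtering criteria could be expressed as a small predicate like the one below. This is a hedged sketch, not a benchmarked filter: it assumes each SV has already been genotyped and parsed into one dictionary per sample with `GT`, `AO`, and `RO` entries, and the function name and the default threshold of 7 (taken from the approximate "RO>~7") are illustrative.

```python
def is_somatic(tumor, normal, min_normal_ref_depth=7):
    """Apply the three somatic criteria to one genotyped SV.

    tumor, normal: dicts of per-sample genotype fields (hypothetical layout),
        e.g. {"GT": "0/1", "AO": 9, "RO": 20}.
    """
    # a) keep non-reference SVs in the tumor
    tumor_non_reference = tumor["GT"] not in ("0/0", "./.")
    # b) keep SVs with no alternate-supporting observations in the normal
    normal_has_no_alt = normal["AO"] == 0
    # c) keep SVs with sufficient reference depth in the normal
    normal_is_covered = normal["RO"] > min_normal_ref_depth
    return tumor_non_reference and normal_has_no_alt and normal_is_covered
```

For example, `is_somatic({"GT": "0/1"}, {"AO": 0, "RO": 12})` keeps the call, while any alternate support in the normal rejects it.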

guru-yang · 2019-10-07T20:27:55Z

We haven't used LUMPY at all. The filtering steps sound reasonable. Based on my experience, Manta alone might be good enough for SV calling.

gonzolgarcia · 2019-10-07T21:06:46Z

You're probably right, and Manta alone plus CNVkit should be enough for ShatterSeek?

guru-yang · 2019-10-07T22:13:37Z

Should be enough.

jharenza · 2019-10-08T12:51:22Z

Great! @guru-yang and @gonzolgarcia - you can plan to use Manta + CNVkit for ShatterSeek, and then we can work on a filtered LUMPY data file for release in the next few weeks for general recurrent SV analysis.

jharenza · 2019-10-25T13:20:15Z

@guru-yang and @gonzolgarcia as an update, we are going to remove LUMPY from the release. SVTyper processing is very long per sample (>10 hours) and will require some benchmarking for filtering, which we have de-prioritized in favor of benchmarking copy number. You have both said Manta is fine, so we will drop LUMPY. We will have a data release with new CN results coming next week (#146), so please let us know if you need help with creating PRs!

guru-yang · 2019-10-25T14:13:28Z

@jharenza Thanks for the update. I am wondering how to get sample metadata. We are able to get gender, age at diagnosis, and tumor type from the Kids First data portal. In order to perform survival analysis, age at last follow-up would be needed. Do you know how to get that information? Is there any other information available for the patients or their parents, such as smoking or alcohol consumption of the parents?

cgreene · 2019-10-25T14:15:53Z

@guru-yang : have you examined the metadata available in the files associated with this project? Once you do, could you file a new issue noting anything that's missing that you'd need for your analysis? Thanks!

guru-yang · 2019-10-25T14:33:03Z

@cgreene I am able to find overall survival in pedcbioportal. Thanks.

jaclyn-taroni · 2019-10-25T14:41:18Z

Hi @guru-yang - overall survival, gender, age at diagnosis, and tumor type are all available in the pbta-histologies.tsv file, which is part of the data files obtained by running the download-data.sh script.

We need people to use that file when putting together their analyses because that ensures that different contributors that are working independently are using the same information across their analyses (e.g., the same overall survival values). If there are additional fields you would like to see in the pbta-histologies.tsv file, please file a new issue requesting that information. Thank you!

jharenza · 2019-10-25T14:46:54Z

@guru-yang as @jaclyn-taroni mentioned, the survival is in the provided histologies file in the data download. It is better to use this file, as we have further categorized tumors and provided additional data not in the KF portal. We do not have age at last followup in the file currently, but it can be added in the release due next week. Can you please file an issue for that? We have no parental information available, but if there are other things you would like to see from patients, you can also ask in an issue and I can check whether we have the info available.

guru-yang · 2019-10-25T14:51:04Z

@jaclyn-taroni @jharenza I see. Thanks a lot. What about smoking and alcohol usage for the probands? I don't expect smokers in a pediatric cohort. Just curious.

cgreene · 2019-10-25T14:52:27Z

@guru-yang : please file a new github issue with requests for metadata so that we can keep this issue, currently titled "Planned Analysis: Integrated CNV and SV analyses and chromothripsis" on that topic. Thanks!

jharenza · 2019-11-01T23:35:05Z

Hi @gonzolgarcia and @guru-yang! When do you think you will be able to file a pull request with either of your analyses? Thanks!

guru-yang · 2019-11-02T02:54:47Z

@jharenza We have made some progress. Is there a regular conference call or similar to share results among the group? Or is everything done through GitHub?

jaclyn-taroni · 2019-11-02T19:24:44Z

Hi @guru-yang, great to hear! We encourage you to file pull requests adding the code used to generate results as you have them. The analysis does not need to be complete before getting added to the repository. We have a pull request template with a section for summarizing results to facilitate discussion. You can join the Cancer Data Science Slack #open-pbta channel (more information here) if you have questions about the pull request model that are better answered in real-time.

cgreene · 2019-11-03T15:07:04Z

I will echo @jaclyn-taroni and @jharenza : please file pull requests adding code as you are writing it. It is much harder to integrate a large amount of code after it is entirely written. Thanks!

guru-yang · 2019-11-04T04:10:01Z

@jharenza @jaclyn-taroni @cgreene Will try to do that soon. I am traveling this week. One quick question: we have seen quite a few patients with more than one tumor sequenced. When working on variants, is there a particular strategy to handle these tumors, such as randomly picking one?

jashapiro · 2019-11-04T11:54:18Z

As of the v7 release, we now provide lists of independent specimens (one tumor per individual) that we would like analyses to use. These are randomly selected, as you suggest, but this allows everyone to use consistent sets. See the bottom of the Data Formats section of the README for descriptions of those files.
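
For illustration only, a seeded random selection of one specimen per individual might look like the sketch below. It is not the procedure used to build the released lists (analyses should use those files); the function name, the seed, and the `(patient_id, specimen_id)` input layout are assumptions.

```python
import random


def pick_independent_specimens(specimens, seed=2019):
    """Pick one specimen per patient, reproducibly.

    specimens: iterable of (patient_id, specimen_id) pairs (hypothetical layout).
    """
    by_patient = {}
    for patient_id, specimen_id in specimens:
        by_patient.setdefault(patient_id, []).append(specimen_id)
    rng = random.Random(seed)
    # Sorting makes the result independent of the input order.
    return {
        patient_id: rng.choice(sorted(ids))
        for patient_id, ids in sorted(by_patient.items())
    }
```

Fixing the seed and sorting the inputs means that independent contributors would draw the same specimen for each patient, which is the consistency the shared lists are meant to provide.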

guru-yang · 2019-11-08T16:29:16Z

I noticed that in some samples the CNV calls from the two algorithms are quite different. I wonder what the plan is going forward. It seems to me that generating a consensus CNV call is not easy.

jaclyn-taroni · 2019-11-08T16:32:35Z

Hi @guru-yang - have you taken a look at the copy number consensus issue: #128?

gonzolgarcia · 2019-11-11T16:53:26Z

Hello everyone, I wanted to apologize for my lack of contribution to this issue, which I proposed initially. Unfortunately, the requirements of my new position at Mount Sinai have left me with very little bandwidth. For the time being, I cannot guarantee that I will be contributing steadily to this issue. However, I'd be happy to provide support if still needed, as I am working on developing new tools for the integrated analysis of CNVs and structural variations. Best regards to everyone.

jaclyn-taroni · 2020-01-02T19:47:00Z

I filed two more focused issues based on what analyses are in progress vs. those that are not currently accounted for: #393 and #394

Closing this.

jharenza added the "good first issue" label on Jul 26, 2019

jharenza mentioned this issue on Jul 30, 2019: Planned Analysis: Somatic Structural Variant and Chromothripsis Analysis #28 (Closed)

jharenza changed the title from "Planned Analysis: WGS Copy Number Analysis" to "Planned Analysis: Integrated CNV and SV analyses and chromothripsis" on Jul 30, 2019

jharenza added the "in progress" label on Jul 30, 2019

jaclyn-taroni removed the "good first issue" label on Aug 15, 2019

cansavvy mentioned this issue on Aug 19, 2019: Planned Analysis: Tumor Mutation Burden #3 (Closed)

jaclyn-taroni mentioned this issue on Sep 20, 2019: Planned data release: V5 #121 (Closed)

jaclyn-taroni mentioned this issue on Oct 7, 2019: Planned Data Release: v6 #146 (Closed)

guru-yang mentioned this issue on Oct 25, 2019: Request metadata: patient age at last followup, smoking history, alcohol usage #177 (Closed)

jaclyn-taroni added the "cnv" and "sv" labels on Oct 26, 2019

yangyangclover mentioned this issue on Nov 20, 2019: Process sv file #283 (Merged)

cansavvy mentioned this issue on Dec 4, 2019: Planned Analysis: Survival Analysis across PBTA #18 (Closed)

This was referenced on Jan 2, 2020: Proposed Analysis: Chromothripsis analysis with ShatterSeek, SV signatures #393 (Closed); Proposed Analysis: chromosomal instability burden, recurrently altered genes #394 (Closed)

jaclyn-taroni closed this as completed on Jan 2, 2020

jaclyn-taroni mentioned this issue on Jan 3, 2020: Proposed Analysis: visualization of CNV and SV data with Circos plot #397 (Closed)
