This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Proposed Analysis: Molecularly subtype non-MB/ATRT embryonal tumors #251

Closed
jharenza opened this issue Nov 8, 2019 · 14 comments
Labels
cnv Related to or requires CNV data fusion Related to or requires fusion data in progress Someone is working on this issue, but feel free to propose an alternative approach! molecular subtyping Related to molecular subtyping of tumors proposed analysis snv Related to or requires SNV data transcriptomic Related to or requires transcriptomic data

Comments

@jharenza
Collaborator

jharenza commented Nov 8, 2019

Scientific goals

What are the scientific goals of the analysis?

  1. Subtype embryonal tumors other than medulloblastoma (MB) and ATRT (e.g., tumors formerly classified as PNET) into ETMRs and CNS embryonal NOS.
  2. Assess whether tumors with other diagnoses may have been misdiagnosed and fit into these categories (e.g., a tumor with ETMR characteristics would be reclassified as an ETMR).
  3. Render the results in tabular form in a notebook.

Proposed methods

What methods do you plan to use to accomplish the scientific goals?

  1. Review of copy number, mutation, and expression data.
  2. Render results in tabular form in a notebook.

ETMR, C19MC-altered

  • Embryonal tumor with multilayered rosettes (ETMR), C19MC-altered
  • These tumors have focal amplification of the miRNA cluster on chr19 (denoted C19MC) and often have gene fusions involving TTYH1 and chr19 miRNA cluster genes
  • These tumors have high expression of LIN28A, which serves as a biomarker for ETMRs

ETMR, NOS

  • Embryonal tumor with multilayered rosettes (ETMR)
  • These tumors have high expression of LIN28A, which serves as a biomarker for ETMRs, but do not show focal amplification of C19MC.

CNS HGNET-MN1

  • CNS high-grade neuroepithelial tumor with MN1 alteration
  • Likely previously diagnosed as PNET.
  • Contain gene fusions involving 5' MN1. 3' fusion partners can include BEND2 and CXXC5.
  • Predominantly female patients.

CNS HGNET-BCOR

  • CNS high-grade neuroepithelial tumor with BCOR alteration
  • Tumors have internal tandem duplication of BCOR
  • Median age of diagnosis less than 10 years

CNS NB-FOXR2

  • Central nervous system (CNS) neuroblastoma with FOXR2 activation
  • Over-expression and/or gene fusions in FOXR2

CNS EFT-CIC

  • CNS Ewing sarcoma family tumor with CIC alteration
  • Alterations in CIC, commonly fused with NUTM1

CNS Embryonal, NOS

  • CNS Embryonal tumor, not otherwise specified
  • Tumors previously called PNET that do not fit into other groups above.
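
The criteria listed above could be encoded as a small rule table. The following is only a sketch in base R: the column names (c19mc_amp, lin28a_high, mn1_fusion, bcor_itd, foxr2_altered, cic_altered), the toy rows, and the order in which the rules are applied are all hypothetical and do not come from the PBTA files.

```r
# Toy feature table: one logical flag per hallmark. All names and values are
# hypothetical and exist only to illustrate the rules.
features <- data.frame(
  sample_id     = paste0("S", 1:7),
  c19mc_amp     = 1:7 == 1,
  lin28a_high   = 1:7 %in% c(1, 2),
  mn1_fusion    = 1:7 == 3,
  bcor_itd      = 1:7 == 4,
  foxr2_altered = 1:7 == 5,
  cic_altered   = 1:7 == 6,
  stringsAsFactors = FALSE
)

assign_subtype <- function(f) {
  # Start from the catch-all label, then overwrite where a hallmark is present.
  # Later assignments win, so the ETMR rules take priority (an assumption).
  subtype <- rep("CNS Embryonal, NOS", nrow(f))
  subtype[f$cic_altered]   <- "CNS EFT-CIC"
  subtype[f$foxr2_altered] <- "CNS NB-FOXR2"
  subtype[f$bcor_itd]      <- "CNS HGNET-BCOR"
  subtype[f$mn1_fusion]    <- "CNS HGNET-MN1"
  subtype[f$lin28a_high & !f$c19mc_amp] <- "ETMR, NOS"
  subtype[f$lin28a_high & f$c19mc_amp]  <- "ETMR, C19MC-altered"
  subtype
}

features$subtype <- assign_subtype(features)
print(features[, c("sample_id", "subtype")])
```

Keeping the rules in one function makes it straightforward to revisit the priority order once the real calls are reviewed.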

Required input data

What input data will you use for this analysis?
Copy number, RNA expression data, histologies file.

Proposed timeline

What is the timeline for the analysis?
1 week

Relevant literature

If there is relevant scientific literature, put links to those items here.
Link to Embryonal Tumors of the Central Nervous System in Children: The Era of Targeted Therapeutics
Link to NCI PDQ.
Link to LIN28A, a sensitive immunohistochemical marker for Embryonal Tumor with Multilayered Rosettes (ETMR), is also positive in a subset of Atypical Teratoid/Rhabdoid Tumor (AT/RT).

@jaclyn-taroni jaclyn-taroni added cnv Related to or requires CNV data molecular subtyping Related to molecular subtyping of tumors snv Related to or requires SNV data transcriptomic Related to or requires transcriptomic data fusion Related to or requires fusion data labels Nov 8, 2019
@sjspielman
Member

Will start this in the next couple of days after familiarizing myself with existing molecular subtyping analyses. Stay tuned, team.

@jaclyn-taroni jaclyn-taroni added the in progress Someone is working on this issue, but feel free to propose an alternative approach! label Dec 20, 2019
@sjspielman
Member

Hi @jharenza , I wonder if you could clarify the scope of this analysis further - based on my reading of the PR, you would like to include any sample not previously identified as having MB or ATRT histology, but I wonder if I am reading this correctly or if I need some more context. For example, there is no PNET annotation within the histology file, so understanding which non-MB/ATRT samples to focus on for initial molecular subtyping (novel discovery and/or identification of incorrect annotations will come second) isn't immediately clear - unless this PR really does mean everything else, in which case I'm good to go. Thanks for any clarification about the scope here!

@jharenza
Collaborator Author

jharenza commented Dec 21, 2019

Hi @sjspielman! First, thanks for working on this. Second, looks like I was inconsistent with the broad_histology so if you subset for just Embryonal tumor you will not see PNETs, but if you do something like emb <- subset(clin, grepl("Embryonal tumor", clin$broad_histology)), then you will see those PNETs in disease_type_old, and a few more.

as.data.frame(table(emb$disease_type_old)) (removing 0s):

Var1                                      Freq
Atypical Teratoid Rhabdoid Tumor (ATRT)     60
Ganglioneuroblastoma                         6
Medulloblastoma                            229
Other                                        7
Supratentorial or Spinal Cord PNET          34

The ganglioneuroblastoma, other, and PNET should be the samples subtyped. For the Other histologies, I had used our "database" to add a disease_type_new, but those were from pathology and we should re-confirm them. Hope this helps :)!
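
A minimal sketch of that selection, using a made-up stand-in for the histologies file. The column names follow the ones mentioned in this thread; the rows are invented for illustration and the real file has many more columns.

```r
# Toy stand-in for pbta-histologies.tsv (values are invented).
clin <- data.frame(
  Kids_First_Biospecimen_ID = paste0("BS_", 1:6),
  broad_histology = c("Embryonal tumor", "Embryonal tumor", "Embryonal tumor",
                      "Embryonal tumor", "Some other histology",
                      "Embryonal tumor"),
  disease_type_old = c("Medulloblastoma",
                       "Supratentorial or Spinal Cord PNET",
                       "Ganglioneuroblastoma",
                       "Other",
                       "Some other disease type",
                       "Atypical Teratoid Rhabdoid Tumor (ATRT)"),
  stringsAsFactors = FALSE
)

# Keep everything whose broad_histology mentions "Embryonal tumor" ...
emb <- subset(clin, grepl("Embryonal tumor", broad_histology))

# ... then drop MB and ATRT, leaving the samples that need subtyping.
exclude <- c("Medulloblastoma", "Atypical Teratoid Rhabdoid Tumor (ATRT)")
to_subtype <- emb[!emb$disease_type_old %in% exclude, ]

# Tabulate what remains, as in the table above.
counts <- as.data.frame(table(to_subtype$disease_type_old))
print(counts)
```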

@sjspielman
Member

Thanks for the clarification, @jharenza! I see this - so initial analysis will consist of 47 (6+7+34) known samples, yes?

@jharenza
Collaborator Author

Correct!

@sjspielman
Member

Getting back to this analysis after cleaning up the GSVA scores - I'm wondering if there's an issue with some of the data or not. Of the 47 samples to be subtyped, only 24 samples appear to have corresponding expression data, as in:

library(dplyr)    ## provides %>% and filter()
library(stringr)  ## provides str_detect()

## data_dir is assumed to point at the directory holding the data release
expression_file <- file.path(data_dir, "pbta-gene-expression-rsem-fpkm.stranded.rds")
metadata_file <- file.path(data_dir, "pbta-histologies.tsv")
metadata   <- readr::read_tsv(metadata_file)
expression <- readr::read_rds(expression_file)

## Subset metadata to the samples that need subtyping
embryonal_unclassified <- metadata %>%
  filter(str_detect(broad_histology, "Embryonal tumor"),  ## Keep all embryonal
         disease_type_old != "Medulloblastoma",           ## discard MB (this line) and ATRT (next line)
         disease_type_old != "Atypical Teratoid Rhabdoid Tumor (ATRT)")

## Which of these samples have RNA expression data?
embryonal_unclassified$Kids_First_Biospecimen_ID %in% names(expression)
 [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE
[18] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
[35]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE

sum(embryonal_unclassified$Kids_First_Biospecimen_ID %in% names(expression))
[1] 24

Without having all expression data, it's not clear to me how one can subtype using expression data. @jharenza, do you have any particular insights here? I've confirmed my data is up to date with release v12.

@jaclyn-taroni
Member

@sjspielman it doesn't look like you've filtered the metadata based on the experimental_strategy -- so there will be WGS or WXS samples in embryonal_unclassified which will be missing from the expression data.

You can find DNA-seq, RNA-seq pairs using the sample_id column. Each library has its own biospecimen ID. The sample_id is what captures the same event or sample. This might provide some needed context: #260 (comment) and here's what we ended up doing in the OncoPrint case: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/5431b72d37d4d55a09bb63b5d8f6abfce5b4309f/analyses/oncoprint-landscape/00-map-to-sample_id.R
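
A rough sketch of that pairing in base R. The toy rows are invented, and the experimental_strategy values ("WGS", "WXS", "RNA-Seq") are assumptions about how the histologies file encodes them; this is not the code from the linked OncoPrint script.

```r
# Toy metadata: each library has its own biospecimen ID, while sample_id
# ties together libraries from the same sample. Values are invented.
metadata <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A1", "BS_A2", "BS_B1", "BS_C1", "BS_C2"),
  sample_id                 = c("S1", "S1", "S2", "S3", "S3"),
  experimental_strategy     = c("WGS", "RNA-Seq", "WGS", "WXS", "RNA-Seq"),
  stringsAsFactors = FALSE
)

# Split libraries by assay type before comparing against expression data.
rna <- metadata[metadata$experimental_strategy == "RNA-Seq", ]
dna <- metadata[metadata$experimental_strategy %in% c("WGS", "WXS"), ]

# Join on sample_id to get one row per DNA/RNA pair from the same sample.
pairs <- merge(dna, rna, by = "sample_id", suffixes = c("_dna", "_rna"))
print(pairs[, c("sample_id",
                "Kids_First_Biospecimen_ID_dna",
                "Kids_First_Biospecimen_ID_rna")])
```

Only the RNA-Seq biospecimen IDs should be looked up in the expression matrix; the DNA IDs are the ones to use for copy number and mutation calls.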

@jaclyn-taroni
Member

Side note: This issue has tripped up multiple people including me, so we should probably document this!

@sjspielman
Member

Aha, of course, thank you!! This brings the sample size down to 29 (not 47), with 4 of those having polyA expression data and 25 stranded. I don't think it's OK to "merge" different RNA-seq selection strategies, so for now I'll work with only the 25 stranded samples.
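
One way to keep the two selection strategies apart is sketched below. The RNA_library column name, its values ("stranded", "poly-A"), and the gene_id column in the expression table are assumptions about the file layout; the rows are invented.

```r
# Toy RNA-seq metadata (invented values).
rna_meta <- data.frame(
  Kids_First_Biospecimen_ID = paste0("BS_R", 1:5),
  RNA_library = c("stranded", "poly-A", "stranded", "stranded", "poly-A"),
  stringsAsFactors = FALSE
)

# Toy expression table: one gene column plus one column per biospecimen.
expression <- data.frame(
  gene_id = c("LIN28A", "DNMT3B"),
  BS_R1 = c(0.2, 1.1),
  BS_R3 = c(0.1, 0.9),
  BS_R4 = c(5.0, 3.2),
  stringsAsFactors = FALSE
)

# Group biospecimen IDs by library selection strategy.
by_library <- split(rna_meta$Kids_First_Biospecimen_ID, rna_meta$RNA_library)
stranded_ids <- by_library[["stranded"]]

# Keep only the stranded samples that are present in the expression table.
keep <- intersect(names(expression), stranded_ids)
stranded_expr <- expression[, c("gene_id", keep)]
print(dim(stranded_expr))
```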

@jharenza
Collaborator Author

jharenza commented Jan 7, 2020

@sjspielman - I had reached out to an ETMR domain expert, Derek Hanson, via email and am posting the exchange below for tracking purposes:

Hi Derek,

Nice to e-meet you. One of our major efforts within the OpenPBTA is molecularly subtyping all of the brain tumors. Specifically for non-MB/non-ATRT embryonal tumors, I created this ticket in GitHub (where all analyses are being tracked). Stephanie is the analyst currently working on the subtyping (pull request here) and I have read through some of her initial analyses and offered some guidance, but would be great to get another set of eyes on this to be sure I am directing her properly.

Essentially, we are starting with ETMRs since they have the easier molecular hallmarks (see this comment). We are approaching this by:

  1. First assessing whether any PBTA tumors contain TTYH1 fusions that lie within the C19MC region and if these same tumors have high LIN28A expression (biomarker). If yes, we are classifying as ETMRs. We also have WGS on some (not all) and I am spot-checking those for focal C19MC amplification, which is a give-away for ETMRs.
    a. A few questions came to mind – we see other non-canonical, if you will, fusions, like TTYH1-DPRX/RNU-698P (the 5’ fusion partner here is two genes because the breakpoint is intergenic between those two). Upon a UCSC search, you can see that these genes are adjacent to the miRNA cluster, so I am considering this as a potential ETMR and we will validate this via LIN28A expression. Is this valid, in your opinion? Do you see/report those fusions as well?
      i. There are also samples with other TTYH1 fusions, but these are not within the C19MC region, so these would not be considered ETMRs off the bat.
      ii. I am making an assumption that this is the case, but as a double check, would you also say high LIN28A expression is required for ETMR classification?
  2. Next, we will subtype all non-MB/non-ATRTs into ETMRs or other embryonal tumors as described in the first ticket. Do you see anything in that ticket we should add/delete/modify?

If you have a GitHub username, you can comment directly on any of these tickets (if not, you can quickly create a username and comment), else I will just paste your email responses there. Once we have a finalized list of tumors, features, and classifications, would also love to get this back to you for confirmation!

and his response:

Hi Jo Lynne,

I took some time to review your email and the information in GitHub.

I think that the approach to look for TTYH1 fusions within the C19MC region along with LIN28A expression is valid, as well as looking for focal C19MC amplification.

I have not personally seen cases where the fusion has been adjacent to the miRNA cluster, but these tumors are very rare and I have only been able to review the molecular data on a handful of cases. This paper by Nada Jabado’s lab is probably the best reference when it comes to TTYH1/C19MC fusions and may shed some more light on the situation - https://www.nature.com/articles/ng.2849. I think considering these cases with an adjacent fusion as potential ETMR and further validating with LIN28A expression is a reasonable plan.

LIN28A appears to be overexpressed in 100% of ETMRs and is highly specific for the tumor with only about 10% of ATRTs also expressing LIN28A - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3508282/. I have never seen an ETMR case reported in any series where LIN28A was not overexpressed.

You may also want to consider DNMT3B expression as another method for validation. According to the Jabado paper above, this is also highly specific for ETMR among brain tumors.

In summary, any tumor with TTYH1/C19MC fusion or C19MC amplification should be easy to classify. For the ~10% of ETMRs that do not have C19MC amplification, LIN28A and/or DNMT3B overexpression would be highly suggestive of ETMR. These overexpressions could also be confirmatory for those tumors with the adjacent fusions, which I agree are most likely ETMR.

I think that the plan outlined in the first GitHub ticket for classifying ETMR and other embryonal tumors is reasonable. As I mentioned, analyzing LIN28A expression by itself is likely sufficient for classifying ETMRs, but DNMT3B could also be used to confirm if there were any questions.

Please let me know if you have any questions or if there is anything else that I can do to be of help.

Thanks,
Derek

I think based on this, we should include DNMT3B as an overexpression marker of ETMRs as well as the LIN28A marker.
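
A sketch of how overexpression of both markers might be flagged, assuming FPKM values are log2-transformed and z-scored per gene across samples. The toy values and the cutoff of 1.5 are hypothetical; with so few samples the attainable z-score is bounded, so a real cutoff would need to be chosen against the full cohort.

```r
# Toy FPKM matrix: genes in rows, samples in columns (invented values).
fpkm <- rbind(
  LIN28A = c(0.1, 0.2, 0.1, 150, 0.3, 0.2),
  DNMT3B = c(1, 2, 1.5, 80, 1, 2)
)
colnames(fpkm) <- paste0("BS_", 1:6)

# log2(FPKM + 1), then z-score each gene across samples.
# scale() works column-wise, so transpose before and after.
log_expr <- log2(fpkm + 1)
z <- t(scale(t(log_expr)))

# Hypothetical cutoff for calling a marker "high".
z_cutoff <- 1.5
marker_high <- z > z_cutoff

# Candidate ETMRs here are samples in which every marker is high.
candidates <- colnames(z)[colSums(marker_high) == nrow(z)]
print(candidates)
```

Requiring both markers is the stricter choice; per the email above, LIN28A alone may be sufficient, with DNMT3B used for confirmation.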

@jaclyn-taroni
Member

I am planning on working on this over the next few days.

@jaclyn-taroni
Member

molecular-subtyping-embryonal will need to be rerun with annotated focal consensus calls (#186) and new GISTIC calls (#453).

@jaclyn-taroni
Member

Seems like Sturm et al. is a good thing to link here -- the NCI PDQ section on CNS EFT-CIC cites it.

@jaclyn-taroni
Member

Closed via #458 - the subtype labels will need to go into a data release.
