This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PR 3 of n - Molecular subtyping embryonal tumors (Wrangle 'final' table) #458

Merged
merged 19 commits into from
Feb 2, 2020

jaclyn-taroni
Member

@jaclyn-taroni jaclyn-taroni commented Jan 20, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Molecular subtyping of non-MB, non-ATRT embryonal tumors

What was your approach?

The notebook I'm adding here wrangles the relevant fusion, expression, and copy number data listed on #251. It does not currently include information about the presence or absence of BCOR tandem duplications, as I expect that would require some cleaning of the structural variant data that hasn't been done yet.

You can view the rendered version of the notebook here: https://jaclyn-taroni.github.io/openpbta-notebook-concept/03-table-prep.nb.html

You can view the table with all the relevant data here: https://github.com/jaclyn-taroni/OpenPBTA-analysis/blob/d024b044255e2fa1073b9ffff393ebf9bef76cfc/analyses/molecular-subtyping-embryonal/results/embryonal_tumor_subtyping_relevant_data.tsv

What GitHub issue does your pull request address?

#251

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

wrangle_fusions function, chr19 amplification

Is there anything that you want to discuss further?

What else, if anything, needs to be added to complete calls as part of #251?

Results

What is your summary of the results?

Please see: https://jaclyn-taroni.github.io/openpbta-notebook-concept/03-table-prep.nb.html#subtyping

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jaclyn-taroni jaclyn-taroni added the molecular subtyping Related to molecular subtyping of tumors label Jan 20, 2020
@jaclyn-taroni
Member Author

Tagging @jharenza to weigh in.

@jaclyn-taroni jaclyn-taroni marked this pull request as ready for review January 25, 2020 19:21
We use the consensus SEG file, rather than the annotated version from `focal-cn-file-preparation`
Use new fusion-summary file committed to repo
Copy number, SV files
@jaclyn-taroni
Member Author

In the last few commits, I tried to look into the BCOR internal tandem duplications and I'm also using the new fusion-summary files from #478 that are committed to the repository. The rendered notebook has been updated in jaclyn-taroni/openpbta-notebook-concept and here's the up-to-date table: https://github.com/jaclyn-taroni/OpenPBTA-analysis/blob/0e110e7668530b9b7611ab8a3a9eff33f98e03b6/analyses/molecular-subtyping-embryonal/results/embryonal_tumor_subtyping_relevant_data.tsv

@jharenza
Collaborator

jharenza commented Feb 1, 2020

Hi @jaclyn-taroni - I think this looks really great! A few comments and notes to myself following along this notebook.

"First subset to embryonal tumors, excluding any derived cell lines." - there are no cell lines in this dataset, but future code maybe should not be as restrictive.

Re: C19MC coordinates, I found a few publications with older genome coordinates and lifted them over, but the results are not very consistent.

  • In this paper, "Cluster on chromosome 19 located at positions 58,861,745–58,961,404 (HG17) and comprising 54 microRNA genes grouped into four families on the basis of hairpin sequence similarity."

    version  chr    start     end
    hg17     chr19  58861745  58961404
    hg19     chr19  54169933  54269592
    hg38     chr19  53666679  53766338

  • In this paper, "Schematic representation of the ∼100-kb long C19MC (HG18: 58 860 000–58 962 300) mapping at human chromosome 19q13.41"

    version  chr    start     end
    hg18     chr19  58860000  58962300
    hg38     chr19  53664934  53767234
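
One quick sanity check on the lifted-over coordinates above: a clean liftover of a contiguous region should keep its width, and the two hg38 versions should land close to each other. The sketch below uses only the numbers from the two tables; the data frame and its column names (`c19mc`, `source`, `width`) are made up for illustration.

```r
# Coordinates copied from the two tables above. Object and column names
# are illustrative only.
c19mc <- data.frame(
  source  = c("paper 1", "paper 1", "paper 1", "paper 2", "paper 2"),
  version = c("hg17", "hg19", "hg38", "hg18", "hg38"),
  start   = c(58861745, 54169933, 53666679, 58860000, 53664934),
  end     = c(58961404, 54269592, 53766338, 58962300, 53767234),
  stringsAsFactors = FALSE
)

# Width of each interval (end - start)
c19mc$width <- c19mc$end - c19mc$start

# Within each paper, every genome build should give a single width
widths_by_source <- tapply(c19mc$width, c19mc$source, unique)
print(widths_by_source)

# How far apart are the two hg38 versions of the cluster?
hg38 <- c19mc[c19mc$version == "hg38", ]
start_diff <- abs(diff(hg38$start))
end_diff <- abs(diff(hg38$end))
cat("hg38 start differs by", start_diff, "bp; end differs by", end_diff, "bp\n")
```

Each paper's interval keeps one width across builds, so the disagreement between the two hg38 intervals comes from the published coordinates rather than from the liftover.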

From R biomaRt:

library(biomaRt)
listMarts()
#>                biomart               version
#> 1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 99
#> 2   ENSEMBL_MART_MOUSE      Mouse strains 99
#> 3     ENSEMBL_MART_SNP  Ensembl Variation 99
#> 4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 99
## connect to the Ensembl genes mart (needed by searchDatasets below)
ensembl <- useMart("ensembl")
## make sure hg38 is being used - yes
searchDatasets(mart = ensembl, pattern = "hsapiens")
#>                  dataset              description    version
#> 78 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
## select human dataset
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
## get gene coordinates along with the chr band each gene is annotated to
bands <- getBM(attributes = c('chromosome_name', 'start_position', 'end_position', 'band'),
               mart = human)
## subset only 19q13.41 for C19MC
q13.41 <- subset(bands, chromosome_name == "19" & band == "q13.41")
## what are the smallest start and largest end coordinates among these genes?
start <- q13.41[which.min(q13.41$start_position), "start_position"]
end <- q13.41[which.max(q13.41$end_position), "end_position"]
paste("chr19", start, end, sep = "\t")
#> chr19 50906352 53051680
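
To compare the band-derived span with the lifted-over coordinates, a simple interval overlap test is enough. This is a minimal sketch using only the numbers quoted in this comment; the helper `intervals_overlap` and the column names are hypothetical, and intervals are treated as closed ranges on chr19 (GRCh38).

```r
# Two closed intervals overlap when each one starts before the other ends.
# The function name is illustrative, not from any package.
intervals_overlap <- function(start_a, end_a, start_b, end_b) {
  start_a <= end_b && start_b <= end_a
}

# Span of genes annotated to 19q13.41 from the biomaRt query above
band_start <- 50906352
band_end   <- 53051680

# hg38 C19MC coordinates lifted over from the two publications
lifted <- data.frame(
  source = c("hg17 liftover", "hg18 liftover"),
  start  = c(53666679, 53664934),
  end    = c(53766338, 53767234),
  stringsAsFactors = FALSE
)

# Test each lifted interval against the band-derived span
lifted$overlaps_band <- mapply(
  intervals_overlap,
  lifted$start, lifted$end,
  MoreArgs = list(start_b = band_start, end_b = band_end)
)

# Distance from the end of the band-derived span to each lifted start
lifted$gap_bp <- lifted$start - band_end
print(lifted)
```

With these numbers, neither lifted interval overlaps the 19q13.41 span from biomaRt, which is one concrete instance of the inconsistency noted above.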

"Granted, we don’t currently have enough information to look specifically at CNS HGNET-BCOR and therefore perhaps can not classify any tumor as CNS Embryonal, NOS." - I would argue that if we do not have enough information, the remaining samples should be classified as CNS Embryonal, NOS.

"For some samples that have a TTYH1 fusion, we do not have DNA data to check for C19MC amplification." - According to this paper, "Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs." All of the cases with C19MC amplification had TTYH1 fusions, which were causal in the miRNA cluster amplification, so I think it is safe to label all of these as C19MC-altered.

Manual check for BS_69VS8PS1, BS_TE8QFF7T, and BS_K07KNTFY DNA:
BS_69VS8PS1 - yes
BS_69VS8PS1-c19mc.pdf
BS_TE8QFF7T - yes
BS_TE8QFF7T-c19mc.pdf
BS_K07KNTFY - yes
BS_K07KNTFY-c19mc.pdf

Great catch on the MN1 fusions! It looks like BS_KSKZ9J7J should definitely be classified as CNS HGNET-MN1, since it contains an MN1--CXXC5 fusion and, according to this publication, some of these tumors were previously diagnosed as ependymomas. For BS_ZVZDDW2G, MN1--PATZ1 could be a novel MN1 fusion for the subtype, or the sample may not belong to this subtype, so perhaps I can write something like 'possible CNS HGNET-MN1 subtype with MN1--PATZ1 fusion' in the "notes" section of the clinical file?
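
A minimal sketch of how the MN1 calls discussed above could be flagged from fusion names in the "GeneA--GeneB" format. The two sample IDs and fusion names come from this comment; the data frame layout, the column names, and the `known_mn1_partners` list are assumptions for illustration and are not the notebook's `wrangle_fusions` function.

```r
# Fusion calls named in the comment above. The table layout is hypothetical.
fusions <- data.frame(
  sample_id   = c("BS_KSKZ9J7J", "BS_ZVZDDW2G"),
  fusion_name = c("MN1--CXXC5", "MN1--PATZ1"),
  stringsAsFactors = FALSE
)

# Split each fusion name into its two partner genes
genes <- strsplit(fusions$fusion_name, "--", fixed = TRUE)
fusions$gene_a <- vapply(genes, `[`, character(1), 1)
fusions$gene_b <- vapply(genes, `[`, character(1), 2)

# Flag any fusion involving MN1 on either side and record the other gene
fusions$has_mn1 <- fusions$gene_a == "MN1" | fusions$gene_b == "MN1"
fusions$partner <- ifelse(fusions$gene_a == "MN1", fusions$gene_b, fusions$gene_a)

# ASSUMPTION: only the partner called clear-cut in this thread is listed here
known_mn1_partners <- c("CXXC5")

# Clear calls get the subtype; other MN1 fusions get a note for manual review
fusions$note <- ifelse(
  !fusions$has_mn1, NA_character_,
  ifelse(fusions$partner %in% known_mn1_partners,
         "CNS HGNET-MN1",
         "possible CNS HGNET-MN1, needs review")
)
print(fusions[, c("sample_id", "fusion_name", "note")])
```

Keeping the uncertain case as a note instead of a subtype label mirrors the suggestion above to record MN1--PATZ1 in the "notes" section.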

For CNS NB-FOXR2, I think you have the right set.

For CNS HGNET-BCOR, we may have to bypass those if there is no clear evidence and use the broader classification of CNS Embryonal, NOS.

From here, do you want to add a file with subtypes?

@jaclyn-taroni
Member Author

"First subset to embryonal tumors, excluding any derived cell lines." - there are no cell lines in this dataset, but future code maybe should not be as restrictive.

@jharenza - including cell lines makes the mapping between RNA-seq and DNA-seq samples a little more difficult, because you would also have to join on the sample composition. Excluding them is consistent with how we've approached other subtyping efforts because of this difficulty.
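
To illustrate the join problem described above, here is a toy example. All IDs are invented, and the column names (`sample_id`, `composition`, `experimental_strategy`) are assumptions about what a biospecimen table might look like, not a copy of the project's file.

```r
# Toy biospecimen table: sample S2 has tumor DNA plus a derived cell line.
biospecimens <- data.frame(
  biospecimen_id = c("RNA_1", "DNA_1", "RNA_2", "DNA_2", "DNA_2_line"),
  sample_id      = c("S1", "S1", "S2", "S2", "S2"),
  experimental_strategy = c("RNA-Seq", "WGS", "RNA-Seq", "WGS", "WGS"),
  composition    = c("Solid Tissue", "Solid Tissue", "Solid Tissue",
                     "Solid Tissue", "Derived Cell Line"),
  stringsAsFactors = FALSE
)

rna <- biospecimens[biospecimens$experimental_strategy == "RNA-Seq", ]
dna <- biospecimens[biospecimens$experimental_strategy == "WGS", ]

# Joining on sample_id alone also pairs the S2 tumor RNA with cell line DNA
by_sample <- merge(rna, dna, by = "sample_id", suffixes = c("_rna", "_dna"))

# Joining on sample_id and composition keeps only like-for-like pairs
by_sample_comp <- merge(rna, dna, by = c("sample_id", "composition"),
                        suffixes = c("_rna", "_dna"))

# Excluding cell lines up front makes the simple join unambiguous again
dna_tumor <- dna[dna$composition != "Derived Cell Line", ]
by_sample_tumor <- merge(rna, dna_tumor, by = "sample_id",
                         suffixes = c("_rna", "_dna"))

cat(nrow(by_sample), nrow(by_sample_comp), nrow(by_sample_tumor), "\n")
```

The first join returns an extra row for S2, while the other two return one row per sample.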

@jaclyn-taroni
Member Author

Okay @jharenza - I'm going to call every sample with a TTYH1 fusion ETMR, C19MC-altered. What about 7316-2658? LIN28A z-score is 3.89, no TTYH1 fusion, and no copy number data. Should this sample be classified as ETMR, NOS or CNS Embryonal NOS?

@jaclyn-taroni
Member Author

I went with ETMR, NOS for now. Here is the table with subtype labels: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/101b1c4470abfbd0e3d14281b3705d1fcb2bfe00/analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv

I broke out the C19MC cleaning steps into a separate notebook with visualizations, which also let me capture the table with coordinates from #458 (comment). That notebook is available here: https://jaclyn-taroni.github.io/openpbta-notebook-concept/03-clean-c19mc-data.nb.html

New version of the notebook with the subtyping calls is here: https://jaclyn-taroni.github.io/openpbta-notebook-concept/04-table-prep.nb.html

@jharenza
Collaborator

jharenza commented Feb 1, 2020

Okay @jharenza - I'm going to call every sample with a TTYH1 fusion ETMR, C19MC-altered. What about 7316-2658? LIN28A z-score is 3.89, no TTYH1 fusion, and no copy number data. Should this sample be classified as ETMR, NOS or CNS Embryonal NOS?

This should be ETMR, NOS, you are right!
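
The ETMR labeling agreed on in this exchange can be sketched as a small rule. This is an illustration only: the function `label_etmr`, the column names, and especially the LIN28A z-score cutoff are assumptions, since the thread does not state the threshold used in the notebook. Treating C19MC amplification without a TTYH1 fusion as C19MC-altered is also an assumption.

```r
# ASSUMED cutoff for calling LIN28A overexpressed; not taken from the notebook
lin28a_cutoff <- 3

# Hypothetical helper: returns an ETMR label or NA for each sample
label_etmr <- function(ttyh1_fusion, c19mc_amp, lin28a_zscore,
                       cutoff = lin28a_cutoff) {
  # c19mc_amp is NA when a sample has no copy number data
  c19mc_altered <- ttyh1_fusion | (!is.na(c19mc_amp) & c19mc_amp)
  lin28a_high <- !is.na(lin28a_zscore) & lin28a_zscore > cutoff
  ifelse(c19mc_altered, "ETMR, C19MC-altered",
         ifelse(lin28a_high, "ETMR, NOS", NA_character_))
}

# 7316-2658 uses the values quoted above; the other two rows are invented
samples <- data.frame(
  sample_id     = c("example_with_fusion", "7316-2658", "example_neither"),
  ttyh1_fusion  = c(TRUE, FALSE, FALSE),
  c19mc_amp     = c(NA, NA, FALSE),
  lin28a_zscore = c(1.2, 3.89, 0.5),
  stringsAsFactors = FALSE
)

samples$label <- label_etmr(samples$ttyh1_fusion, samples$c19mc_amp,
                            samples$lin28a_zscore)
print(samples[, c("sample_id", "label")])
```

Samples that receive `NA` here would fall through to the other subtype checks or to CNS Embryonal, NOS.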

@jharenza
Collaborator

jharenza commented Feb 1, 2020

Visualizations look great and so does the subtyping table!

@jharenza jharenza self-requested a review February 1, 2020 22:33
Collaborator

@jharenza jharenza left a comment

This looks good!

@jharenza
Collaborator

jharenza commented Feb 1, 2020

Oh wait @jaclyn-taroni, I just realized that we should add BS_KSKZ9J7J, the sample now classified as CNS HGNET-MN1, to the final table as well. It looks like you were adding that data frame, but I don't see it in the final table (still 28 patients instead of 29).
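
A check like the following could catch a dropped sample automatically. It is a toy sketch: the table contents and column names are invented apart from the biospecimen IDs mentioned in this thread, and the real table has more rows.

```r
# IDs that should appear in the final table (toy subset for illustration)
expected_ids <- c("BS_69VS8PS1", "BS_TE8QFF7T", "BS_KSKZ9J7J")

# Hypothetical final table that is missing one of them
final_table <- data.frame(
  biospecimen_id = c("BS_69VS8PS1", "BS_TE8QFF7T"),
  stringsAsFactors = FALSE
)

# Any expected ID that is absent from the final table
missing_ids <- setdiff(expected_ids, final_table$biospecimen_id)
if (length(missing_ids) > 0) {
  message("Missing from final table: ", paste(missing_ids, collapse = ", "))
}
```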

@jaclyn-taroni
Member Author

Fixed in e8d4148!

@jharenza
Collaborator

jharenza commented Feb 1, 2020

Awesome, you're so quick! All good now!
