This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: add CNV alterations to evaluate tp53, nf1 classifier #653

Closed
kgaonkar6 opened this issue Mar 26, 2020 · 9 comments

@kgaonkar6
Collaborator

kgaonkar6 commented Mar 26, 2020

What analysis module should be updated and why?

The TP53 and NF1 classifier by Greg Way was used to obtain TP53 and NF1 inactivation scores (#165).
So far we have only used SNVs to determine TP53/NF1 altered status when evaluating the classifier results. Should we add CNV calls as well, to identify the TP53/NF1 deletion (inactivation) status of samples, as per #165 (comment)?

What changes need to be made? Please provide enough detail for another participant to

Extend https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/tp53_nf1_score/00-tp53-nf1-alterations.R to add CNV deletion calls (amplifications as well?) to analyses/tp53_nf1_score/results/TP53_NF1_snv_alteration.tsv, which is then used by https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/tp53_nf1_score/02-evaluate-classifier.py to evaluate the classifier results.

What input data should be used? Which data were used in the version being updated?

Possibly data/consensus_seg_annotated_cn_autosomes.tsv.gz?

When do you expect the revised analysis will be completed?

1 week

Who will complete the updated analysis?

@kgaonkar6

kgaonkar6 self-assigned this on Mar 26, 2020
kgaonkar6 changed the title from "Updated analysis: tp53, nf1 classifier" to "Updated analysis: add CNV alterations to evaluate tp53, nf1 classifier" on Apr 2, 2020
@kgaonkar6
Collaborator Author

Hi @jaclyn-taroni! To gather the copy number alterations I'm using consensus_seg_annotated_cn_autosomes.tsv.gz, since it looks like it is being used for certain subtyping analyses and is also annotated with gene_symbol. The status column contains loss, gain, and amplification:
cnvConsesus$status %>% unique()
[1] "loss" "gain" "amplification"
I used all of the "loss" calls in the evaluation step, but judging by the ROC curves for the stranded data, they do not seem to be a good match for evaluating the classifier:
[image: stranded_TP53]
[image: stranded_NF1]

I went back to the discussion in #385 (comment), which mentions using only deep deletions. Should I be using a different file, or should I apply some logic using the copy_number and ploidy columns to get deep deletions for NF1 and TP53?
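
As a rough illustration of why the choice of "altered" labels matters for the ROC comparison described above, here is a minimal base R sketch of a rank-based AUROC. The scores and both label vectors are invented toy values, not OpenPBTA results, and the function is not the code used in 02-evaluate-classifier.py. It only shows that flagging extra samples as altered, when the classifier scores them low, pulls the AUROC toward 0.5.

```r
# Rank-based AUROC (equivalent to the normalized Mann-Whitney U statistic).
# `scores` and the label vectors below are toy values, not OpenPBTA results.
auroc <- function(scores, altered) {
  stopifnot(length(scores) == length(altered), any(altered), any(!altered))
  r <- rank(scores)                      # average ranks handle ties
  n_pos <- sum(altered)
  n_neg <- sum(!altered)
  (sum(r[altered]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.91, 0.15, 0.78, 0.40, 0.66, 0.22)

# Hypothetical strict definition: only the high-scoring samples are altered
strict_labels <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
# Hypothetical permissive definition: two low-scoring samples are also flagged
permissive_labels <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

auc_strict <- auroc(scores, strict_labels)
auc_permissive <- auroc(scores, permissive_labels)
```

With these toy inputs the strict labels separate perfectly, while the permissive labels are no better than chance.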

@jaclyn-taroni
Member

@kgaonkar6 I can think about this a bit more but my knee-jerk reaction is to look at instances where copy number = 0

@kgaonkar6
Collaborator Author

kgaonkar6 commented Apr 6, 2020

> @kgaonkar6 I can think about this a bit more but my knee-jerk reaction is to look at instances where copy number = 0

Thanks! There was only one entry with copy_number == 0:
cnvConsesus[which(cnvConsesus$copy_number == 0 & cnvConsesus$gene_symbol %in% c("TP53","NF1")),]
1: BS_ZV21J6YW loss 0 2 ENSG00000141510 TP53 17p13.1
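
The copy_number == 0 filter is one way to define a deep deletion. A minimal base R sketch of a slightly broader labeling, which also separates shallower losses relative to ploidy, might look like the following. The rule itself, the `classify_loss` helper, the `loss_depth` column, and the biospecimen IDs are all hypothetical; only the column names copy_number, ploidy, status, and gene_symbol come from the thread.

```r
# Toy rows using column names seen in the thread; the IDs are made up.
cnv <- data.frame(
  biospecimen_id = c("BS_TOY1", "BS_TOY2", "BS_TOY3", "BS_TOY4"),
  status         = c("loss", "loss", "loss", "gain"),
  copy_number    = c(0, 1, 2, 3),
  ploidy         = c(2, 2, 4, 2),
  gene_symbol    = c("TP53", "TP53", "NF1", "NF1"),
  stringsAsFactors = FALSE
)

# Hypothetical rule: zero copies = deep; fewer copies than ploidy = shallow.
classify_loss <- function(copy_number, ploidy) {
  ifelse(copy_number == 0, "deep_deletion",
         ifelse(copy_number < ploidy, "shallow_loss", "not_lost"))
}

cnv$loss_depth <- classify_loss(cnv$copy_number, cnv$ploidy)

# Keep only deep deletions in the two genes of interest
deep <- cnv[cnv$loss_depth == "deep_deletion" &
              cnv$gene_symbol %in% c("TP53", "NF1"), ]
```

Keeping the depth as a separate column makes it easy to evaluate the classifier against deep deletions only, or against all losses, without refiltering the file.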

@jaclyn-taroni
Member

@kgaonkar6 my understanding of what's happening to get the gene symbols in that file is that we're just looking at any overlap between a segment and exons from a particular gene (you can take a look at the function that does the work here; it uses mergeByOverlaps). So I might expect that there are a lot of results that get "let in" that don't have any consequences that we'd see at the transcriptomic level.

There are some results in this document that support that idea. (That document is looking at the impact of CNA in a gene on its expression level, which I suspect depends on things like how highly a gene is expressed in general, etc.)
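
To make the "any overlap" point concrete, here is a small base R sketch that contrasts a yes/no overlap test with the fraction of a gene's exonic bases that a segment covers. The coordinates, gene name, and segment names are invented, and this is not the repository's mergeByOverlaps call; it only illustrates how a segment touching a single exonic base would still be annotated with the gene.

```r
# Any-overlap vs. fraction-of-exon-covered for closed intervals [start, end].
# Coordinates are invented; this is not the repository's mergeByOverlaps call.
exons <- data.frame(gene = "GENE_A", start = c(100, 300), end = c(200, 400))
segments <- data.frame(
  segment = c("seg_edge", "seg_full"),
  start   = c(200, 50),
  end     = c(250, 450),
  stringsAsFactors = FALSE
)

# Number of shared bases between [s1, e1] and [s2, e2]; 0 if they are disjoint
overlap_bp <- function(s1, e1, s2, e2) pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)

# Total exonic bases of the gene covered by each segment
covered <- sapply(seq_len(nrow(segments)), function(i) {
  sum(overlap_bp(segments$start[i], segments$end[i], exons$start, exons$end))
})
exon_bp <- sum(exons$end - exons$start + 1)

segments$any_overlap   <- covered > 0
segments$exon_fraction <- covered / exon_bp
```

Both toy segments pass the any-overlap test, but `seg_edge` covers only 1 of 202 exonic bases, which is the kind of call that may have no visible effect at the transcriptomic level.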

@kgaonkar6
Collaborator Author

kgaonkar6 commented Apr 7, 2020

OK, got it. Should I then look at the biospecimen IDs that overlap between cnvConsesus and Manta?

For example, I got 8 biospecimen IDs when I checked Manta for TP53 deletions:
manta_bs_ids_tp53 <- manta %>% filter(grepl("[/]TP53[$|/]", manta$Gene.name) & SV.type == "DEL" & FILTER == "PASS") %>% select(Kids.First.Biospecimen.ID.Tumor) %>% unique()

Overlapping these with cnvConsesus, I get:
cnvConsesus %>% filter(biospecimen_id %in% manta_bs_ids_tp53$Kids.First.Biospecimen.ID.Tumor & gene_symbol == "TP53")

biospecimen_id  status  copy_number  ploidy  ensembl          gene_symbol  cytoband
BS_3NX3RBCX     loss    1            2       ENSG00000141510  TP53         17p13.1
BS_79SYEHY3     loss    2            4       ENSG00000141510  TP53         17p13.1
BS_96S0VQBN     loss    2            4       ENSG00000141510  TP53         17p13.1
BS_EJV0N3BX     loss    1            2       ENSG00000141510  TP53         17p13.1
BS_H8NWA41N     loss    2            3       ENSG00000141510  TP53         17p13.1

I should probably also check TP53 expression in these biospecimens, as in the document you linked.
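
One thing that may be worth checking in the Manta filter above: inside a bracket expression, "$" and "|" are literal characters, so the pattern "[/]TP53[$|/]" only matches TP53 when it is preceded by "/" and followed by "$", "|", or "/". The sketch below assumes Gene.name holds gene symbols separated by "/" (an assumption inferred from the pattern, not confirmed in the thread) and compares it with an anchored alternative on invented strings.

```r
# Assumption: Gene.name holds one or more gene symbols separated by "/".
# These strings are illustrative, not rows from the Manta file.
gene_names <- c("TP53", "WRAP53/TP53", "TP53/WRAP53", "ATP1B2/TP53/WRAP53",
                "TP53BP1", "ATP1B2/TP53BP1")

# Pattern from the thread: "$" and "|" are literal inside [...], so TP53 must
# be preceded by "/" and followed by a literal "$", "|", or "/".
thread_hits <- grepl("[/]TP53[$|/]", gene_names)

# Anchored alternative: TP53 as a whole "/"-delimited token, including at the
# start or end of the string, while still excluding TP53BP1.
token_hits <- grepl("(^|/)TP53(/|$)", gene_names)
```

Under that assumption, the thread's pattern only catches TP53 when it sits between two other genes, so the count of 8 biospecimens could be an undercount; the anchored pattern also catches TP53 on its own or at either end of the list.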

@jaclyn-taroni
Member

The consensus SEG file (pbta-cnv-consensus.seg.gz) that is used to generate consensus_seg_annotated_cn_autosomes.tsv.gz takes Manta calls into account but doesn't take ploidy into account in the same way as focal-cn-file-preparation (where consensus_seg_annotated_cn_autosomes.tsv.gz is generated). A bit more background:

How are you handling the fact that copy neutral segments are not in consensus_seg_annotated_cn_autosomes.tsv.gz? That has been a point of friction for other analyses.

@kgaonkar6
Collaborator Author

kgaonkar6 commented Apr 7, 2020

Thanks for the background, that's helpful! I'll read up.

> How are you handling the fact that copy neutral segments are not in consensus_seg_annotated_cn_autosomes.tsv.gz? That has been a point of friction for other analyses.

Good point to discuss. I'll put in a PR with what I have for now, which should make it easier to look into this more clearly :)
Right now, if I don't find a TP53 loss for a sample, I add it to the "No_TP53_NF1_alt" condition, as in (updated) https://github.com/kgaonkar6/OpenPBTA-analysis/blob/87360e83ec6d4030e6fa9015d32a4fa1589a48d1/analyses/tp53_nf1_score/00-tp53-nf1-alterations.R#L126
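
A minimal base R sketch of that kind of fallback, where samples with no row in the CNV table are kept rather than dropped, could look like this. The IDs, the `all_samples` and `tp53_cnv` tables, and the "neutral_or_no_call" label are hypothetical and are not taken from the linked script. Note the caveat that a missing row could mean either a copy-neutral gene or a sample with no CNV data at all; this sketch cannot tell the two apart.

```r
# Toy inputs; the IDs and the "neutral_or_no_call" label are hypothetical.
all_samples <- data.frame(
  biospecimen_id = c("BS_TOY1", "BS_TOY2", "BS_TOY3"),
  stringsAsFactors = FALSE
)
tp53_cnv <- data.frame(
  biospecimen_id = "BS_TOY2",
  status         = "loss",
  stringsAsFactors = FALSE
)

# Left join: keep every sample, even those with no row in the CNV table
tp53_status <- merge(all_samples, tp53_cnv, by = "biospecimen_id", all.x = TRUE)

# Samples absent from the CNV table come back as NA; label them explicitly
tp53_status$status[is.na(tp53_status$status)] <- "neutral_or_no_call"
```

Making the fallback label explicit, instead of silently treating absence as "no alteration", keeps the ambiguity visible when the table is later joined to the classifier scores.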

@jaclyn-taroni
Member

Filing a PR so we can take a look sounds good, thanks!

@kgaonkar6
Collaborator Author

Subsumed by #837, where we discussed the CNV filter and the region of overlap with TP53 domains.
