This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

#837 updated: RNA QC compare stranded vs polya matched sample_ids #930

Merged
7 commits merged on Feb 10, 2021

Conversation

kgaonkar6
Collaborator

@kgaonkar6 kgaonkar6 commented Feb 1, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

Is there a difference in tp53_score between stranded and poly-A samples and, if so, should we use one over the other when annotating TP53 status by sample_id?

What was your approach?

I matched tp53_score_stranded and tp53_score_polya by sample_id (only for sample_ids that have multiple RNA_library values). Because 7316-85 has multiple stranded samples, I took the mean of their scores.
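
The matching step described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the notebook's R code: the sample IDs and scores are made up, and the function name `match_scores` is hypothetical. It averages scores within each sample_id and library type, then keeps only sample_ids that have both a stranded and a poly-A score.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (sample_id, RNA_library, tp53_score). All values are made up.
records = [
    ("sample_A", "stranded", 0.30),
    ("sample_A", "stranded", 0.10),  # second stranded sample for the same sample_id
    ("sample_A", "poly-A", 0.15),
    ("sample_B", "stranded", 0.80),
    ("sample_B", "poly-A", 0.60),
    ("sample_C", "stranded", 0.40),  # no poly-A partner, so it is dropped
]


def match_scores(records):
    """Average scores per (sample_id, library), then keep samples with both libraries."""
    grouped = defaultdict(list)
    for sample_id, library, score in records:
        grouped[(sample_id, library)].append(score)

    # One mean score per sample_id and library type.
    means = {key: mean(scores) for key, scores in grouped.items()}

    matched = {}
    for sample_id in sorted({sid for sid, _ in means}):
        stranded = means.get((sample_id, "stranded"))
        polya = means.get((sample_id, "poly-A"))
        if stranded is not None and polya is not None:
            matched[sample_id] = {
                "tp53_score_stranded": stranded,
                "tp53_score_polya": polya,
            }
    return matched


matched = match_scores(records)
for sample_id, row in matched.items():
    print(sample_id, row)
```

Averaging before matching means a sample_id with several stranded samples still contributes a single point to the stranded vs poly-A comparison.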

What GitHub issue does your pull request address?

#837

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

This was not part of the original ticket, but as part of #837 we are also compiling the TP53 status. @jharenza found multiples, which led to additional investigation, and one part of the QC requested in a comment was to compare stranded and poly-A.

image

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

plots in notebook

What is your summary of the results?

On average, the poly-A samples appear to have lower tp53_scores than their matched stranded samples.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 kgaonkar6 changed the title from "#837 updated RNA QC compare stranded vs polya matched sample_ids" to "#837 updated: RNA QC compare stranded vs polya matched sample_ids" Feb 1, 2021
@jharenza
Collaborator

jharenza commented Feb 1, 2021

Hmm, there definitely looks to be a bias there. However, for the four samples with scores below 0.5, maybe this is within the random variability of the classifier. For 7316-1455, both scores are high, but for 7316-161, the stranded sample would have predicted oncogenic TP53 and the polyA would not. Sorry to keep asking for things - can you create a table of TP53 alterations for each of these samples (i.e., do only 1455 and 161 have alterations)?

@gwaygenomics thoughts on this?

@kgaonkar6
Collaborator Author

kgaonkar6 commented Feb 2, 2021

Thanks for the review @jharenza!
I just copied the rows for 7316-1455 and 7316-161 from the compiled TP53 alterations here and added the RNA_library info per RNA biospecimen ID. Let me know if you need additional info.

| sample_id | RNA_library | Kids_First_Biospecimen_ID_DNA | Kids_First_Biospecimen_ID_RNA | cancer_predispositions | tp53_score | SNV_indel_counts | CNV_loss_counts | HGVSp_Short | copy_number | hotspot | activating | tp53_altered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7316-161 | stranded | BS_W2QCHQ7E | BS_SHJA4MR0 | None documented | 0.7719887631074178 | 2 | 0 | p.R342P, p.P72Lfs*48 | NA | Yes | | loss |
| 7316-161 | poly-A | BS_W2QCHQ7E | BS_X0XXN9BK | None documented | 0.4490920924301138 | 2 | 0 | p.R342P, p.P72Lfs*48 | NA | Yes | | loss |
| 7316-1455 | stranded | NA | BS_HE0WJRW6 | Li-Fraumeni syndrome | 0.9526901023341534 | 0 | 0 | NA | NA | | | loss |
| 7316-1455 | poly-A | NA | BS_HWGWYCY7 | Li-Fraumeni syndrome | 0.868128930617709 | 0 | 0 | NA | NA | | | loss |

For 7316-161, it seems the tp53_status is set to "loss" because of the condition of 2 SNVs in the DNA sample, which is matched with both the stranded and poly-A RNA samples.
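
To make the comparison concrete, here is a small sketch in plain Python (not the notebook's R code) that takes the four tp53_score values from the table above and checks each library against the 0.5 cutoff mentioned earlier in the thread. Treating a score strictly above 0.5 as a positive call is an assumption, and `summarize` is a hypothetical helper name.

```python
# Scores copied from the table above.
# ASSUMPTION: a score strictly greater than 0.5 counts as a positive classifier call.
THRESHOLD = 0.5

scores = {
    "7316-161": {"stranded": 0.7719887631074178, "poly-A": 0.4490920924301138},
    "7316-1455": {"stranded": 0.9526901023341534, "poly-A": 0.868128930617709},
}


def summarize(scores, threshold=THRESHOLD):
    """Per sample: stranded minus poly-A score, and whether both calls agree."""
    rows = {}
    for sample_id, by_library in scores.items():
        stranded = by_library["stranded"]
        polya = by_library["poly-A"]
        rows[sample_id] = {
            "difference": stranded - polya,
            "stranded_above": stranded > threshold,
            "polya_above": polya > threshold,
            "calls_agree": (stranded > threshold) == (polya > threshold),
        }
    return rows


summary = summarize(scores)
for sample_id, row in summary.items():
    print(sample_id, round(row["difference"], 3), row["calls_agree"])
```

Under that assumption, the two libraries agree for 7316-1455 and disagree for 7316-161, even though both rows carry the same evidence-based "loss" status.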

@jharenza
Collaborator

jharenza commented Feb 3, 2021

Hmm. I am wondering if we should utilize batch correction once completed via #919 to rerun the classifier for this dataset. I wonder if the scores would be closer together for polyA and stranded in that case. I don't really want to choose one over the other without some additional investigation, but for the most part, while there is a bias, they agree and for 7316-161, although the polyA score is just under 0.5, it is still considered a loss via evidence. This also gets back to perhaps a re-thresholding of the scores for what determines inactivation/oncogenic TP53 which was discussed early on. So, I think for now, perhaps we can average those scores instead of discarding one or the other?
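
As a rough illustration of the averaging suggestion, the sketch below (plain Python, not the notebook's R code) averages the stranded and poly-A scores quoted earlier in this thread for the two samples. The 0.5 cutoff and the strict comparison are assumptions, and the variable names are hypothetical.

```python
from statistics import mean

# ASSUMPTION: 0.5 is the cutoff under discussion, applied as a strict comparison.
THRESHOLD = 0.5

# tp53_score values quoted earlier in this thread, keyed by sample_id and RNA_library.
library_scores = {
    "7316-161": {"stranded": 0.7719887631074178, "poly-A": 0.4490920924301138},
    "7316-1455": {"stranded": 0.9526901023341534, "poly-A": 0.868128930617709},
}

# One averaged score per sample_id instead of discarding either library.
averaged = {
    sample_id: mean(list(by_library.values()))
    for sample_id, by_library in library_scores.items()
}

for sample_id, score in averaged.items():
    print(sample_id, round(score, 3), score > THRESHOLD)
```

With these values, both averaged scores land above 0.5, which is consistent with the "loss" status assigned via evidence.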

Collaborator

@jharenza jharenza left a comment

just added the comments to the notebook

analyses/tp53_nf1_score/02-qc-rna_expression_score.Rmd (outdated)
@jaclyn-taroni jaclyn-taroni self-requested a review February 6, 2021 21:35
Member

@jaclyn-taroni jaclyn-taroni left a comment

I had one minor comment about showing some information that's currently in a comment that could get out of date, but other than that this looks good to me!

Comment on lines 156 to 163
group_by(sample_id,) %>%
# because 7316-85 has multiple stranded I'm taking a mean here
# A tibble: 8 x 2
# Kids_First_Biospecimen_ID tp53_score_stranded
# <chr> <dbl>
# 1 BS_59ZJWJTF 0.299
# 2 BS_QYPHA40N 0.0665
# 3 BS_SB12W1XT 0.229
Member

In my opinion, you are better off showing this output in a chunk rather than including it in a comment that could easily get out of date.

Collaborator Author

Thanks for the review! That's a great point. I've now added a chunk to display the output.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Will merge once CI passes 🚀

@jaclyn-taroni jaclyn-taroni merged commit ce57f8c into AlexsLemonade:master Feb 10, 2021
@kgaonkar6 kgaonkar6 deleted the rna-qc-tp53_score branch February 11, 2021 16:34
3 participants