-
Notifications
You must be signed in to change notification settings - Fork 67
Subtype chordoma #475
Subtype chordoma #475
Conversation
Co-Authored-By: jashapiro <jashapiro@gmail.com>
Some style changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution @mkoptyra! I have a few small suggestions for organization, but most of my suggestions are about adding some documentation to better describe the steps as they happen. As you are working in a notebook, it is very convenient to add descriptive sections before code blocks to write out the intent of the code that follows to make it easier to follow.
As a reminder, as @jaclyn-taroni has made some additions to this branch on github, you will want to "Pull" the changes to the branch on Github to your local machine before making further changes to avoid conflicts between edits made at different times.
```{r} | ||
chordoma_samples <- histologies_df %>% | ||
filter(short_histology == "Chordoma") %>% | ||
pull(Kids_First_Biospecimen_ID) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest moving this code block closer to where you will actually use chrodoma_samples
. Keep all the reading of files in this section, but move processing later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
|
||
## Prepare the data | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth adding some documentation here. What are you doing in this section? Something simple like:
"First we extract the chordoma samples that have a loss of SMARCB1 from focal_cn_df
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - updated
```{r} | ||
# remove large copy number data frame | ||
rm(focal_cn_df) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems out of place here. It probably isn't necessary, but if you want to clean up, it is probably better to do it nearer to where the last use of the df occurred, so around line 69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - moved
smarcb1_expression | ||
``` | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here to make it clear that you are joining the copy number data with the expression data in this step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - added
Kids_First_Participant_ID)), | ||
by = "sample_id") | ||
|
||
# combining the two biospecimen identifiers to a single column (all biospecimen IDs for a sampl separated by a comma) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this? Is there a reason not to leave these as two separate columns? Perhaps naming one as WGS and one as expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - I suppose this was a step to make sure the Kids_First_Participant_ID is clearly the way to identify/select by?
@jaclyn-taroni - could you comment on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We started out handling the biospecimen IDs this was for the reason that @mkoptyra points out. But since giving it more thought and realizing the clinical file this will go into is focused on the Kids_First_Biospecimen_ID, we now do as @jashapiro suggestions in the HGG (#249) and non-MB, non-ATRT embryonal tumors. I can change this part @mkoptyra and then leave a detailed comment about what I did. Let me know when you’ve pushed your changes and are ready for me to make edits!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we rename the columns Kids_First_Biospecimen_ID_DNA
and Kids_First_Biospecimen_ID_RNA
--
chordoma_smarcb1_df <- smarcb1_expression %>%
# any missing samples will get filled with NA when using a full join
full_join(chordoma_copy, by = "sample_id") %>%
rename(Kids_First_Biospecimen_ID_DNA = Kids_First_Biospecimen_ID,
Kids_First_Biospecimen_ID_RNA = biospecimen_id)
That's what the rename
function is doing.
I also now do:
chordoma_smarcb1_df <- chordoma_id_df %>%
select(sample_id, Kids_First_Participant_ID) %>%
distinct() %>%
inner_join(chordoma_smarcb1_df,
by = "sample_id")
instead of:
chordoma_smarcb1_df <- chordoma_smarcb1_df %>%
inner_join(distinct(select(chordoma_id_df,
sample_id,
Kids_First_Participant_ID)),
by = "sample_id")
It's doing the same thing (adding the Kids_First_Participant_ID
using the chordoma_id_df
data frame) but it's a little easier to read in my opinion.
|
||
Write the table to file. | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this output file include a subtype "call"? Or is the deletion status sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by subtype "call"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jashapiro is asking about whether we need a column that indicates whether or not a tumor is poorly-differentiated. I was hesitant to add this until the copy number was in better shape. We now have copy number consensus files (contain calls that multiple methods agree on) that we could use instead of the old version that uses the URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the notebook to use the consensus file and also updated the documentation to reflect that change in c26db85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
Making folder name more consistent with other analyses (f270af8) Updating directory name in CI (2fca75c) Using older version of CNVkit file (4ded6e8) Adding saving input (e7c23bd) Refreshing notebook (5058543) Adding more information to the README (4ec4aec) Adding entry to modules at a glance (db258e9 )
Thank you for the suggestions Co-Authored-By: jashapiro <jashapiro@gmail.com>
Thank you Co-Authored-By: jashapiro <jashapiro@gmail.com>
Co-Authored-By: jashapiro <jashapiro@gmail.com>
- adding more informative description to some chunks; moving order of the "rm(focal_cn_df)" chunk; editing the graph
@jaclyn-taroni and @jashapiro - Thank you for all your work |
@mkoptyra I would check to make sure you have committed and pushed all your changes via GitKraken. There are a couple instances where you have commented that you’ve made an update but the version I can see using the “Files changed” tab doesn’t yet have those updates. For the outstanding questions from @jashapiro, I can take a look, make changes, and then make sure to leave a detailed comment explaining those changes. |
Thanks @mkoptyra, I'll take a look! |
Some minor style changes, updated to use the consensus CN calls
@jashapiro ready for another look 👀 ! |
@@ -26,6 +26,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist | |||
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` <br> `results/independent-specimens.wgs.primary-plus.tsv` <br> `results/independent-specimens.wgswxs.primary.tsv` <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download) | |||
| [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A | |||
| [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-snv-consensus-mutation-tmb-all.tsv` <br> `2019-01-28-consensus-cnv.zip` from [#453 (comment)](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/453#issuecomment-579340618) | Summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244); this analysis did not work | N/A | |||
| [`molecular-subtyping-chordoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-chordoma) | `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | *In progress*; identifying poorly-differentiated chordoma samples per [#250](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/250) | N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we now have an output file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have an output file, but not one I expect to be consumed by other analyses. I expect molecular subtype labels, when we have them, will go into the pbta-histologies.tsv
file and be consumed that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
|
||
Write the table to file. | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
Thanks for your work on this @mkoptyra ! I'm going to merge now. |
Purpose/implementation Section
What scientific question is your analysis addressing?
To subtype the chordoma pediatric tumors by the expression and/or loss of SMARCB1/SNF5 gene.
What was your approach?
What GitHub issue does your pull request address?
#250
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Results
What types of results are included (e.g., table, figure)?
What is your summary of the results?
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.