Subtype chordoma #475

mkoptyra · 2020-01-24T15:33:46Z

Purpose/implementation Section

What scientific question is your analysis addressing?

To subtype pediatric chordoma tumors by expression and/or loss of the SMARCB1/SNF5 gene.

What was your approach?

What GitHub issue does your pull request address?

#250

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.


jashapiro

Thank you for your contribution @mkoptyra! I have a few small suggestions for organization, but most of my suggestions are about adding some documentation to better describe the steps as they happen. As you are working in a notebook, it is very convenient to add descriptive sections before code blocks to write out the intent of the code that follows to make it easier to follow.

As a reminder, because @jaclyn-taroni has made some additions to this branch on GitHub, you will want to "Pull" those changes to your local machine before making further changes, to avoid conflicts between edits made at different times.

jashapiro · 2020-01-28T14:35:41Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+```{r}
+chordoma_samples <- histologies_df %>%
+  filter(short_histology == "Chordoma") %>% 
+  pull(Kids_First_Biospecimen_ID)
+```


I'd suggest moving this code block closer to where you will actually use chordoma_samples. Keep all the reading of files in this section, but move processing later.

jashapiro · 2020-01-28T14:37:30Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+
+## Prepare the data
+
+```{r}


It is probably worth adding some documentation here. What are you doing in this section? Something simple like:

"First we extract the chordoma samples that have a loss of SMARCB1 from focal_cn_df"

Thank you - updated

jashapiro · 2020-01-28T14:43:42Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+```{r}
+# remove large copy number data frame
+rm(focal_cn_df)
+```


This call seems out of place here. It probably isn't necessary, but if you want to clean up, it is better to do it nearer to the last use of the data frame, so around line 69.

Thank you - moved

jashapiro · 2020-01-28T14:47:13Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+smarcb1_expression
+```
+
+```{r}


Add a comment here to make it clear that you are joining the copy number data with the expression data in this step

Thank you - added

jashapiro · 2020-01-28T14:48:35Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+                             Kids_First_Participant_ID)),
+             by = "sample_id")
+
+# combining the two biospecimen identifiers to a single column (all biospecimen IDs for a sample separated by a comma)


Why are we doing this? Is there a reason not to leave these as two separate columns? Perhaps naming one as WGS and one as expression?

Possibly - I suppose this was a step to make sure the Kids_First_Participant_ID is clearly the way to identify/select by?
@jaclyn-taroni - could you comment on this

We started out handling the biospecimen IDs this way for the reason that @mkoptyra points out. But after giving it more thought and realizing that the clinical file this will go into is focused on the Kids_First_Biospecimen_ID, we now do as @jashapiro suggests in the HGG (#249) and non-MB, non-ATRT embryonal tumor subtyping analyses. I can change this part @mkoptyra and then leave a detailed comment about what I did. Let me know when you’ve pushed your changes and are ready for me to make edits!

Now we rename the columns Kids_First_Biospecimen_ID_DNA and Kids_First_Biospecimen_ID_RNA:

    chordoma_smarcb1_df <- smarcb1_expression %>%
      # any missing samples will get filled with NA when using a full join
      full_join(chordoma_copy, by = "sample_id") %>%
      rename(Kids_First_Biospecimen_ID_DNA = Kids_First_Biospecimen_ID,
             Kids_First_Biospecimen_ID_RNA = biospecimen_id)

That's what the rename function is doing.
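To make the effect of the full join plus rename concrete, here is a minimal sketch on made-up data. The data frame and column names `smarcb1_expression`, `chordoma_copy`, `sample_id`, `biospecimen_id`, and `Kids_First_Biospecimen_ID` come from the snippet above; the values and the `expression_value` and `copy_status` columns are assumptions for illustration only, not the notebook's real contents.

```r
library(dplyr)

# Toy stand-ins for the notebook's data frames (all values are invented)
smarcb1_expression <- tibble(
  sample_id = c("S1", "S2"),
  biospecimen_id = c("BS_RNA1", "BS_RNA2"),   # RNA-seq biospecimen
  expression_value = c(1.2, -0.4)             # hypothetical column
)
chordoma_copy <- tibble(
  sample_id = c("S2", "S3"),
  Kids_First_Biospecimen_ID = c("BS_DNA2", "BS_DNA3"),  # WGS biospecimen
  copy_status = c("loss", "neutral")                    # hypothetical column
)

chordoma_smarcb1_df <- smarcb1_expression %>%
  # full_join keeps samples found in either table; gaps become NA
  full_join(chordoma_copy, by = "sample_id") %>%
  # keep the DNA and RNA identifiers as two clearly named columns
  rename(Kids_First_Biospecimen_ID_DNA = Kids_First_Biospecimen_ID,
         Kids_First_Biospecimen_ID_RNA = biospecimen_id)

chordoma_smarcb1_df
```

In this toy case S1 has only an RNA identifier and S3 has only a DNA identifier, so each gets an NA in the other identifier column rather than being dropped.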

I also now do:

    chordoma_smarcb1_df <- chordoma_id_df %>%
      select(sample_id, Kids_First_Participant_ID) %>%
      distinct() %>%
      inner_join(chordoma_smarcb1_df, by = "sample_id")

instead of:

    chordoma_smarcb1_df <- chordoma_smarcb1_df %>%
      inner_join(distinct(select(chordoma_id_df,
                                 sample_id,
                                 Kids_First_Participant_ID)),
                 by = "sample_id")

It's doing the same thing (adding the Kids_First_Participant_ID using the chordoma_id_df data frame) but it's a little easier to read in my opinion.
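A quick way to check that the two forms agree is to run both on a small example. This is only a sketch: the toy values and the `copy_status` column are assumptions, and the names `piped` and `nested` are introduced here for the comparison. One visible difference is column order, since `inner_join()` puts the columns of its first argument first.

```r
library(dplyr)

# Toy inputs (invented values); chordoma_id_df has one row per biospecimen,
# so a participant can appear more than once for the same sample_id
chordoma_id_df <- tibble(
  sample_id = c("S1", "S1", "S2"),
  Kids_First_Participant_ID = c("PT_1", "PT_1", "PT_2"),
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C")
)
chordoma_smarcb1_df <- tibble(
  sample_id = c("S1", "S2"),
  copy_status = c("loss", "neutral")  # hypothetical column
)

# Piped form: build the ID lookup first, then join the results onto it
piped <- chordoma_id_df %>%
  select(sample_id, Kids_First_Participant_ID) %>%
  distinct() %>%
  inner_join(chordoma_smarcb1_df, by = "sample_id")

# Nested form: same lookup, built inside the join call
nested <- chordoma_smarcb1_df %>%
  inner_join(distinct(select(chordoma_id_df,
                             sample_id,
                             Kids_First_Participant_ID)),
             by = "sample_id")

# Compare after putting the columns in the same order
same_content <- isTRUE(all.equal(
  as.data.frame(piped),
  as.data.frame(select(nested, all_of(names(piped))))
))
same_content
```

The `distinct()` step matters in both versions: without it, the duplicated S1 rows in the lookup would duplicate S1 in the joined table.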

jashapiro · 2020-01-28T15:34:55Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+
+Write the table to file.
+
+```{r}


Should this output file include a subtype "call"? Or is the deletion status sufficient?

What do you mean by subtype "call"?

I think @jashapiro is asking whether we need a column that indicates whether or not a tumor is poorly differentiated. I was hesitant to add this until the copy number data was in better shape. We now have copy number consensus files (containing calls that multiple methods agree on) that we could use instead of the old version that uses the URL.

I updated the notebook to use the consensus file and also updated the documentation to reflect that change in c26db85

There are some instances where the expression levels between losses and the neutral calls are similar (current state of plot below):

We may want to hold off on adding calls until we do a bit more digging as part of #387 and #486

mkoptyra · 2020-01-30T11:30:32Z

@jaclyn-taroni and @jashapiro - Thank you for all your work.
I pulled all the changes made by Jaclyn into my local notebook and made some of the changes suggested by Josh.
There are still 2 open comments/questions from Josh.

jaclyn-taroni · 2020-01-30T11:41:37Z

@mkoptyra I would check to make sure you have committed and pushed all your changes via GitKraken. There are a couple instances where you have commented that you’ve made an update but the version I can see using the “Files changed” tab doesn’t yet have those updates. For the outstanding questions from @jashapiro, I can take a look, make changes, and then make sure to leave a detailed comment explaining those changes.

jaclyn-taroni · 2020-01-30T14:54:51Z

Thanks @mkoptyra, I'll take a look!

jaclyn-taroni · 2020-01-30T16:14:17Z

@jashapiro ready for another look 👀 !

jashapiro · 2020-01-31T14:50:06Z

analyses/README.md

@@ -26,6 +26,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
 | [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` <br> `results/independent-specimens.wgs.primary-plus.tsv` <br> `results/independent-specimens.wgswxs.primary.tsv` <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
 | [`interaction-plots`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/interaction-plots) | `independent-specimens.wgs.primary-plus.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence [#13](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/13); may be updated to include other data types (e.g., fusions) | N/A
 | [`molecular-subtyping-ATRT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-ATRT) | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-snv-consensus-mutation-tmb-all.tsv`  <br>  `2019-01-28-consensus-cnv.zip` from [#453 (comment)](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/453#issuecomment-579340618) | Summarizing data into tabular format in order to molecularly subtype ATRT samples [#244](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/244); this analysis did not work | N/A
+| [`molecular-subtyping-chordoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-chordoma) | `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | *In progress*; identifying poorly-differentiated chordoma samples per [#250](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/250) | N/A


Do we now have an output file?

We do have an output file, but not one I expect to be consumed by other analyses. I expect molecular subtype labels, when we have them, will go into the pbta-histologies.tsv file and be consumed that way.

jashapiro

Looks good to me!

jashapiro · 2020-01-31T14:57:08Z

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

+
+Write the table to file.
+
+```{r}


jaclyn-taroni · 2020-01-31T20:01:59Z

Thanks for your work on this @mkoptyra ! I'm going to merge now.

mkoptyra added 6 commits November 21, 2019 09:59

Created the README file

a06a791

Copy number- selection of poorly differentiated chordoma (SMARCB1)

83d08f7

Merge remote-tracking branch 'upstream/master' into subtype-chordoma

280e596

Updates to the notebook, first graphing attempt

b0124d9

updating directory to newer focal_cn file

bf1b64b

Add notebook to continues integration

c96a9cd

jashapiro reviewed Jan 24, 2020

analyses/subtype-chordoma/01-Subtype-chordoma.Rmd (outdated)

mkoptyra and others added 8 commits January 24, 2020 10:56

fixed typo

007a049

Co-Authored-By: jashapiro <jashapiro@gmail.com>

Merge remote-tracking branch 'upstream/master' into subtype-chordoma

f0c266d

Make folder name more consistent with other analyses

f270af8

Update directory name in CI

2fca75c

Use older version of CNVkit file

4ded6e8

Some style changes

Add saving output

e7c23bd

Refresh notebook

5058543

Add more information to the README

4ec4aec

jaclyn-taroni added the molecular subtyping label Jan 25, 2020

Add entry to modules at a glance

db258e9

jaclyn-taroni requested a review from jashapiro January 25, 2020 21:07

jashapiro reviewed Jan 28, 2020

mkoptyra and others added 8 commits January 30, 2020 04:32

Merge remote-tracking branch 'upstream/master' into subtype-chordoma

f2aff91

Plot updated

d3f44fc

Update analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

b5b94fc

Thank you for the suggestions Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

550d529

Thank you Co-Authored-By: jashapiro <jashapiro@gmail.com>

Update analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

49086c7

Co-Authored-By: jashapiro <jashapiro@gmail.com>

Minor changes suggested by Josh Shapiro

1f9125e

- adding more informative description to some chunks; moving order of the "rm(focal_cn_df)" chunk; editing the graph

New graph updated

41ce3d6

Merge remote-tracking branch 'origin/subtype-chordoma' into subtype-c…

4b636aa

…hordoma

jaclyn-taroni added 3 commits January 30, 2020 10:00

Fix conflicts in analyses/README

635ad87

Address @jashapiro comments

4ea46f5

Some minor style changes, updated to use the consensus CN calls

Update documentation to indicate that we are using the consensus files

c26db85

jashapiro reviewed Jan 31, 2020

jashapiro approved these changes Jan 31, 2020

analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd

Write the table to file.

```{r}

jashapiro Jan 31, 2020

Agreed

jaclyn-taroni merged commit eadb78d into AlexsLemonade:master Jan 31, 2020


mkoptyra commented Jan 24, 2020 •

edited by jaclyn-taroni
