This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Subtype chordoma #475
Subtype chordoma #475
Changes from 15 commits
a06a791
83d08f7
280e596
b0124d9
bf1b64b
c96a9cd
007a049
f0c266d
f270af8
2fca75c
4ded6e8
e7c23bd
5058543
4ec4aec
db258e9
f2aff91
2a862c9
d3f44fc
b5b94fc
550d529
49086c7
1f9125e
41ce3d6
4b636aa
635ad87
4ea46f5
c26db85
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest moving this code block closer to where you will actually use
chrodoma_samples
. Keep all the reading of files in this section, but move processing later.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth adding some documentation here. What are you doing in this section? Something simple like:
"First we extract the chordoma samples that have a loss of SMARCB1 from
focal_cn_df
"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these losses? Or could a gain creep in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like great catch for me - don't know actually how to explore that.
I also sense based on the ploidy column that it may be not that straight forward for the BS_5B6XZ7YP sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
became
status == "loss"
specifies that it must be a loss and that gains or amplifications can't leak through. I can confirm that the only kind of CNA in SMARCB1 in chordoma samples are losses.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems out of place here. It probably isn't necessary, but if you want to clean up, it is probably better to do it nearer to where the last use of the df occurred, so around line 69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - moved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment here to make it clear that you are joining the copy number data with the expression data in this step
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this? Is there a reason not to leave these as two separate columns? Perhaps naming one as WGS and one as expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - I suppose this was a step to make sure the Kids_First_Participant_ID is clearly the way to identify/select by?
@jaclyn-taroni - could you comment on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We started out handling the biospecimen IDs this was for the reason that @mkoptyra points out. But since giving it more thought and realizing the clinical file this will go into is focused on the Kids_First_Biospecimen_ID, we now do as @jashapiro suggestions in the HGG (#249) and non-MB, non-ATRT embryonal tumors. I can change this part @mkoptyra and then leave a detailed comment about what I did. Let me know when you’ve pushed your changes and are ready for me to make edits!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we rename the columns
Kids_First_Biospecimen_ID_DNA
andKids_First_Biospecimen_ID_RNA
--That's what the
rename
function is doing.I also now do:
instead of:
It's doing the same thing (adding the
Kids_First_Participant_ID
using thechordoma_id_df
data frame) but it's a little easier to read in my opinion.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this output file include a subtype "call"? Or is the deletion status sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by subtype "call"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @jashapiro is asking about whether we need a column that indicates whether or not a tumor is poorly-differentiated. I was hesitant to add this until the copy number was in better shape. We now have copy number consensus files (contain calls that multiple methods agree on) that we could use instead of the old version that uses the URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the notebook to use the consensus file and also updated the documentation to reflect that change in c26db85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some instances where the expression levels between losses and the neutral calls are similar (current state of plot below):
We may want to hold off on adding calls until we do a bit more digging as part of #387 and #486
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
Large diffs are not rendered by default.