Additional tables for sample distribution: breakdown by tumor descriptor #213
Using v7 data
I like this!
Looks good, and a nice set of analyses to have. A couple of little questions, but they are more about the upstream data.
|
||
Setting aside the `Panel` sample for the moment. | ||
|
||
```{r} |
Add a description of what you are doing here? It seems from the code that you are collapsing multiple samples to create a single descriptor when there is more than one kind of tumor, but that seems worth noting in the text.
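For readers following along, the collapsing step described in this comment might look roughly like the sketch below. It is not the notebook's code: the toy rows, the `participant_id` column name, and the `collapse_descriptors` helper are all assumptions made for illustration, and base R is used only to keep the sketch free of package dependencies.

```r
# Toy data: the rows and the participant_id column name are invented for illustration.
samples <- data.frame(
  participant_id = c("P1", "P1", "P2", "P3", "P3", "P3"),
  tumor_descriptor = c("Initial CNS tumor", "Progressive",
                       "Initial CNS tumor",
                       "Recurrence", "Initial CNS tumor", "Recurrence"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the unique descriptors so that the combined label
# does not depend on the order of the rows.
collapse_descriptors <- function(x) paste(sort(unique(x)), collapse = ", ")

# One row per participant, with every descriptor seen for that participant
# collapsed into a single string.
collapsed <- aggregate(tumor_descriptor ~ participant_id,
                       data = samples,
                       FUN = collapse_descriptors)
print(collapsed)
```

A participant with more than one kind of tumor ends up with a combined label such as `Initial CNS tumor, Progressive`, which is the behavior worth calling out in the notebook text.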
|
||
### Paired genomic, transcriptomic assays | ||
|
||
How many participants have paired genomic and transcriptomic samples for the same `tumor_descriptor` values? |
The code here seems to require all descriptors to be paired for each participant? So if a participant had genomic and transcriptomic samples for `Initial CNS tumor`, but not both for `Progressive`, say, then it wouldn't be counted here?
Okay, another few lines down, I see that this is the case, and you have addressed it. But I do think it leaves the tables up here somewhat hard to interpret. At first glance, it appears that there may be many missing WGS samples, but in fact a portion of those are where there is actually missing RNAseq data making the descriptors not match.
I can take out looking at `paired_df` in line 134. I think it's more confusing than helpful.
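To make the distinction in this thread concrete, here is a minimal sketch of the two ways of counting pairs. It is an illustration under assumptions, not the notebook's implementation: the toy rows, the `assay` and `participant_id` columns, and the `collapse_descriptors` helper are hypothetical.

```r
# Toy data: P1 has a genomic sample for both descriptors but a transcriptomic
# sample only for the initial tumor; P2 is fully paired.
samples <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P2"),
  tumor_descriptor = c("Initial CNS tumor", "Initial CNS tumor", "Progressive",
                       "Initial CNS tumor", "Initial CNS tumor"),
  assay = c("genomic", "transcriptomic", "genomic", "genomic", "transcriptomic"),
  stringsAsFactors = FALSE
)
key_cols <- c("participant_id", "tumor_descriptor")

# Hypothetical helper: order-independent combined label.
collapse_descriptors <- function(x) paste(sort(unique(x)), collapse = ", ")

# Strict pairing: collapse descriptors per participant and assay, then require
# the collapsed labels to be identical across the two assays.
per_assay <- aggregate(tumor_descriptor ~ participant_id + assay,
                       data = samples, FUN = collapse_descriptors)
strict_pairs <- merge(per_assay[per_assay$assay == "genomic", key_cols],
                      per_assay[per_assay$assay == "transcriptomic", key_cols],
                      by = key_cols)

# Per-descriptor pairing: a (participant, descriptor) combination counts as
# paired whenever both assays are present for that descriptor.
descriptor_pairs <- merge(unique(samples[samples$assay == "genomic", key_cols]),
                          unique(samples[samples$assay == "transcriptomic", key_cols]),
                          by = key_cols)

print(strict_pairs)      # only P2
print(descriptor_pairs)  # P1 and P2, each for the initial tumor
```

Under strict pairing P1 drops out entirely even though its initial tumor is paired, which is why a table built that way can look like it has more missing samples than it really does.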
regulartable() %>% | ||
fontsize(size = 12, part="all") | ||
``` | ||
|
I have questions about some of these descriptors, and the meaning of delimiters. For example, we have: `Ganglioglioma;Low-grade glioma/astrocytoma (WHO grade I/II)`, as well as `Ganglioglioma` on its own, and `Low-grade glioma;astrocytoma (WHO grade I/II)` on its own. Is the first some kind of combination of the latter two? Is there a semantic difference between the use of `;` and `/` in these cases? This PR is probably not the right place for this discussion...
Hmm, these are all derived from the way they were added to the database - seems the `\` should not be between LGG/astrocytoma, but that should be a `;`. Ganglioglioma and LGG/astrocytoma are different and are only shown together since they were both on the pathology report. If genomic analyses let us further dial in the diagnoses, we can possibly replace the combined label with one or the other.
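One way to inspect these labels, assuming (as one reading of the reply above) that `;` separates diagnoses that were reported together and that `/` is part of a single label, is to split on `;` only. The vector below is a toy example using labels quoted in this thread, not the actual histology column.

```r
# Labels quoted in the discussion above; this vector is illustrative only.
labels <- c(
  "Ganglioglioma;Low-grade glioma/astrocytoma (WHO grade I/II)",
  "Ganglioglioma"
)

# fixed = TRUE makes strsplit treat ";" as a literal character, not a regex.
diagnoses <- strsplit(labels, ";", fixed = TRUE)

# Number of separately reported diagnoses per label.
n_diagnoses <- lengths(diagnoses)

print(diagnoses[[1]])
print(n_diagnoses)
```

Splitting on `/` instead would also break `WHO grade I/II` apart, so the two delimiters cannot be treated as interchangeable without cleaning the labels first.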
@@ -1,21 +1,21 @@ | |||
primary_site number_of_types max_type second_max_type |
This seems broken?
Yep definitely broken #214
The table that is broken is now tracked in #220, so I will go ahead and merge this.
Purpose/implementation
To my knowledge, we don't have tables that break down:
- `tumor_descriptor` (e.g., primary vs. other) within assay type (WGS/WXS vs. RNA-seq)
- `tumor_descriptor` by histology (useful for things like planning Planned Analysis: Primary vs Relapse #16)

Here I'm adding those tables in a single notebook and using the `flextable` package to display things nicely.

You can see the rendered notebook here.
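As a rough illustration of the first kind of table described above, the sketch below counts samples by `tumor_descriptor` within assay type. The toy rows and the `experimental_strategy` column name are assumptions, and base R's `table()` stands in for whatever the notebook actually uses before handing the result to `flextable`.

```r
# Toy data: rows and the experimental_strategy column name are invented.
samples <- data.frame(
  experimental_strategy = c("WGS", "WGS", "RNA-Seq", "RNA-Seq", "WXS"),
  tumor_descriptor = c("Initial CNS tumor", "Progressive",
                       "Initial CNS tumor", "Recurrence",
                       "Initial CNS tumor"),
  stringsAsFactors = FALSE
)

# Group WGS and WXS together as the genomic assay type.
assay_type <- ifelse(samples$experimental_strategy %in% c("WGS", "WXS"),
                     "WGS/WXS", "RNA-seq")

# Cross-tabulate and flatten to a long data frame with a Freq column.
counts <- as.data.frame(
  table(assay_type = assay_type, tumor_descriptor = samples$tumor_descriptor),
  stringsAsFactors = FALSE
)

# Drop combinations that never occur so the displayed table stays compact.
counts <- counts[counts$Freq > 0, ]
print(counts)
```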
Issue
Closes #162
Directions for reviewers
`flextable` display?
Results
One thing that struck me -- within a histology (as tracked in `disease_type_new`), there are not that many instances of: has primary and progressive data, has primary and recurrence data. This is good to know!
Docker and continuous integration
Check all those that apply or remove this section if it is not applicable.