Additional tables for sample distribution: breakdown by tumor descriptor #213

jaclyn-taroni · 2019-11-03T16:30:43Z

Purpose/implementation

To my knowledge, we don't have tables that:

- Just list the number of assays (with the paired normal removed)
- Look at the breakdown by tumor_descriptor (e.g., primary vs. other) within assay type (WGS/WXS vs. RNA-seq)
- Look at the breakdown by tumor_descriptor by histology (useful for planning things like Planned Analysis: Primary vs Relapse #16)

Here I'm adding those tables in a single notebook and using the flextable package to display things nicely.
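
As a rough illustration of the first two tables, the sketch below counts assays and cross-tabulates tumor_descriptor by assay type in base R. It is not the notebook's code: the toy data frame and the column names `sample_type` and `experimental_strategy` are assumptions made for the example; only `tumor_descriptor` and the WGS/WXS vs. RNA-seq grouping come from the description above.

```r
# Toy data standing in for the clinical metadata file.
# ASSUMPTION: sample_type and experimental_strategy are hypothetical column names.
samples <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4", "S5", "S6"),
  sample_type = c("Tumor", "Normal", "Tumor", "Tumor", "Tumor", "Tumor"),
  experimental_strategy = c("WGS", "WGS", "RNA-Seq", "RNA-Seq", "WXS", "WGS"),
  tumor_descriptor = c("Primary", NA, "Primary", "Progressive",
                       "Recurrence", "Primary"),
  stringsAsFactors = FALSE
)

# Drop the paired normal samples so that only tumor assays are counted
tumor_samples <- samples[samples$sample_type == "Tumor", ]

# Group WGS and WXS together as the genomic assays; keep RNA-Seq separate
tumor_samples$assay_type <- ifelse(
  tumor_samples$experimental_strategy %in% c("WGS", "WXS"),
  "WGS/WXS",
  "RNA-Seq"
)

# Table 1: number of assays per assay type
assay_counts <- table(tumor_samples$assay_type)

# Table 2: breakdown by tumor_descriptor within assay type
descriptor_counts <- table(
  tumor_samples$assay_type,
  tumor_samples$tumor_descriptor
)

print(assay_counts)
print(descriptor_counts)
```

Printing `descriptor_counts` gives one row per assay type and one column per descriptor, which is the shape a display package would then format.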

You can see the rendered notebook here.

Issue

Closes #162

Directions for reviewers

Please check for correctness, i.e., the code accomplishes what the prose describes.
What do you think of the flextable display?
Are there any additional tables that you think it'd be useful to have in this notebook?

Results

One thing that struck me -- within a histology (as tracked in disease_type_new), there are not many instances where we have both primary and progressive data, or both primary and recurrence data. This is good to know!

Docker and continuous integration

Check all those that apply or remove this section if it is not applicable.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

jharenza · 2019-11-03T18:57:29Z

I like this!

jashapiro

Looks good, and a nice set of analyses to have. A couple of small questions, but they are more about the upstream data.

jashapiro · 2019-11-03T17:55:41Z

analyses/sample-distribution-analysis/03-tumor-descriptor-and-assay-count.Rmd

+
+Setting aside the `Panel` sample for the moment.
+
+```{r}


Add a description of what you are doing here? It seems from the code that you are collapsing multiple samples to create a single descriptor when there is more than one kind of tumor, but it seems worth noting in the text.

jashapiro · 2019-11-03T17:58:09Z

analyses/sample-distribution-analysis/03-tumor-descriptor-and-assay-count.Rmd

+
+### Paired genomic, transcriptomic assays
+
+How many participants have paired genomic and transcriptomic samples for the same `tumor_descriptor` values?


The code here seems to require all descriptors to be paired for each participant? So if a participant had genomic and transcriptomic samples for Initial CNS tumor, but not both for Progressive, say, then it wouldn't be counted here?

Okay, another few lines down, I see that this is the case, and you have addressed it. But I do think it leaves the tables up here somewhat hard to interpret. At first glance, it appears that there may be many missing WGS samples, but in fact a portion of those are cases where the RNA-seq data is what is missing, which makes the descriptors not match.

jaclyn-taroni · 2019-11-03

I can take out looking at paired_df in line 134. I think it's more confusing than helpful.
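
The two ways of counting pairs discussed in this thread can be illustrated with a toy sketch in base R. This is not the notebook's code: the data frame, the column names `participant_id` and `assay_type`, and the helper `collapse_descriptors` are hypothetical; only `tumor_descriptor` and the example descriptor values come from the discussion above.

```r
# Toy data: P1 has genomic data for two descriptors but transcriptomic data
# for only one; P2 has both assay types for a single descriptor.
# ASSUMPTION: participant_id and assay_type are hypothetical column names.
samples <- data.frame(
  participant_id = c("P1", "P1", "P1", "P2", "P2"),
  assay_type = c("genomic", "transcriptomic", "genomic",
                 "genomic", "transcriptomic"),
  tumor_descriptor = c("Initial CNS Tumor", "Initial CNS Tumor", "Progressive",
                       "Initial CNS Tumor", "Initial CNS Tumor"),
  stringsAsFactors = FALSE
)
key_cols <- c("participant_id", "tumor_descriptor")

# Hypothetical helper: collapse descriptors into one sorted string
collapse_descriptors <- function(x) paste(sort(unique(x)), collapse = ", ")

# One collapsed descriptor string per participant and assay type
collapsed <- aggregate(
  tumor_descriptor ~ participant_id + assay_type,
  data = samples,
  FUN = collapse_descriptors
)
genomic <- collapsed[collapsed$assay_type == "genomic", key_cols]
transcriptomic <- collapsed[collapsed$assay_type == "transcriptomic", key_cols]

# Strict pairing: the full collapsed strings must match, so P1 drops out
strict <- merge(genomic, transcriptomic, by = key_cols)

# Per-descriptor pairing: keep each participant/descriptor combination
# that has both assay types, so P1's Initial CNS Tumor pair is retained
per_descriptor <- merge(
  unique(samples[samples$assay_type == "genomic", key_cols]),
  unique(samples[samples$assay_type == "transcriptomic", key_cols]),
  by = key_cols
)

print(strict)
print(per_descriptor)
```

Under strict matching only P2 counts as paired, while per-descriptor matching also keeps P1's Initial CNS Tumor pair. That gap is why a strict table can look as if samples are missing from one assay type when the mismatch actually comes from the other.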

jashapiro · 2019-11-03T19:03:39Z

analyses/sample-distribution-analysis/03-tumor-descriptor-and-assay-count.Rmd

+  regulartable() %>%
+  fontsize(size = 12, part="all")
+```
+


I have questions about some of these descriptors, and the meaning of delimiters. For example, we have: Ganglioglioma;Low-grade glioma/astrocytoma (WHO grade I/II), as well as Ganglioglioma on its own, and Low-grade glioma;astrocytoma (WHO grade I/II) on its own. Is the first some kind of combination of the latter two? Is there a semantic difference between the use of ; and / in these cases? This PR is probably not the right place for this discussion...

jharenza · 2019-11-03

Hmm, these are all derived from the way they were added to the database - it seems the / should not be between LGG/astrocytoma; that should be a ;. Ganglioglioma and LGG/astrocytoma are different and are only shown together because both were on the pathology report. If genomic analyses let us further dial in the diagnoses, we can possibly replace the combined entry with one or the other.

jashapiro · 2019-11-03T19:06:17Z

analyses/sample-distribution-analysis/results/primary_sites_counts.tsv

@@ -1,21 +1,21 @@
 primary_site	number_of_types	max_type	second_max_type


This seems broken?

jaclyn-taroni · 2019-11-03

Yep, definitely broken: #214

jaclyn-taroni · 2019-11-04T13:30:35Z

The table that is broken is now tracked in #220, so I will go ahead and merge this.

jaclyn-taroni added 6 commits November 3, 2019 10:57

Add flextable to Docker

521b61d

Add notebook looking at tumor_descriptor breakdown

5a3d83e

Add notebook to shell script; rerun

228b704

Using v7 data

Add table examining more than one timepoint per histology

c1124ca

And rerun

Update module-specific README

1f26222

Add TODO re: primary_site column

cbef8e3

jashapiro approved these changes Nov 3, 2019

Response to @jashapiro comments

e28b497

jaclyn-taroni merged commit 615fdf5 into AlexsLemonade:master Nov 4, 2019

jaclyn-taroni deleted the 162-tumor-descriptor branch November 4, 2019 13:31

jaclyn-taroni mentioned this pull request Nov 4, 2019

Planned Analysis: Primary vs Relapse #16

Closed

jashapiro mentioned this pull request Nov 4, 2019

Harmonize disease type separators #222

Closed
