This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Additional tables for sample distribution: breakdown by tumor descriptor #213

Merged


jaclyn-taroni
Member

Purpose/implementation

To my knowledge, we don't have tables that:

  • Just list the number of assays (with the paired normal removed)
  • Look at the breakdown by tumor_descriptor (e.g., primary vs. other) within assay type (WGS/WXS vs. RNA-seq)
  • Look at the breakdown by tumor_descriptor by histology (useful for planning things like "Planned Analysis: Primary vs Relapse", #16)

Here I'm adding those tables in a single notebook and using the flextable package to display things nicely.
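
As a rough sketch of the kind of breakdown these tables give, here is a toy example. It assumes dplyr is available; the data frame, the participant IDs, the descriptor values, and the `assay_type` column name are all made up for illustration and are not the notebook's code or the project's data.

```r
library(dplyr)

# Toy data: values and the `assay_type` column name are hypothetical
toy_df <- data.frame(
  participant_id = c("P1", "P1", "P2", "P2", "P3"),
  assay_type = c("WGS", "RNA-seq", "WGS", "RNA-seq", "RNA-seq"),
  tumor_descriptor = c("Initial CNS Tumor", "Initial CNS Tumor",
                       "Progressive", "Initial CNS Tumor", "Recurrence"),
  stringsAsFactors = FALSE
)

# Number of assays per assay type
assay_counts <- toy_df %>%
  count(assay_type, name = "n_assays")

# Breakdown by tumor descriptor within each assay type
descriptor_counts <- toy_df %>%
  count(assay_type, tumor_descriptor, name = "n_assays")
```

Either result is an ordinary data frame, so it can be piped into a flextable display function for rendering.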

You can see the rendered notebook here.

Issue

Closes #162

Directions for reviewers

  • Please check for correctness, i.e., the code accomplishes what the prose describes.
  • What do you think of the flextable display?
  • Are there any additional tables that you think it'd be useful to have in this notebook?

Results

One thing that struck me -- within a histology (as tracked in disease_type_new), there are not many instances with both primary and progressive data, or with both primary and recurrence data. This is good to know!
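
One way to check an observation like this is sketched below on made-up data. Only the `disease_type_new` and `tumor_descriptor` column names come from this PR; the participant IDs, histology labels, and descriptor values are illustrative, and treating "Initial CNS Tumor" as the primary descriptor is an assumption.

```r
library(dplyr)

# Toy data: all values are hypothetical
toy_df <- data.frame(
  participant_id = c("P1", "P1", "P2", "P3", "P3", "P4"),
  disease_type_new = c("Ependymoma", "Ependymoma", "Ependymoma",
                       "Medulloblastoma", "Medulloblastoma", "Medulloblastoma"),
  tumor_descriptor = c("Initial CNS Tumor", "Progressive", "Initial CNS Tumor",
                       "Initial CNS Tumor", "Recurrence", "Initial CNS Tumor"),
  stringsAsFactors = FALSE
)

# For each participant within a histology, flag which descriptor
# combinations are present (assumes "Initial CNS Tumor" means primary)
per_participant <- toy_df %>%
  group_by(disease_type_new, participant_id) %>%
  summarize(
    primary_and_progressive =
      all(c("Initial CNS Tumor", "Progressive") %in% tumor_descriptor),
    primary_and_recurrence =
      all(c("Initial CNS Tumor", "Recurrence") %in% tumor_descriptor),
    .groups = "drop"
  )

# Count participants per histology that have each combination
per_histology <- per_participant %>%
  group_by(disease_type_new) %>%
  summarize(
    n_primary_and_progressive = sum(primary_and_progressive),
    n_primary_and_recurrence = sum(primary_and_recurrence),
    .groups = "drop"
  )
```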

Docker and continuous integration

Check all those that apply or remove this section if it is not applicable.

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

@jharenza
Collaborator

jharenza commented Nov 3, 2019

I like this!

Member

jashapiro left a comment

Looks good, and a nice set of analyses to have. A couple of small questions, but they are more about the upstream data.


Setting aside the `Panel` sample for the moment.

```{r}
Member

Add a description of what you are doing here? It seems from the code that you are collapsing multiple samples to create a single descriptor when there is more than one kind of tumor, but it seems worth noting in text.


### Paired genomic, transcriptomic assays

How many participants have paired genomic and transcriptomic samples for the same `tumor_descriptor` values?
Member

The code here seems to require all descriptors to be paired for each participant? So if a participant had genomic and transcriptomic samples for Initial CNS tumor, but not both for Progressive, say, then it wouldn't be counted here?

Member

Okay, a few lines further down, I see that this is the case, and you have addressed it. But I do think it leaves the tables up here somewhat hard to interpret. At first glance, it appears that there may be many missing WGS samples, but in fact a portion of those are cases where the RNA-seq data is missing, so the descriptors do not match.

Member Author

I can take out looking at paired_df in line 134. I think it's more confusing than helpful.

regulartable() %>%
fontsize(size = 12, part="all")
```

Member

I have questions about some of these descriptors, and the meaning of delimiters. For example, we have: Ganglioglioma;Low-grade glioma/astrocytoma (WHO grade I/II), as well as Ganglioglioma on its own, and Low-grade glioma;astrocytoma (WHO grade I/II) on its own. Is the first some kind of combination of the latter two? Is there a semantic difference between the use of ; and / in these cases? This PR is probably not the right place for this discussion...

Collaborator

Hmm, these are all derived from the way they were added to the database - it seems the / should not be between LGG/astrocytoma; that should be a ;. Ganglioglioma and LGG/astrocytoma are different and are only shown together because they were both on the pathology report. If genomic analyses let us narrow down the diagnoses further, we could possibly replace the combined entry with one or the other.

@@ -1,21 +1,21 @@
primary_site number_of_types max_type second_max_type
Member

This seems broken?

Member Author

Yep definitely broken #214

@jaclyn-taroni
Member Author

The table that is broken is now tracked in #220, so I will go ahead and merge this.

jaclyn-taroni merged commit 615fdf5 into AlexsLemonade:master Nov 4, 2019
jaclyn-taroni deleted the 162-tumor-descriptor branch November 4, 2019 13:31
Development

Successfully merging this pull request may close these issues.

Update: sample distribution plots accounting for multiple samples from the same individual
3 participants