Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types #9

PichaiRaman · 2019-07-12T14:43:46Z

The idea is to demonstrate that samples in the PBTA cluster by cancer type / molecular subtype using a dimensionality reduction technique such as PCA, t-SNE, or UMAP. Please see Figure 2 (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006599) for an example of this approach applied to GTEx data to differentiate tissue types.
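As a rough illustration of the proposed approach, here is a minimal PCA sketch in Python with NumPy. It assumes a genes × samples expression matrix and uses simulated toy data with two hypothetical groups. It is not the module's implementation (the analysis itself is written in R), and t-SNE or UMAP would replace the projection step.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return a samples x n_components array of principal component scores.

    expr is assumed to be a genes x samples matrix (hypothetical layout).
    """
    x = expr.T                       # put samples on the rows
    x = x - x.mean(axis=0)           # center each gene across samples
    # Thin SVD of the centered matrix; u * s gives the sample scores.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 50 genes, 10 samples per hypothetical group.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(50, 10))
group_b = rng.normal(size=(50, 10))
group_b[:10] += 5.0                  # group B differs on the first 10 genes
expr = np.hstack([group_a, group_b])

scores = pca_scores(expr)
print(scores.shape)  # (20, 2)
```

Coloring these scores by a metadata column (for example a cancer type label) is what would show whether samples cluster by group.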

jharenza · 2019-07-12T16:17:27Z

Also consider Figure 3 in this paper. Figure generation depends on #19.

cgreene · 2019-07-14T13:01:18Z

This is Figure 3 of the paper referenced by @jharenza.

I revised the issue to remove the specific approach to figure generation so that we can discuss the preferred strategy with folks who are interested in tackling this.

cbethell · 2019-08-13T19:25:53Z

I am beginning to flesh out the solution for this issue. If anyone has begun to work on it or has an idea of a preferred strategy, please feel free to share your thoughts.


@jaclyn-taroni

* Addition of unsupervised-transcriptomic-analysis
* Removed attempt at batch correction
  - removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its html output
  - removed the installation of the `limma` package from `Dockerfile`, as it is no longer needed
* PR review suggested changes from @jaclyn-taroni
  - changed the analysis directory name and the name of the `.circleci` run to “transcriptomic-dimension-reduction”
  - changed the data_dir file path
  - renamed `df2` to `metadata_df`
  - created a function to specify the scores data.frame and use the first 2 columns as x and y for a ggplot
  - removed output files that are no longer produced from documentation
  - `cowplot` is used to arrange plots in a grid (I am still playing around with this to achieve the best possible layout for the plots and legend)
  - filtered out the low count genes (I am also still looking into whether or not the method used is the best)
  - set a seed
  - replaced `reduction_fn` with `align_metadata` (this will also be changed once
* PR review suggested changes v2
  - split the notebook into 2 separate notebooks (one for data prep and the other for plotting)
  - made the parallel changes to `.circleci/config.yml`
  - used `dplyr::inner_join` to merge the dimension reduction scores with the metadata
  - added tsv files of the dimension reduction score data.frames
  - created the suggested functions, including a wrapper function to execute all of the functions
  - saved aligned dimension reduction score data.frames as `RDS` files to be read into the second notebook
* Addition of `02-transcriptomic-analysis-plotting.Rmd`
  - addition of the second notebook addressing issue #9
  - output files include two `PDF` files with plots for each expression data set (RSEM and Kallisto) across the three dimension reduction techniques performed
* Removed files from changes made in `01-transcriptomic-analysis-prep.Rmd`
* Added `sessionInfo` to Rmd
  - also edited `.circleci/config.yml` to reflect the project's guidelines on adding analyses with multiple notebooks
* Made parallel changes based on `01-transcriptomic-analysis-prep.R`
  - changed the names of the input files based on the changes made to `01-transcriptomic-analysis-prep.R`
  - fixed the `file.path`s using the appropriate `readr` function
  - added the results folder produced in the `01` notebook, as it is required for successful execution of this notebook
  - reformatted code
* Updated plots using the updated tsv files
* Plots updated based on updated t-SNE results
* @jashapiro PR review suggested changes
  - edited the `perform_dimension_reduction` function in the `01` R script to allow for plotting of higher dimensions (although this is only applicable to the PCA plots). Please note that I chose to plot PC1 on the x-axis and PC2 on the y-axis based on trial and error and further suggestions in the literature.
  - slightly increased the perplexity parameter for t-SNE
  - removed the sizing in `geom_point`
  - removed the unused magrittr pipe assignment
  - added plots using the same data but colored by the selection strategy found in the metadata's `RNA_library` variable
* Added tsv output files containing scores separated by selection strategy
  - also changed the t-SNE perplexity parameter to 10 for better visualization
* Reran t-SNE plots with new perplexity
  - changed plots to be colored by `broad_histology`
  - added plots for each selection strategy (poly-A and stranded)
* Renamed `poly-A` files and edited PCs for plotting
  - removed extraneous parentheses
  - changed PCs used in combined selection strategy plots back to PC1 and PC2
  - renamed all files containing `poly` to include `polyA` for specificity
* Performed selection strategy filtering before dimension reduction
* Revert change to `.circleci/config.yml`
* Lowered perplexity in `.circleci/config.yml`
* Changed `perplexity` to 1
* Commented out `polyA`
  - commented out the `polyA` stuff so `circleci` tests can pass
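The commits above describe three preprocessing and merge steps, and #140 adds a log2(x + 1) option. These can be sketched in plain Python as follows:

- filtering low-count genes
- merging dimension reduction scores with the metadata via `dplyr::inner_join`
- the log2(x + 1) transform from #140

Several details are assumptions made for illustration: the gene-to-counts dict layout, the summed-count threshold, the sample IDs, and the label values. The thread does not say which filtering rule the module actually used.

```python
import math


def filter_low_count_genes(counts, min_total):
    """Keep genes whose counts summed over all samples reach min_total.

    counts maps gene -> list of per-sample counts (hypothetical layout);
    the summed-count rule is an assumed, illustrative filter.
    """
    return {gene: vals for gene, vals in counts.items() if sum(vals) >= min_total}


def log2_plus_one(counts):
    """Apply log2(x + 1) so that zero counts map to 0 instead of -inf."""
    return {gene: [math.log2(x + 1) for x in vals] for gene, vals in counts.items()}


def inner_join(scores, metadata):
    """Merge per-sample scores with metadata, keeping only shared sample IDs
    (analogous to dplyr::inner_join on a one-to-one sample ID column)."""
    return {s: {**scores[s], **metadata[s]} for s in scores if s in metadata}


counts = {"g1": [0, 1, 3], "g2": [0, 0, 1], "g3": [7, 15, 0]}
logged = log2_plus_one(filter_low_count_genes(counts, min_total=2))
print(logged)  # {'g1': [0.0, 1.0, 2.0], 'g3': [3.0, 4.0, 0.0]}

# Hypothetical sample IDs and labels; only S1 appears in both inputs.
scores = {"S1": {"PC1": 1.2, "PC2": -0.3}, "S2": {"PC1": -0.8, "PC2": 0.5}}
metadata = {"S1": {"broad_histology": "type_A"}, "S3": {"broad_histology": "type_B"}}
print(inner_join(scores, metadata))
```

Filtering before the log transform means the threshold applies to raw counts, which is the usual order, but other filters (for example, a minimum count in a minimum number of samples) are equally plausible.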

jaclyn-taroni · 2019-09-23T12:25:08Z

As noted in the discussion on #116, we were planning on adding a README and 2 shell scripts to analyses/transcriptomic-dimension-reduction. I think we should hold off on that until #121 and #124 are completed @cbethell. We also need to decide if we will continue to include the plots that demonstrate the issue with the RNA library differences in that module or if we will remove that because we have @jashapiro's addition in #120 to serve that purpose.

jaclyn-taroni · 2019-10-26T12:49:17Z

Some initial work on this is currently in analyses/transcriptomic-dimension-reduction. For this reason, I am assigning this an updated analysis label.

jaclyn-taroni · 2020-03-09T18:31:28Z

Closing all planned analysis tickets in favor of opening new proposed analysis/updated analysis tickets as needed.

cgreene changed the title from "Planned Analysis: T-SNE/PCA on gene expression showing transcriptomic differences between different cancer types" to "Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types" Jul 14, 2019

cansavvy assigned cansavvy and unassigned cansavvy Jul 25, 2019

cbethell mentioned this issue Aug 27, 2019

Addition of 01-transcriptomic-analysis-prep.Rmd #83 (merged)

cbethell mentioned this issue Sep 9, 2019

Addition of 02-transcriptomic-analysis-plotting.Rmd #100 (merged)

This was referenced Sep 17, 2019

PART 1: Refactor transcriptome dimension reduction module #111 (merged)

PART 2: Refactor dimension reduction plotting #112 (merged)

PART 3: refactor multipanel plot step for dimension reduction #116 (merged)

This was referenced Sep 28, 2019

Update the txome dimensionality reduction module now that files are split #137 (merged)

Add option to log2(x + 1) data for dimension reduction #140 (merged)

jaclyn-taroni added the transcriptomic Related to or requires transcriptomic data label Oct 26, 2019

jaclyn-taroni added the updated analysis label Oct 26, 2019

jaclyn-taroni closed this as completed Mar 9, 2020
