This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types #9

Closed
PichaiRaman opened this issue Jul 12, 2019 · 6 comments
Labels: transcriptomic (Related to or requires transcriptomic data), updated analysis

Comments

@PichaiRaman
Contributor

The idea is to demonstrate that samples in the PBTA cluster by cancer type or molecular subtype using a dimensionality reduction technique such as PCA, t-SNE, or UMAP. Please see Figure 2 (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006599) for an example that uses this approach on GTEx data to differentiate tissue types.
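To make the idea concrete, here is a minimal sketch of PCA on a toy expression matrix. It is written in dependency-free Python for illustration only and is not the R code used in this project. The sample IDs, the `type_A`/`type_B` labels, the expression values, and the helper names (`center_columns`, `leading_axis`, `pca_scores`) are all hypothetical. The principal axes are found by power iteration with deflation, which is one simple way to compute them and stands in for whatever PCA routine an actual analysis would call.

```python
import math
import random

# Hypothetical toy data: sample ID -> (cancer type label, log-scale expression of 4 genes).
samples = {
    "s1": ("type_A", [8.0, 7.5, 1.0, 1.2]),
    "s2": ("type_A", [7.8, 7.9, 1.1, 0.9]),
    "s3": ("type_A", [8.2, 7.7, 0.8, 1.1]),
    "s4": ("type_B", [1.0, 1.3, 7.9, 8.1]),
    "s5": ("type_B", [1.2, 0.9, 8.2, 7.8]),
    "s6": ("type_B", [0.9, 1.1, 8.0, 8.3]),
}


def center_columns(rows):
    """Subtract each gene's mean so that PCA describes variation around the mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def leading_axis(rows, n_iter=200, seed=0):
    """Approximate the leading principal axis by power iteration on X^T X."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(n_iter):
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]  # X v
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(len(v))]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_scores(rows, n_components=2):
    """Return one tuple of component scores per sample."""
    x = center_columns(rows)
    per_component = []
    for _ in range(n_components):
        axis = leading_axis(x)
        proj = [sum(a * b for a, b in zip(row, axis)) for row in x]
        per_component.append(proj)
        # Deflate: remove the variation already captured by this axis.
        x = [[a - p * b for a, b in zip(row, axis)] for row, p in zip(x, proj)]
    return list(zip(*per_component))


labels = [label for label, _ in samples.values()]
scores = pca_scores([values for _, values in samples.values()])
for sample_id, label, (pc1, pc2) in zip(samples, labels, scores):
    print(f"{sample_id}\t{label}\tPC1={pc1:+.2f}\tPC2={pc2:+.2f}")
```

In this toy case the two groups land on opposite sides of PC1, which is the kind of separation by cancer type the proposed figure would look for in the real data.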

@jharenza
Collaborator

jharenza commented Jul 12, 2019

Also consider Figure 3 in this paper. Figure generation depends on #19.

@cgreene changed the title from "Planned Analysis: T-SNE/PCA on gene expression showing transcriptomic differences between different cancer types" to "Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types" on Jul 14, 2019
@cgreene
Collaborator

cgreene commented Jul 14, 2019

[Image: Screen Shot 2019-07-14 at 9 00 15 AM]

This is Figure 3 of the paper referenced by @jharenza.

I revised the title to remove the specific approach to figure generation so that we can discuss the preferred strategy with anyone interested in tackling this.

@cansavvy assigned and then unassigned themselves on Jul 25, 2019
@cbethell
Contributor

I am beginning to flesh out a solution for this issue. If anyone has begun to work on it or has an idea of a preferred strategy, please feel free to share your thoughts.

cbethell added a commit to cbethell/OpenPBTA-analysis that referenced this issue Sep 9, 2019
- addition of the second notebook addressing issue AlexsLemonade#9 
- Output files include two `PDF` files with plots for each expression data set (RSEM and Kallisto) across the three dimension reduction techniques performed
jaclyn-taroni pushed a commit that referenced this issue Sep 17, 2019
* Addition of unsupervised-transcriptomic-analysis

* Removed attempt at batch correction

- removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its html output
- removed the installation of `limma` package from `Dockerfile` as the need for it was removed

* PR review suggested changes from @jaclyn-taroni

- changed analysis directory name and name of `.circleci` run to “transcriptomic-dimension-reduction”
- changed the data_dir file path
- renamed `df2` `metadata_df`
- created a function to specify the scores data.frame and use the first 2 columns as x and y for a ggplot
- removed output files that are no longer produced from documentation 
- `cowplot` is used to arrange plots in a grid (I am still playing around with this to achieve the best possible layout for the plots and legend)
- filtered out the low count genes (I am also still looking into whether or not the method used is the best)
- set a seed
- replaced `reduction_fn` with `align_metadata` (this will also be changed once

* PR review suggested changes v2

- split the notebook into 2 separate notebooks (one for data prep and the other for plotting) 
- made the parallel changes to `.circleci/config.yml`
- used `dplyr::inner_join` to merge the dimension reduction scores with the metadata
- added tsv files of the dimension reduction score data.frames
- created the suggested functions, including a wrapper function to execute all of the functions
- saved aligned dimension reduction score data.frames as `RDS` files to be read into the second notebook

* Addition of `02-transcriptomic-analysis-plotting.Rmd`

- addition of the second notebook addressing issue #9 
- Output files include two `PDF` files with plots for each expression data set (RSEM and Kallisto) across the three dimension reduction techniques performed

* Removed files from changes made in `01-transcriptomic-analysis-prep.Rmd`

* Added `sessionInfo` to Rmd 

Also edited `.circleci/config.yml` to reflect the project's guidelines on adding analyses with multiple notebooks

* Made parallel changes based on `01-transcriptomic-analysis-prep.R`

- changed the names of the input file based on the changes made to `01-transcriptomic-analysis-prep.R`
- fixed the `file.path`s using the appropriate `readr` function 
- added the results folder produced in the `01` nb as it is required for successful execution of this nb
- reformatted code

* Updated plots using the updated tsv files

* True `plots` update based on updated t-SNE results

* @jashapiro PR review suggested changes 

- edited the `perform_dimension_reduction` function in the `01` R script to allow plotting of higher dimensions (although this only applies to the PCA plots). Please note that I chose to plot PC1 on the x-axis and PC2 on the y-axis based on trial and error and on suggestions in the literature.
- slightly increased the perplexity parameter for t-SNE 
- removed the sizing in `geom_point`
- removed the unused magrittr pipe assignment 
- added plots using the same data but colored by the selection strategy found in the metadata's `RNA_library` variable

* Added tsv output files containing scores separated by selection strategy

- also changed the t-SNE perplexity parameter to 10 for better visualization

* Reran t-SNE plots with new perplexity 

- changed plots to be colored by `broad_histology`
- added plots for each selection strategy (poly-A and stranded)

* Renamed `poly-A` files and edited PCs for plotting

- removed extraneous parentheses 
- changed PCs used in combined selection strategy plots back to PC1 and PC2
- renamed all files containing `poly` to include `polyA` for specificity

* Performed selection strategy filtering before dimension reduction

* Revert change to `.circleci/config.yml`

* Lowered perplexity in `.circleci/config.yml`

* Changed `perplexity` to 1

* Commented out `polyA` 

- commented out the `polyA` stuff so `circleci` tests can pass
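
Two of the data preparation steps listed in the commit message above, filtering out low-count genes and joining the dimension reduction scores to the sample metadata, can be sketched as follows. This is an illustrative sketch in dependency-free Python, not the project's R code. The thread does not state which filtering rule was used, so the summed-count cutoff (`min_total`) is an assumption, as are the `sample_id` key, the toy values, and the helper names `filter_low_count_genes` and `inner_join`. Only the `broad_histology` and `RNA_library` column names and the poly-A/stranded values come from the thread.

```python
# Hypothetical toy inputs: gene -> counts, one value per sample.
counts = {
    "geneA": [120, 95, 140],
    "geneB": [0, 1, 0],
    "geneC": [30, 0, 45],
}


def filter_low_count_genes(counts, min_total=10):
    """Keep genes whose summed count across samples reaches min_total (assumed rule)."""
    return {gene: values for gene, values in counts.items() if sum(values) >= min_total}


def inner_join(left, right, key):
    """Keep rows of `left` whose key also appears in `right`, merging the columns.

    Assumes `key` is unique in `right`, which holds when metadata has one row per sample.
    """
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Hypothetical dimension reduction scores and metadata; "s4" has no metadata row
# and "s3" has no scores, so neither survives the inner join.
scores = [
    {"sample_id": "s1", "PC1": -1.2, "PC2": 0.3},
    {"sample_id": "s2", "PC1": 0.4, "PC2": -0.8},
    {"sample_id": "s4", "PC1": 2.0, "PC2": 0.1},
]
metadata = [
    {"sample_id": "s1", "broad_histology": "histology_1", "RNA_library": "stranded"},
    {"sample_id": "s2", "broad_histology": "histology_2", "RNA_library": "poly-A"},
    {"sample_id": "s3", "broad_histology": "histology_1", "RNA_library": "stranded"},
]

filtered = filter_low_count_genes(counts)
joined = inner_join(scores, metadata, key="sample_id")
print(sorted(filtered))
for row in joined:
    print(row)
```

An inner join drops any sample that is missing from either table, so every plotted point is guaranteed to have a histology and a library type to color by.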
@jaclyn-taroni
Member

As noted in the discussion on #116, we were planning on adding a README and 2 shell scripts to analyses/transcriptomic-dimension-reduction. I think we should hold off on that until #121 and #124 are completed @cbethell. We also need to decide if we will continue to include the plots that demonstrate the issue with the RNA library differences in that module or if we will remove that because we have @jashapiro's addition in #120 to serve that purpose.

@jaclyn-taroni
Member

Some initial work on this is currently in analyses/transcriptomic-dimension-reduction. For this reason, I am assigning this an updated analysis label.

@jaclyn-taroni
Member

Closing all planned analysis tickets in favor of opening new proposed analysis/updated analysis tickets as needed.
