Planned Analysis: Unsupervised analysis of transcriptomic differences between different cancer types #9
Also consider Figure 3 in this paper. Figure generation depends on #19.
This is Figure 3 of the paper referenced by @jharenza. I revised the issue to remove the specific approach to figure generation so that we can discuss the preferred strategy with anyone interested in tackling this.
I am beginning to flesh out the solution for this issue. If anyone has begun working on it or has an idea of a preferred strategy, please share your thoughts.
* Addition of unsupervised-transcriptomic-analysis
* Removed attempt at batch correction
  - removed the attempt at batch correction from `01-transcriptomic-analysis.Rmd` and its HTML output
  - removed the installation of the `limma` package from `Dockerfile`, as it is no longer needed
* PR review suggested changes from @jaclyn-taroni
  - changed the analysis directory name and the name of the `.circleci` run to "transcriptomic-dimension-reduction"
  - changed the `data_dir` file path
  - renamed `df2` to `metadata_df`
  - created a function to specify the scores data.frame and use its first 2 columns as x and y for a ggplot
  - removed output files that are no longer produced from the documentation
  - `cowplot` is used to arrange plots in a grid (I am still experimenting with this to achieve the best possible layout for the plots and legend)
  - filtered out low-count genes (I am also still looking into whether the method used is the best one)
  - set a seed
  - replaced `reduction_fn` with `align_metadata` (this will also be changed once
* PR review suggested changes v2
  - split the notebook into 2 separate notebooks (one for data prep and the other for plotting)
  - made the parallel changes to `.circleci/config.yml`
  - used `dplyr::inner_join` to merge the dimension reduction scores with the metadata
  - added TSV files of the dimension reduction score data.frames
  - created the suggested functions, including a wrapper function to execute all of them
  - saved the aligned dimension reduction score data.frames as `RDS` files to be read into the second notebook
* Addition of `02-transcriptomic-analysis-plotting.Rmd`
  - addition of the second notebook addressing issue #9
  - output files include two `PDF` files with plots for each expression data set (RSEM and Kallisto) across the three dimension reduction techniques performed
* Removed files from changes made in `01-transcriptomic-analysis-prep.Rmd`
* Added `sessionInfo` to Rmd
  - also edited `.circleci/config.yml` to reflect the project's guidelines on adding analyses with multiple notebooks
* Made parallel changes based on `01-transcriptomic-analysis-prep.R`
  - changed the names of the input files based on the changes made to `01-transcriptomic-analysis-prep.R`
  - fixed the `file.path`s using the appropriate `readr` function
  - added the results folder produced by the `01` notebook, as it is required for this notebook to run successfully
  - reformatted code
* Updated plots using the updated TSV files
* Updated plots based on the updated t-SNE results
* @jashapiro PR review suggested changes
  - edited the `perform_dimension_reduction` function in the `01` R script to allow plotting of higher dimensions (although this only applies to the PCA plots). Please note that I chose to plot PC1 on the x-axis and PC2 on the y-axis based on trial and error and on suggestions in the literature.
  - slightly increased the perplexity parameter for t-SNE
  - removed the sizing in `geom_point`
  - removed the unused magrittr pipe assignment
  - added plots using the same data but colored by the selection strategy found in the metadata's `RNA_library` variable
* Added TSV output files containing scores separated by selection strategy
  - also changed the t-SNE perplexity parameter to 10 for better visualization
* Reran t-SNE plots with new perplexity
  - changed plots to be colored by `broad_histology`
  - added plots for each selection strategy (poly-A and stranded)
* Renamed `poly-A` files and edited PCs for plotting
  - removed extraneous parentheses
  - changed the PCs used in the combined selection strategy plots back to PC1 and PC2
  - renamed all files containing `poly` to include `polyA` for specificity
* Performed selection strategy filtering before dimension reduction
* Revert change to `.circleci/config.yml`
* Lowered perplexity in `.circleci/config.yml`
* Changed `perplexity` to 1
* Commented out `polyA`
  - commented out the `polyA` steps so the CircleCI tests can pass
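Two of the prep steps in the log above, dropping low-count genes and merging dimension reduction scores with sample metadata, can be sketched outside R. This is a minimal Python illustration under stated assumptions, not the notebooks' code: `filter_low_count_genes`, `join_scores_with_metadata`, and the `min_total=100` threshold are hypothetical. The merge mirrors the semantics of `dplyr::inner_join`, keeping only samples present in both inputs.

```python
def filter_low_count_genes(counts, gene_ids, min_total=100):
    """Keep genes whose total count across all samples is >= min_total.

    counts: one list per gene, one entry per sample (genes x samples).
    min_total is a hypothetical threshold, not the value used in the analysis.
    """
    kept_counts, kept_ids = [], []
    for gene, row in zip(gene_ids, counts):
        if sum(row) >= min_total:
            kept_counts.append(row)
            kept_ids.append(gene)
    return kept_counts, kept_ids


def join_scores_with_metadata(scores, sample_ids, metadata):
    """Inner join of per-sample scores with a metadata lookup.

    scores: one list of dimension reduction scores per sample.
    sample_ids: sample ID for each row of scores.
    metadata: dict mapping sample ID -> dict of annotations.
    Samples missing from metadata are dropped, as in an inner join.
    """
    rows = []
    for sid, row in zip(sample_ids, scores):
        if sid not in metadata:
            continue
        merged = {"sample_id": sid}
        # Name score columns dim1, dim2, ... (hypothetical naming scheme)
        merged.update({f"dim{i + 1}": float(v) for i, v in enumerate(row)})
        merged.update(metadata[sid])
        rows.append(merged)
    return rows
```

Each merged row carries both coordinates and annotations (for example `broad_histology`), which is what a plotting step needs to color points by group.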
As noted in the discussion on #116, we were planning on adding a README and 2 shell scripts to
Some initial work on this is currently in
Closing all planned analysis tickets in favor of opening new proposed analysis/updated analysis tickets as needed.
The idea is to demonstrate that samples in the PBTA cluster by cancer type / molecular subtype using a dimensionality reduction technique such as PCA, t-SNE, or UMAP. See Figure 2 (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006599) for an example of this approach applied to GTEx data to distinguish tissue types.
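As a minimal illustration of the PCA option, the sketch below projects samples onto their top principal components using an SVD of the gene-centered matrix. It assumes a genes x samples expression matrix that has already been filtered and transformed (for example, log-scaled); `pca_scores` is a hypothetical helper, not code from the PBTA notebooks. t-SNE and UMAP are separate, non-linear methods and are not shown here.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Return a samples x n_components array of principal component scores.

    expr: genes x samples matrix, assumed already filtered and transformed.
    """
    # Transpose to samples x genes and center each gene (column) at zero
    x = np.asarray(expr, dtype=float).T
    x = x - x.mean(axis=0)
    # Thin SVD of the centered matrix: x = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Scores are u * s, equivalent to projecting x onto the top axes in vt
    return u[:, :n_components] * s[:n_components]
```

Coloring the PC1/PC2 scores by a label such as `broad_histology` then shows whether samples of the same cancer type sit together. The sign of each component is arbitrary, so only relative positions are meaningful.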