Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PBTA Histologies: Tsne base (5 of N) #868

Merged
merged 18 commits into from
Jan 7, 2021

Conversation

kgaonkar6
Copy link
Collaborator

@kgaonkar6 kgaonkar6 commented Dec 7, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Re-running transcriptomic-dimension-reduction to get tsne/umap score for the 8 new samples.

Module Reason Brief Description output
transcriptomic-dimension-reduction adding 8 samples #749 used in subtyping Dimension reduction and visualization of RNA-seq data N/A

What was your approach?

Updated transcriptomic-dimension-reduction/01-dimension-reduction.sh and transcriptomic-dimension-reduction/dimension-reduction-plots.sh to have conditions to run with pbta-histologies-base.tsv if running module for subtyping.

What GitHub issue does your pull request address?

#861

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

We are only adding 8 stranded samples but the polya outputs have metadata attached which have changed in v18 ( glioma_region is CNS_region, some broad_histology and tumor_descriptors have changed) so they have changes along with tsne/umap coordinate changes.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

8 stranded samples were added to transcriptomic-dimension-reduction

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza jharenza self-requested a review December 8, 2020 03:04
Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgaonkar6 This looks good, too. Please update the README.

I just saw one thing - that there is an NA in the legend of the plots, but no color. This may be due to the levels in the plotting coming from the full histology file that includes NA in the normals. If it is a quick fix, maybe you can update, but it may be beyond the scope of this PR.

@jharenza jharenza changed the title Tsne base PBTA Histologies: Tsne base (5 of N) Dec 9, 2020
@jaclyn-taroni jaclyn-taroni removed their request for review December 13, 2020 20:47
@kgaonkar6
Copy link
Collaborator Author

The NA arises from a manual color palette given in the code which has 21 colors
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/37b46f6b443a449a78b6f4c5394fd67e24e2d59e/analyses/transcriptomic-dimension-reduction/util/dimension-reduction-functions.R#L264-L271

But from the new base histology we have 22 broad_histology:

> check$broad_histology %>% unique()
 [1] "Low-grade astrocytic tumor"                    "Tumors of sellar region"                      
 [3] "Mesenchymal non-meningothelial tumor"          "Diffuse astrocytic and oligodendroglial tumor"
 [5] "Benign tumor"                                  "Ependymal tumor"                              
 [7] "Embryonal tumor"                               "Tumor of cranial and paraspinal nerves"       
 [9] "Histiocytic tumors"                            "Choroid plexus tumor"                         
[11] "Meningioma"                                    "Neuronal and mixed neuronal-glial tumor"      
[13] "Lymphoma"                                      "Metastatic tumors"                            
[15] "Non-tumor"                                     "Germ cell tumor"                              
[17] "Other astrocytic tumor"                        "Tumor of pineal region"                       
[19] "Histiocytic tumor"                             "Melanocytic tumor"                            
[21] "Pre-cancerous lesion"                          "Non-CNS tumor"   

@jaclyn-taroni
Copy link
Member

#866 has not gone in yet, so there are more changes here than need to be reviewed on the pull request. To facilitate review, I've compiled what needs to be reviewed here.

Code/text changes for review:

analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh
analyses/transcriptomic-dimension-reduction/README.md
analyses/transcriptomic-dimension-reduction/dimension-reduction-plots.sh

Someone should look at the diffs for the plots:

analyses/transcriptomic-dimension-reduction/plots/kallisto_polyA_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_polyA_none_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_stranded_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_stranded_none_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/rsem_stranded_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/rsem_stranded_none_broad_histology.pdf

Results files:

analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_log_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_umap_scores_aligned.tsv

I think some of these diffs look more different than I would expect but the reality is the changes in the underlying numbers is very small (e.g., git silliness)? This should be looked at more carefully during review.

We should probably also fix #868 (comment) by adding another hex code to the hard-coded palette. (Notably, we don't use that hardcoded palette when we make the pub-ready figure.)

@jaclyn-taroni
Copy link
Member

jaclyn-taroni commented Jan 6, 2021

We should probably also fix #868 (comment) by adding another hex code to the hard-coded palette. (Notably, we don't use that hardcoded palette when we make the pub-ready figure.)

CI fails without that fix.

@cbethell
Copy link
Contributor

cbethell commented Jan 6, 2021

#866 has not gone in yet, so there are more changes here than need to be reviewed on the pull request. To facilitate review, I've compiled what needs to be reviewed here.

Code/text changes for review:

analyses/transcriptomic-dimension-reduction/01-dimension-reduction.sh
analyses/transcriptomic-dimension-reduction/README.md
analyses/transcriptomic-dimension-reduction/dimension-reduction-plots.sh

Someone should look at the diffs for the plots:

analyses/transcriptomic-dimension-reduction/plots/kallisto_polyA_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_polyA_none_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_stranded_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/kallisto_stranded_none_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/rsem_stranded_log_broad_histology.pdf
analyses/transcriptomic-dimension-reduction/plots/rsem_stranded_none_broad_histology.pdf

Results files:

analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_polyA_log_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_log_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/kallisto_stranded_none_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_log_umap_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_pca_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_tsne_scores_aligned.tsv
analyses/transcriptomic-dimension-reduction/results/rsem_polyA_none_umap_scores_aligned.tsv

I think some of these diffs look more different than I would expect but the reality is the changes in the underlying numbers is very small (e.g., git silliness)? This should be looked at more carefully during review.

We should probably also fix #868 (comment) by adding another hex code to the hard-coded palette. (Notably, we don't use that hardcoded palette when we make the pub-ready figure.)

I can go ahead and review the changes mentioned above, as well as add another hex code to the hard-coded palette.

@kgaonkar6
Copy link
Collaborator Author

The analyses/fusion_filtering/* files are not intentional addition in this branch. Since these are staggered branches I believe I didn't merge updated fusion_filtering_base branch to this branch at some point. Since #865 is now merged I should revert these files to master. Let me know if you have some other way to handle this.

Copy link
Contributor

@cbethell cbethell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this recent commit, I added an additional hex code to the hard-coded palette in dimension-reduction-functions.R as suggested in this comment.

Notably, when running the module's dimension-reduction-plots.sh script, I got an error requesting 26 instead of 22 colors. I have not yet pinpointed why (perhaps someone else, maybe @jaclyn-taroni? may want to take another look into this) but the results in this PR look reasonable based on the upstream changes made and noted in this PR's original comment 👍.
That said, the plots also look reasonable given that the t-SNE plots are the main plots that change with the slight differences in values in the results files.

I was also able to get rid of the fusion filtering file updates that were in this PR by updating this branch with what's now in master and re-running that module.

This PR looks just about ready, however, I noticed that the associated RDS files have snuck in along the way somewhere, which is strange because they look to be accounted for in the .gitignore file. Nonetheless, we will probably want to get rid of these files before merging this PR as we currently only have TSV results files in master.

@jaclyn-taroni
Copy link
Member

Thanks @cbethell, I will run this locally. Wanted to note that I missed @kgaonkar6's earlier point

but the polya outputs have metadata attached which have changed in v18 ( glioma_region is CNS_region, some broad_histology and tumor_descriptors have changed) so they have changes along with tsne/umap coordinate changes.

That most likely accounts for the diffs I did not expect for the poly-A dataset!

Regarding this point:

This PR looks just about ready, however, I noticed that the associated RDS files have snuck in along the way somewhere, which is strange because they look to be accounted for in the .gitignore file.

The RDS files I see in files changed are those that are in plots/plot_data, which we have tracked in the past (this is the current commit in master): https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/bb29b1c0b42376d27690c06196f37f70ee1d3fd7/analyses/transcriptomic-dimension-reduction/plots/plot_data

The RDS files that are ignored follow this pattern:

results/*_model.rds

which I do not see reflected here (and I think would be too large to push to GitHub based on the comment in .gitignore.)

@cbethell
Copy link
Contributor

cbethell commented Jan 7, 2021

The RDS files I see in files changed are those that are in plots/plot_data, which we have tracked in the past (this is the current commit in master): https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/bb29b1c0b42376d27690c06196f37f70ee1d3fd7/analyses/transcriptomic-dimension-reduction/plots/plot_data

The RDS files that are ignored follow this pattern:

results/*_model.rds

which I do not see reflected here (and I think would be too large to push to GitHub based on the comment in .gitignore.)

Ah you're right! In that case, my point re the RDS files can be disregarded 👍

@jaclyn-taroni
Copy link
Member

I ran this locally and did not get an error related to

requesting 26 instead of 22 colors.

I am, however, seeing the same issues that prompted us to file #716. That is a broader issue that's beyond the scope of this pull request. I am going to get this merged so we can continue on with #869 and #870.

@jaclyn-taroni jaclyn-taroni merged commit 68cf73d into AlexsLemonade:master Jan 7, 2021
@kgaonkar6 kgaonkar6 deleted the tsne-base branch January 22, 2021 21:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants