Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PBTA Histologies: Independent sample base (2 of N) #864

Merged
merged 13 commits into from
Jan 6, 2021

Conversation

kgaonkar6
Copy link
Collaborator

@kgaonkar6 kgaonkar6 commented Dec 7, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Rna summary modules needed for subtyping need to be updated because of the 8 new stranded files in v18. In this PR I'm updating independent-samples module to read from pbta-histologies-base.tsv which will be used to generate RNA summary files and run molecular subtyping modules.

Module Reason Brief Description output
independent-samples adding 8 samples #749 and used in subtyping Generates independent specimen lists for WGS/WXS samples updated in #795 and #797 and comment results/independent-specimens.wgs.primary.tsv (included in data download)
results/independent-specimens.wgs.primary-plus.tsv (included in data download)
results/independent-specimens.wgswxs.primary.tsv (included in data download)
results/independent-specimens.wgswxs.primary-plus.tsv (included in data download)
results/independent-specimens.rnaseq.primary-plus-stranded.tsv(included in data download)
results/independent-specimens.rnaseq.primary-plus-polya.tsv (included in data download)

|

What GitHub issue does your pull request address?

#861

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I've added a condition to run independent-samples modules with pbta-histologies-base.tsv in run-independent-samples.sh
Most of the different samples selected in the output files is from random selection since I didn't update the actual code generating the list.
Other modules that need to be rerun for subtyping are in #861

But some changes occurred because of different tumor_descriptors in v18

PT_ID BS_ID Change
PT_1VQWQ0TC BS_QXZTRMWM was Initial now Recurrence

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

8 new samples added to stranded RNA independent sample list and updated tumor_descriptor used for WGS/WXS independent sample list

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza jharenza self-requested a review December 8, 2020 01:27
Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgaonkar6 It looks like the files from collapse-rnaseq were also included in this PR - were you staggering these PRs or should these not be here? Looking at the next few, it seems like these are being staggered, but just confirming!

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected. Using the v18 data download, I ran with

RUN_FOR_SUBTYPING=${BASE_SUBTYPING:-1}

and got the same file dims as you did, but did not compare directly because of the random selection.

@kgaonkar6
Copy link
Collaborator Author

These are staggered, I used instructions in https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#creating-stacked-pull-requests let me know it I missed something.

hmm CI seems to error out because it doesn't have a file named pbta-histologies-base.tsv should I add that to the CI subset files?

@jharenza
Copy link
Collaborator

jharenza commented Dec 8, 2020

These are staggered, I used instructions in https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#creating-stacked-pull-requests let me know it I missed something.

hmm CI seems to error out because it doesn't have a file named pbta-histologies-base.tsv should I add that to the CI subset files?

Ok great. No, I don't think there is a need for that right now because these PRs will not be merged until #857 goes in, which depends on CI file updates by @jaclyn-taroni and/or someone at CCDL.

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @kgaonkar6 - will you also update the READMEs for all of the modules to which you are adding the subtyping base run to note the option?

Thanks!

@kgaonkar6
Copy link
Collaborator Author

Hi @jharenza just updated the README

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the updates! looks good.

@jharenza jharenza changed the title Independent sample base PBTA Histologies: Independent sample base (2 of N) Dec 9, 2020
@jaclyn-taroni jaclyn-taroni removed their request for review December 13, 2020 20:46
Copy link
Collaborator

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mostly looks good, I just have two comments that should probably be addressed (and might help resolve that circle CI error).

@@ -22,3 +28,14 @@ Rscript 02-generate-independent-rnaseq.R \
--output_directory results \
--independent_dna_sample_df ../../data/independent-specimens.wgswxs.primary-plus.tsv

else
Rscript 01-generate-independent-specimens.R \
-f ../../data/pbta-histologies-base.tsv \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only thing that is different is whether ../../data/pbta-histologies-base.tsvor ../../data/pbta-histologies.tsv is used, so a shorter way to do this is to use a smaller if/then specify a HISTOLOGIES_FILE variable and supply that to the -f and --histology_file options. Then you can get rid of this larger if/then and not have to repeat these Rscript calls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's a great idea, I'll update that, thanks!

@@ -25,11 +25,18 @@ independent-specimens.rnaseq.primary-plus-stranded.tsv

To generate the independent sample lists and associated analysis of redundancies in the overall data set, run the following script from the project root directory:

use BASE_SUBTYPING=1 to run this module using the pbta-histologies-base.tsv from data folder while running molecular-subtyping modules for release.
```sh
BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me which histologies file should be used for the circle CI, but right now it doesn't have its own variable supplied and seems to be failing.

You can follow the OpenPBTA instructions here https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci to add an explicit call to the circle CI file so its clear which file should be being used.

Copy link
Collaborator

@jharenza jharenza Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cansavvy! Thanks for your comments. This sequence of PRs will require #849 to be merged and #871 to be generated, as they are utilizing the pbta-histologies-base.tsv file. I think the idea is for these to be reviewed, as multiple of the files generated from these modules will be included in the release, then for us to merge #849, and have all of these run in CI for confirmation they pass. To test these, you can download the v18 release thus far (we have the pbta-histologies-base.tsv file in the download.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I wanted to confirm that in the v18 release data folder we will have eventually have from #870 pbta-histologies.tsv so it should be able to run through .circleci/config.yml for checks as is, right? Or do I add the following to .circleci/config.yml?

BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but according to the instructions, you will want to start the variable with OPENPBTA_ like the examples show: https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci

Any environment variables prefixed with OPENPBTA_ are passed to the specified shell script. Environment variables without this prefix are not passed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah got it, I didn't realize we will be running the BASE_SUBTYPING=1 runs in CI. But since we will I'm going to update the name for the variable as you've suggested here and re-run

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to OPENPBTA_BASE_HISTOLOGY variable

@kgaonkar6
Copy link
Collaborator Author

kgaonkar6 commented Dec 15, 2020

thanks for review @cansavvy @jharenza , latest commit was from a re-run with updated base histology(minor change to Normal sample rows and CNS_regions) file which doesn't change any results files.

@kgaonkar6
Copy link
Collaborator Author

updated the run script with a simple if else for HISTOLOGY_FILE assignment instead of the previous longer code update as suggested by @cansavvy . I also updated reading results/independent-specimens.wgswxs.primary-plus.tsv instead of the data folder since we want to read the latest file generated for WGSWXS specimens from results folder.

@jaclyn-taroni
Copy link
Member

jaclyn-taroni commented Jan 6, 2021

It looks to me like all of @cansavvy's comments have been addressed since she last reviewed. I will merge if/when CI passes.

@jaclyn-taroni jaclyn-taroni merged commit 50db4bc into AlexsLemonade:master Jan 6, 2021
@kgaonkar6 kgaonkar6 deleted the independent_sample_base branch January 22, 2021 21:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants