-
Notifications
You must be signed in to change notification settings - Fork 67
PBTA Histologies: Independent sample base (2 of N) #864
PBTA Histologies: Independent sample base (2 of N) #864
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgaonkar6 It looks like the files from collapse-rnaseq
were also included in this PR - were you staggering these PRs or should these not be here? Looking at the next few, it seems like these are being staggered, but just confirming!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works as expected. Using the v18 data download, I ran with
RUN_FOR_SUBTYPING=${BASE_SUBTYPING:-1}
and got the same file dims as you did, but did not compare directly because of the random selection.
These are staggered, I used instructions in https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#creating-stacked-pull-requests let me know it I missed something. hmm CI seems to error out because it doesn't have a file named pbta-histologies-base.tsv should I add that to the CI subset files? |
Ok great. No, I don't think there is a need for that right now because these PRs will not be merged until #857 goes in, which depends on CI file updates by @jaclyn-taroni and/or someone at CCDL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good @kgaonkar6 - will you also update the READMEs for all of the modules to which you are adding the subtyping base run to note the option?
Thanks!
Hi @jharenza just updated the README |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the updates! looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this mostly looks good, I just have two comments that should probably be addressed (and might help resolve that circle CI error).
@@ -22,3 +28,14 @@ Rscript 02-generate-independent-rnaseq.R \ | |||
--output_directory results \ | |||
--independent_dna_sample_df ../../data/independent-specimens.wgswxs.primary-plus.tsv | |||
|
|||
else | |||
Rscript 01-generate-independent-specimens.R \ | |||
-f ../../data/pbta-histologies-base.tsv \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only thing that is different is whether ../../data/pbta-histologies-base.tsv
or ../../data/pbta-histologies.tsv
is used, so a shorter way to do this is to use a smaller if/then specify a HISTOLOGIES_FILE
variable and supply that to the -f
and --histology_file
options. Then you can get rid of this larger if/then and not have to repeat these Rscript calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that's a great idea, I'll update that, thanks!
@@ -25,11 +25,18 @@ independent-specimens.rnaseq.primary-plus-stranded.tsv | |||
|
|||
To generate the independent sample lists and associated analysis of redundancies in the overall data set, run the following script from the project root directory: | |||
|
|||
use BASE_SUBTYPING=1 to run this module using the pbta-histologies-base.tsv from data folder while running molecular-subtyping modules for release. | |||
```sh | |||
BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear to me which histologies file should be used for the circle CI, but right now it doesn't have its own variable supplied and seems to be failing.
You can follow the OpenPBTA instructions here https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci to add an explicit call to the circle CI file so its clear which file should be being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cansavvy! Thanks for your comments. This sequence of PRs will require #849 to be merged and #871 to be generated, as they are utilizing the pbta-histologies-base.tsv
file. I think the idea is for these to be reviewed, as multiple of the files generated from these modules will be included in the release, then for us to merge #849, and have all of these run in CI for confirmation they pass. To test these, you can download the v18 release thus far (we have the pbta-histologies-base.tsv
file in the download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I wanted to confirm that in the v18 release data folder we will have eventually have from #870 pbta-histologies.tsv so it should be able to run through .circleci/config.yml for checks as is, right? Or do I add the following to .circleci/config.yml?
BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but according to the instructions, you will want to start the variable with OPENPBTA_
like the examples show: https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci
Any environment variables prefixed with OPENPBTA_ are passed to the specified shell script. Environment variables without this prefix are not passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it, I didn't realize we will be running the BASE_SUBTYPING=1 runs in CI. But since we will I'm going to update the name for the variable as you've suggested here and re-run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to OPENPBTA_BASE_HISTOLOGY variable
updated the run script with a simple if else for HISTOLOGY_FILE assignment instead of the previous longer code update as suggested by @cansavvy . I also updated reading |
… independent_sample_base
It looks to me like all of @cansavvy's comments have been addressed since she last reviewed. I will merge if/when CI passes. |
Purpose/implementation Section
What scientific question is your analysis addressing?
Rna summary modules needed for subtyping need to be updated because of the 8 new stranded files in v18. In this PR I'm updating independent-samples module to read from pbta-histologies-base.tsv which will be used to generate RNA summary files and run molecular subtyping modules.
independent-samples
results/independent-specimens.wgs.primary.tsv
(included in data download)results/independent-specimens.wgs.primary-plus.tsv
(included in data download)results/independent-specimens.wgswxs.primary.tsv
(included in data download)results/independent-specimens.wgswxs.primary-plus.tsv
(included in data download)results/independent-specimens.rnaseq.primary-plus-stranded.tsv
(included in data download)results/independent-specimens.rnaseq.primary-plus-polya.tsv
(included in data download)|
What GitHub issue does your pull request address?
#861
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
I've added a condition to run independent-samples modules with pbta-histologies-base.tsv in run-independent-samples.sh
Most of the different samples selected in the output files is from random selection since I didn't update the actual code generating the list.
Other modules that need to be rerun for subtyping are in #861
But some changes occurred because of different tumor_descriptors in v18
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
tables
What is your summary of the results?
8 new samples added to stranded RNA independent sample list and updated tumor_descriptor used for WGS/WXS independent sample list
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.