PBTA Histologies: Independent sample base (2 of N) #864

kgaonkar6 · 2020-12-07T20:40:56Z

Purpose/implementation Section

What scientific question is your analysis addressing?

The RNA summary modules needed for subtyping must be updated because of the 8 new stranded files in v18. In this PR I'm updating the independent-samples module to read from pbta-histologies-base.tsv, which will be used to generate RNA summary files and run the molecular subtyping modules.

| Module | Reason | Brief Description | Output |
|--------|--------|-------------------|--------|
| `independent-samples` | adding 8 samples #749 and used in subtyping | Generates independent specimen lists for WGS/WXS samples updated in #795 and #797 and comment | `results/independent-specimens.wgs.primary.tsv` (included in data download)<br>`results/independent-specimens.wgs.primary-plus.tsv` (included in data download)<br>`results/independent-specimens.wgswxs.primary.tsv` (included in data download)<br>`results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download)<br>`results/independent-specimens.rnaseq.primary-plus-stranded.tsv` (included in data download)<br>`results/independent-specimens.rnaseq.primary-plus-polya.tsv` (included in data download) |

What GitHub issue does your pull request address?

#861

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I've added a condition in run-independent-samples.sh to run the independent-samples module with pbta-histologies-base.tsv.
Most of the differences in the samples selected in the output files come from random selection, since I didn't update the actual code generating the lists.
Other modules that need to be rerun for subtyping are listed in #861.

However, some changes occurred because of different tumor_descriptor values in v18:

| PT_ID | BS_ID | Change |
|-------|-------|--------|
| PT_1VQWQ0TC | BS_QXZTRMWM | was Initial now Recurrence |

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

8 new samples were added to the stranded RNA independent sample list, and the updated tumor_descriptor values were used for the WGS/WXS independent sample lists.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza

@kgaonkar6 It looks like the files from collapse-rnaseq were also included in this PR - were you staggering these PRs or should these not be here? Looking at the next few, it seems like these are being staggered, but just confirming!

jharenza

Works as expected. Using the v18 data download, I ran with

RUN_FOR_SUBTYPING=${BASE_SUBTYPING:-1}

and got the same file dims as you did, but did not compare directly because of the random selection.
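
Since the random selection makes a row-by-row diff uninformative, comparing dimensions is one quick sanity check. A minimal sketch, assuming the output lists are tab-separated; `tsv_dims` is a hypothetical helper and not part of the module:

```sh
#!/bin/bash
# Hypothetical helper: print "<rows> <columns>" for a tab-separated file.
# The row count includes the header line; the column count is taken
# from the last line read.
tsv_dims() {
  awk -F'\t' '{ nf = NF } END { print NR, nf + 0 }' "$1"
}

# Report dimensions for every independent-specimens list that exists
for f in results/independent-specimens.*.tsv; do
  [ -e "$f" ] || continue   # skip when the glob matches nothing
  echo "$f: $(tsv_dims "$f")"
done
```

Running this once on each branch's `results` folder and comparing the printed pairs would show whether the file shapes agree, without saying anything about which specimens were picked.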

kgaonkar6 · 2020-12-08T14:23:16Z

These are staggered. I used the instructions in https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#creating-stacked-pull-requests; let me know if I missed something.

Hmm, CI seems to error out because it doesn't have a file named pbta-histologies-base.tsv. Should I add that to the CI subset files?

jharenza · 2020-12-08T18:42:56Z

> These are staggered. I used the instructions in https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#creating-stacked-pull-requests; let me know if I missed something.
>
> Hmm, CI seems to error out because it doesn't have a file named pbta-histologies-base.tsv. Should I add that to the CI subset files?

Ok great. No, I don't think there is a need for that right now because these PRs will not be merged until #857 goes in, which depends on CI file updates by @jaclyn-taroni and/or someone at CCDL.

jharenza

This looks good @kgaonkar6 - will you also update the READMEs for all of the modules to which you are adding the subtyping base run to note the option?

Thanks!

kgaonkar6 · 2020-12-09T14:46:40Z

Hi @jharenza just updated the README

jharenza

thanks for the updates! looks good.

cansavvy

I think this mostly looks good, I just have two comments that should probably be addressed (and might help resolve that circle CI error).

cansavvy · 2020-12-15T17:33:58Z

analyses/independent-samples/run-independent-samples.sh

@@ -22,3 +28,14 @@ Rscript 02-generate-independent-rnaseq.R \
  --output_directory results \
  --independent_dna_sample_df ../../data/independent-specimens.wgswxs.primary-plus.tsv  

+else 
+Rscript 01-generate-independent-specimens.R \
+  -f ../../data/pbta-histologies-base.tsv \


It looks like the only thing that is different is whether ../../data/pbta-histologies-base.tsv or ../../data/pbta-histologies.tsv is used, so a shorter way to do this is to use a smaller if/then to specify a HISTOLOGIES_FILE variable and supply that to the -f and --histology_file options. Then you can get rid of this larger if/then and not have to repeat these Rscript calls.

ah that's a great idea, I'll update that, thanks!
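
The suggestion can be sketched roughly as follows. This is an illustration, not the code that was committed: `select_histology_file` is a hypothetical helper, the default of 0 is an assumption, and `BASE_SUBTYPING` is the flag name in use at this point in the thread.

```sh
#!/bin/bash
# Sketch only: choose the histologies file once, then reuse it everywhere.

# Hypothetical helper: $1 is the flag value. Only the value 1 selects the
# base file; anything else (or no value) selects the full histologies file.
select_histology_file() {
  if [ "${1:-0}" = "1" ]; then
    echo "../../data/pbta-histologies-base.tsv"
  else
    echo "../../data/pbta-histologies.tsv"
  fi
}

# Assumption: an unset flag means "use the full histologies file"
HISTOLOGY_FILE=$(select_histology_file "${BASE_SUBTYPING:-0}")
echo "Using histologies file: ${HISTOLOGY_FILE}"

# Each Rscript call is then written once and takes the variable
# (remaining options unchanged from the existing script), e.g.:
#   Rscript 01-generate-independent-specimens.R -f "$HISTOLOGY_FILE" ...
#   Rscript 02-generate-independent-rnaseq.R --histology_file "$HISTOLOGY_FILE" ...
```

Keeping the branch down to a single assignment means a new option only has to be added to one copy of each Rscript call.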

cansavvy · 2020-12-15T17:36:49Z

analyses/independent-samples/README.md

@@ -25,11 +25,18 @@ independent-specimens.rnaseq.primary-plus-stranded.tsv

 To generate the independent sample lists and associated analysis of redundancies in the overall data set, run the following script from the project root directory:

+use BASE_SUBTYPING=1 to run this module using the pbta-histologies-base.tsv from data folder while running molecular-subtyping modules for release.
+```sh
+BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh 


It's unclear to me which histologies file should be used for the circle CI, but right now it doesn't have its own variable supplied and seems to be failing.

You can follow the OpenPBTA instructions here https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci to add an explicit call to the circle CI file so it's clear which file should be used.

Hi @cansavvy! Thanks for your comments. This sequence of PRs will require #849 to be merged and #871 to be generated, as they use the pbta-histologies-base.tsv file. I think the idea is for these to be reviewed, since several of the files generated by these modules will be included in the release, then for us to merge #849 and have all of these run in CI to confirm they pass. To test these, you can download the v18 release thus far (we have the pbta-histologies-base.tsv file in the download).

Also, I wanted to confirm that the v18 release data folder will eventually have pbta-histologies.tsv from #870, so it should be able to run through .circleci/config.yml for checks as is, right? Or do I add the following to .circleci/config.yml?

BASE_SUBTYPING=1 ../analyses/independent-samples/run-independent-samples.sh

Yes, but according to the instructions, you will want to start the variable with OPENPBTA_ like the examples show: https://github.com/AlexsLemonade/OpenPBTA-analysis#passing-variables-only-in-ci

Any environment variables prefixed with OPENPBTA_ are passed to the specified shell script. Environment variables without this prefix are not passed.

Ah, got it. I didn't realize we will be running the BASE_SUBTYPING=1 runs in CI. Since we will, I'm going to rename the variable as you've suggested here and re-run.

Updated to OPENPBTA_BASE_HISTOLOGY variable
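
As a rough illustration of the prefix rule quoted above, the sketch below lists which variables would match it and reads the flag with a default. It is not the project's CI wrapper, whose implementation is not shown in this thread. `forwarded_vars` is a hypothetical helper, the default of 0 is an assumption, and `OPENPBTA_BASE_SUBTYPING` is the name that appears in the later commit message.

```sh
#!/bin/bash
# Hypothetical helper, for illustration only: print the environment
# variables that match the OPENPBTA_ prefix rule.
forwarded_vars() {
  # `|| true` keeps the function from failing when nothing matches
  env | grep '^OPENPBTA_' || true
}

# Inside the run script, the prefixed flag can be read with a default.
# Assumption: an unset flag means "use the full histologies file".
RUN_FOR_SUBTYPING=${OPENPBTA_BASE_SUBTYPING:-0}
echo "RUN_FOR_SUBTYPING=${RUN_FOR_SUBTYPING}"
```

Under this rule a plain `BASE_SUBTYPING=1` would never reach the script in CI, which is why the rename matters.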

kgaonkar6 · 2020-12-15T22:08:48Z

Thanks for the review @cansavvy @jharenza. The latest commit is from a re-run with the updated base histology file (minor changes to Normal sample rows and CNS_regions), which doesn't change any results files.

kgaonkar6 · 2020-12-15T22:24:51Z

Updated the run script with a simple if/else for the HISTOLOGY_FILE assignment instead of the previous longer code, as suggested by @cansavvy. I also switched to reading results/independent-specimens.wgswxs.primary-plus.tsv from the results folder instead of the data folder, since we want to read the latest file generated for WGSWXS specimens.

jaclyn-taroni · 2021-01-06T01:03:58Z

It looks to me like all of @cansavvy's comments have been addressed since she last reviewed. I will merge if/when CI passes.

kgaonkar6 added 2 commits December 7, 2020 15:12

update to use base histologies file

bd3a94d

update to use base histology file

bea7b1d

jharenza self-requested a review December 8, 2020 01:27

jharenza reviewed Dec 8, 2020

jharenza approved these changes Dec 8, 2020

jharenza requested a review from jaclyn-taroni December 8, 2020 02:03

jharenza approved these changes Dec 8, 2020

kgaonkar6 added 2 commits December 9, 2020 09:36

Update README.md

2bbca44

Update README.md

aacff84

jharenza approved these changes Dec 9, 2020

jharenza changed the title ~~Independent sample base~~ PBTA Histologies: Independent sample base (2 of N) Dec 9, 2020

jaclyn-taroni added don't merge review before release labels Dec 13, 2020

jaclyn-taroni removed their request for review December 13, 2020 20:46

cansavvy reviewed Dec 15, 2020

kgaonkar6 added 2 commits December 15, 2020 17:05

re-run with udpated pbta-histologies-base.tsv

191f036

Merge branch 'independent_sample_base' of https://github.com/kgaonkar…

653f9fe

…6/OpenPBTA-analysis into independent_sample_base

re-run with udpated pbta-histologies-base.tsv

aeca244

kgaonkar6 and others added 3 commits December 17, 2020 15:18

updating BASE_SUBTYPING to OPENPBTA_BASE_SUBTYPING

8f07b4f

Update config.yml

08c8617

Update config.yml

bc2b923

jaclyn-taroni requested a review from cansavvy December 21, 2020 18:16

jaclyn-taroni removed the don't merge label Dec 21, 2020

kgaonkar6 and others added 2 commits January 5, 2021 13:54

Merge branch 'master' into independent_sample_base

aab627b

Merge remote-tracking branch 'upstream/master' into independent_sampl…

73f634e

…e_base

Merge remote-tracking branch 'kgaonkar6/independent_sample_base' into…

3b5fbcf

… independent_sample_base

jaclyn-taroni merged commit 50db4bc into AlexsLemonade:master Jan 6, 2021

jaclyn-taroni mentioned this pull request Jan 6, 2021

Rerun "PBTA histologies" PR code once they are all merged or reexamine results #891

Closed

kgaonkar6 deleted the independent_sample_base branch January 22, 2021 21:17
