
Allofus - Adapt targene for use on the All of Us Researcher Workbench #174

Merged: 100 commits merged from allofus into main on Nov 13, 2024

Conversation

@roskamsh (Collaborator) commented May 10, 2024

  1. Add a profile that allows execution of TarGene on the Researcher Workbench, which includes the subsequent steps (a sketch of such a profile follows this list):
  2. Add containers that route through Google Container Registry for each container hosted on DockerHub.
  3. Specify executor-specific instructions (the Google Life Sciences API as well as resources).
  4. Add an end-to-end test that covers the new aspects of the pipeline (no_qq.jl).
  5. Run on All of Us data for the FTO variant and BMI trait.
  6. Update the docs.
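For orientation, items 1-3 roughly correspond to a Nextflow profile along these lines. This is a minimal sketch only: the project ID, bucket, process name, and image paths are placeholders, not the values in the actual conf/allofus.config.

```groovy
// Minimal sketch of an "allofus" profile; project, bucket, process and image
// names below are placeholders, not the PR's actual conf/allofus.config values.
profiles {
    allofus {
        workDir = 'gs://my-workspace-bucket/work'        // assumed bucket

        google {
            project = 'my-aou-terra-project'             // assumed project ID
            region  = 'us-central1'
        }

        process {
            executor = 'google-lifesciences'
            // DockerHub images rerouted through Google Container Registry (item 2)
            withName: 'FlashPCA' {                        // hypothetical process name
                container = 'gcr.io/my-aou-terra-project/flashpca:latest'
            }
        }
    }
}
```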

This also required adding the following features to the pipeline:

  • Make CPU allocation for the allofus profile a function of the memory requested for a given task (see the config sketch after this list)
  • Make the pipeline cloud-compliant by providing a Docker container for every single process
  • Create an empty ESTIMATORS_CONFIG file even if string mode is specified (for when symbolic links are not available)
  • Make the QQ plot an optional output, for the case where a single variant-trait pair was tested
  • Update the FlashPCA2 container so that user permissions are set back to root after compilation
  • Reroute base container versions from base.config into allofus.config
  • Add tests
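To illustrate the first bullet above, the CPU request for a task can be tied to its memory request using dynamic directives, roughly as sketched below; the 'bigmem' label and the 4 GB-per-vCPU ratio are assumptions for the sketch, not the actual allofus config values.

```groovy
// Sketch only: couple the CPU request to the memory request, since Google
// Life Sciences machine types pair vCPUs with memory. The label and the
// 4 GB-per-vCPU ratio are assumptions, not the actual allofus.config values.
process {
    withLabel: 'bigmem' {
        memory = { 16.GB * task.attempt }           // grows on each retry
        cpus   = { Math.max(1, 4 * task.attempt) }  // 16 GB / (4 GB per vCPU)
    }
}
```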

Resources:
https://support.researchallofus.org/hc/en-us/articles/21179878475028-Using-Docker-Images-on-the-Workbench
https://workbench.researchallofus.org/workspaces/aou-rw-5b81a011/howtousenextflowintheresearcherworkbenchv7/data

@roskamsh self-assigned this Sep 18, 2024
@roskamsh added the enhancement label Sep 18, 2024
@roskamsh linked an issue Sep 18, 2024 that may be closed by this pull request
@olivierlabayle (Member) left a comment


Thank you Breeshey.

On the two currently non-ticked boxes:

  • Did you manage to run the pipeline entirely?
  • What is the format of the input data, and does it require an additional mode to the "custom" one? It might be good to state this explicitly in the docs, even if it can use this cohort mode.

Resolved review threads (outdated) on: conf/allofus_container.config (×2), conf/allofus.config, test.config, containers/flashpca/Dockerfile
@roskamsh (Collaborator, Author)

> Thank you Breeshey.
>
> On the two currently non-ticked boxes:
>
>   • Did you manage to run the pipeline entirely?
>   • What is the format of the input data, and does it require an additional mode to the "custom" one? It might be good to state this explicitly in the docs, even if it can use this cohort mode.

I haven't run it with AoU data yet; that is the plan for this week! Then I should be able to add any additional changes required for a new COHORT mode. Once this is complete, I will update the docs. At this point, I have managed to run an end-to-end test using test.config (which is basically just https://github.com/TARGENE/targene-pipeline/blob/main/test/configs/custom_cohort_flat.config).

@olivierlabayle (Member)

> I've currently put the new version in the docs as v0.11.1, but maybe v0.12.0 would be more appropriate? What do you think, @olivierlabayle?

I believe there is no breaking change, so 0.11.1 is good!

@roskamsh (Collaborator, Author) commented Nov 8, 2024

@olivierlabayle this is ready for review again :)

@olivierlabayle (Member) left a comment


Thank you Breeshey. Just curious, why did you need to increase the time limit to 200 instead of 100?

test/ukb_gwas.jl (Outdated)
@@ -62,7 +62,7 @@ args = length(ARGS) > 0 ? ARGS : ["-profile", "local", "-resume"]

 # Check properly resumed
 resume_time = @elapsed run(cmd)
-@test resume_time < 100
+@test resume_time < 200
Member:

Resuming took more than 100 seconds?

Collaborator Author:

It did, yes. Not sure exactly why, but all other tests passed. Is it okay to update this?

Member:

Could you try setting this back to 100 before merging, please? I'd like to make sure this was just a one-off from GitHub Actions.

Collaborator Author:

I did and it still fails only on this step. See the most recent tests for more info!

Collaborator Author:

It takes ~180 seconds to resume, pretty consistently.

Collaborator Author:

Potentially, although I thought it should still be able to cache, so it's a bit odd. Are you okay with me increasing the resume time to 200, or what are you thinking?

Member:

As I understand it, if a pipeline is interrupted for any reason, no TMLE process will be cached. Resuming will lead to all TMLE steps being restarted from scratch. Given the current runtime and the usually large number of processes in a classic pipeline run, I believe this would be a major issue.

Member:

I believe this is because you recreate the estimator file even though it already exists. This changes the timestamp, and thus Nextflow invalidates the cache for that input. Basically, I think we need to check whether the file corresponding to the "string mode" has already been written to the OUTDIR, and not rewrite it if it is there.

@roskamsh (Collaborator, Author) Nov 13, 2024:

@olivierlabayle - I have now added an option in the CreateEstimatorsChannel() function to check if the file was created by a previous run. This seems to have resolved the issue and all tests pass. Am I okay to merge this to main and publish a new release?
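For readers following along, the check described above could look roughly like the sketch below. This is a hypothetical illustration of the idea only; the function body, file name, and parameter names are assumptions, not the code actually added in this PR.

```groovy
// Hypothetical sketch of the "reuse if already written" check; the file name,
// parameters, and body are assumptions, not the pipeline's actual code.
def CreateEstimatorsChannel(estimators_spec, outdir) {
    def existing = file("${outdir}/estimators_config.jl")
    if (!existing.exists()) {
        // Only write the "string mode" config on the first run, so that its
        // timestamp is unchanged on -resume and the Nextflow cache stays valid.
        existing.text = estimators_spec
    }
    return Channel.fromPath(existing)
}
```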

Member:

Yes, thank you for all the hard work here! Looking forward to hearing what you can find in All of Us!

@roskamsh merged commit 080867e into main on Nov 13, 2024
19 checks passed
@roskamsh deleted the allofus branch on November 13, 2024 at 19:44
Labels: enhancement (New feature or request)
Projects: None yet
Participants: 3