
Allofus - Adapt targene for use on the All of Us Researcher Workbench #174

Merged: 100 commits merged from allofus into main on Nov 13, 2024

Conversation

@roskamsh (Collaborator) commented May 10, 2024

  1. Add a profile that allows execution of TarGene on the Researcher Workbench, which includes the subsequent steps (a sketch of such a profile follows this list):
  2. Add containers that route through Google Container Registry for each container hosted on DockerHub.
  3. Specify executor-specific instructions (the Google Life Sciences API as well as resources).
  4. Add an end-to-end test that covers the new aspects of the pipeline (no_qq.jl).
  5. Run on All of Us data for the FTO variant and BMI trait.
  6. Update the docs.
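For orientation, items 1-3 roughly correspond to a Nextflow profile along these lines. This is a minimal sketch only: the project ID, bucket, process name, and image paths are placeholders, not the values in the actual conf/allofus.config.

```groovy
// Minimal sketch of an "allofus" profile; project, bucket, process and image
// names below are placeholders, not the PR's actual conf/allofus.config values.
profiles {
    allofus {
        workDir = 'gs://my-workspace-bucket/work'        // assumed bucket

        google {
            project = 'my-aou-terra-project'             // assumed project ID
            region  = 'us-central1'
        }

        process {
            executor = 'google-lifesciences'
            // DockerHub images rerouted through Google Container Registry (item 2)
            withName: 'FlashPCA' {                        // hypothetical process name
                container = 'gcr.io/my-aou-terra-project/flashpca:latest'
            }
        }
    }
}
```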

This also required adding the following features to the pipeline:

  • Make CPU allocation for the allofus profile a function of the memory requested for a given task (see the config sketch after this list)
  • Make the pipeline cloud-compliant by providing a Docker container for every single process
  • Create an empty ESTIMATORS_CONFIG file even if string mode is specified (for when symbolic links are not available)
  • Make the QQ plot an optional output, for the case where a single variant-trait pair was tested
  • Update the FlashPCA2 container so that user permissions are set back to root after compilation
  • Reroute base container versions from base.config into allofus.config
  • Add tests
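To illustrate the first bullet above, the CPU request for a task can be tied to its memory request using dynamic directives, roughly as sketched below; the 'bigmem' label and the 4 GB-per-vCPU ratio are assumptions for the sketch, not the actual allofus config values.

```groovy
// Sketch only: couple the CPU request to the memory request, since Google
// Life Sciences machine types pair vCPUs with memory. The label and the
// 4 GB-per-vCPU ratio are assumptions, not the actual allofus.config values.
process {
    withLabel: 'bigmem' {
        memory = { 16.GB * task.attempt }           // grows on each retry
        cpus   = { Math.max(1, 4 * task.attempt) }  // 16 GB / (4 GB per vCPU)
    }
}
```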

Resources:
https://support.researchallofus.org/hc/en-us/articles/21179878475028-Using-Docker-Images-on-the-Workbench
https://workbench.researchallofus.org/workspaces/aou-rw-5b81a011/howtousenextflowintheresearcherworkbenchv7/data

@roskamsh self-assigned this Sep 18, 2024
@roskamsh added the enhancement label Sep 18, 2024
@roskamsh linked an issue Sep 18, 2024 that may be closed by this pull request
@olivierlabayle (Member) left a comment


Thank you Breeshey.

On the two currently non-ticked boxes:

  • Did you manage to run the pipeline entirely?
  • What is the format of the input data, and does it require an additional mode to the "custom" one? It might be good to state this explicitly in the docs, even if it can use this cohort mode.

Resolved review threads (outdated) on: conf/allofus_container.config (×2), conf/allofus.config, test.config, containers/flashpca/Dockerfile
@roskamsh (Collaborator, Author)

> Thank you Breeshey.
>
> On the two currently non-ticked boxes:
>
>   • Did you manage to run the pipeline entirely?
>   • What is the format of the input data, and does it require an additional mode to the "custom" one? It might be good to state this explicitly in the docs, even if it can use this cohort mode.

I haven't run it with AoU data yet; that is the plan for this week! Then I should be able to add any additional changes required for a new COHORT mode. Once this is complete, I will update the docs. At this point, I have managed to run an end-to-end test using test.config (which is basically just https://github.com/TARGENE/targene-pipeline/blob/main/test/configs/custom_cohort_flat.config).

@olivierlabayle (Member)

> I've currently put the new version in the docs as v0.11.1, but maybe v0.12.0 would be more appropriate? What do you think, @olivierlabayle?

I believe there is no breaking change, so 0.11.1 is good!

@roskamsh (Collaborator, Author) commented Nov 8, 2024

@olivierlabayle this is ready for review again :)

@olivierlabayle (Member) left a comment


Thank you Breeshey. Just curious, why did you need to increase the time limit to 200 instead of 100?

test/ukb_gwas.jl (Outdated)
@@ -62,7 +62,7 @@ args = length(ARGS) > 0 ? ARGS : ["-profile", "local", "-resume"]

 # Check properly resumed
 resume_time = @elapsed run(cmd)
-@test resume_time < 100
+@test resume_time < 200
Member:

Resuming took more than 100 seconds?

Collaborator Author:

It did, yes. Not sure exactly why, but all other tests passed. Is it okay to update this?

Member:

Could you try setting this back to 100 before merging, please? I'd like to make sure this was just a one-off from GitHub Actions.

Collaborator Author:

I did and it still fails only on this step. See the most recent tests for more info!

Collaborator Author:

It takes ~180 seconds to resume, pretty consistently.

Collaborator Author:

Potentially, although I thought it should still be able to cache, so it's a bit odd. Are you okay with me increasing the resume time to 200, or what are you thinking?

Member:

As I understand it, if a pipeline is interrupted for any reason, no TMLE process will be cached. Resuming will lead to all TMLE steps being restarted from scratch. Given the current runtime and the usually large number of processes in a classic pipeline run, I believe this would be a major issue.

Member:

I believe this is because you recreate the estimator file even though it already exists. This changes the timestamp, and thus Nextflow invalidates the cache for that input. Basically, I think we need to check whether the file corresponding to the "string mode" has already been written to the OUTDIR, and not rewrite it if it is there.

@roskamsh (Collaborator, Author) Nov 13, 2024:

@olivierlabayle - I have now added an option in the CreateEstimatorsChannel() function to check if the file was created by a previous run. This seems to have resolved the issue and all tests pass. Am I okay to merge this to main and publish a new release?
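For readers following along, the check described above could look roughly like the sketch below. This is a hypothetical illustration of the idea only; the function body, file name, and parameter names are assumptions, not the code actually added in this PR.

```groovy
// Hypothetical sketch of the "reuse if already written" check; the file name,
// parameters, and body are assumptions, not the pipeline's actual code.
def CreateEstimatorsChannel(estimators_spec, outdir) {
    def existing = file("${outdir}/estimators_config.jl")
    if (!existing.exists()) {
        // Only write the "string mode" config on the first run, so that its
        // timestamp is unchanged on -resume and the Nextflow cache stays valid.
        existing.text = estimators_spec
    }
    return Channel.fromPath(existing)
}
```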

Member:

Yes, thank you for all the hard work here! Looking forward to hearing what you can find in All of Us!

@roskamsh merged commit 080867e into main on Nov 13, 2024
19 checks passed
@roskamsh deleted the allofus branch on November 13, 2024 at 19:44
Labels: enhancement (New feature or request)
Projects: None yet
Participants: 3