Allofus - Adapt targene for use on the All of Us Researcher Workbench #174
Conversation
…/.nextflow/config
Thank you Breeshey.
On the two currently non-ticked boxes:
- Did you manage to run the pipeline entirely?
- What is the format of the input data, and does it require a model in addition to "custom"? It might be good to state this explicitly in the docs, even if this cohort mode can be used.
I haven't run it with AOU data yet; that is the plan for this week. Then I should be able to add any additional changes required for a new COHORT mode. Once this is complete, I will update the docs. At this point, I have managed to run an end-to-end test using test.config (which is essentially https://github.com/TARGENE/targene-pipeline/blob/main/test/configs/custom_cohort_flat.config).
I believe there is no breaking change, so 0.11.1 is good!
@olivierlabayle this is ready for review again :)
Thank you Breeshey. Just curious: why did you need to increase the time limit to 200 instead of 100?
test/ukb_gwas.jl

@@ -62,7 +62,7 @@ args = length(ARGS) > 0 ? ARGS : ["-profile", "local", "-resume"]

 # Check properly resumed
 resume_time = @elapsed run(cmd)
-@test resume_time < 100
+@test resume_time < 200
Resuming took more than 100 seconds?
It did, yes. I'm not sure exactly why, but all other tests passed. Is it okay to update this?
Could you try setting this back to 100 before merging, please? I'd like to make sure this was just a one-off from GitHub Actions.
I did, and it still fails only on this step. See the most recent tests for more info!
It takes ~180 seconds to resume, pretty consistently.
Potentially, although I thought it should still be able to cache, so it's a bit odd. Are you okay with me increasing the resume time limit to 200, or what are you thinking?
As I understand it, if a pipeline is interrupted for any reason, no TMLE process will be cached. Resuming will then cause all TMLE steps to restart from scratch. Given the current runtime and the usually large number of processes in a classic pipeline run, I believe this would be a major issue.
I believe this is because you recreate the estimator file even though it already exists. This changes the timestamp, and thus Nextflow invalidates the cache for that input. Basically, I think we need to check whether the file corresponding to the "string mode" has already been written to the OUTDIR, and not rewrite it if it is there.
@olivierlabayle - I have now added an option in the CreateEstimatorsChannel() function to check if the file was created by a previous run. This seems to have resolved the issue and all tests pass. Am I okay to merge this to main and publish a new release?
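The existence check described above can be sketched as follows. This is a minimal Python illustration of the idempotent-write pattern, not the actual Nextflow/Groovy implementation; the function name and arguments are hypothetical. The point is that writing the estimator file only when it is absent preserves its modification time, so Nextflow's `-resume` cache for processes consuming it stays valid.

```python
from pathlib import Path

def ensure_estimator_file(outdir: str, name: str, content: str) -> Path:
    """Write the estimator file only if it does not already exist.

    Rewriting an unchanged file bumps its modification time, which makes
    Nextflow treat it as a new input and invalidates the -resume cache
    for every downstream process.
    """
    path = Path(outdir) / name
    if not path.exists():
        path.write_text(content)
    return path
```

On a second run the file is left untouched, so its timestamp (and hence the cache key) is unchanged even if the caller would have produced the same content again.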
Yes, thank you for all the hard work here! Looking forward to hearing what you find in All of Us!
This includes additional features to add to the pipeline.

Resources:
- https://support.researchallofus.org/hc/en-us/articles/21179878475028-Using-Docker-Images-on-the-Workbench
- https://workbench.researchallofus.org/workspaces/aou-rw-5b81a011/howtousenextflowintheresearcherworkbenchv7/data