Using coroutines to execute multiple steps and subworkflows in a single worker #1218
Comments
In case the generator function is a subroutine, the caller function has to wrap it as
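Presumably this refers to delegating with Python's yield from, so that the subroutine's requests propagate to the outer runner; a minimal hypothetical sketch (the names sub_step and calling_step are illustrative, not from the actual code):

```python
def sub_step():
    # a generator "subroutine": it has a request of its own
    result = yield 'sub_step needs a resource'
    return result

def calling_step():
    # the caller wraps the subroutine so that its yields (requests)
    # reach the runner, and its return value comes back to the caller
    sub_result = yield from sub_step()
    yield f'calling_step done with {sub_result}'

if __name__ == '__main__':
    g = calling_step()
    print(next(g))               # -> sub_step needs a resource
    print(g.send('a resource'))  # -> calling_step done with a resource
```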
Looks like this will help with this situation?
Yes, it will reduce the number of processes (strict to
Okay, I did not use
I'm waiting a bit more to see the outcome. Currently the process seems to be hanging.
So it appears to hang. I cancelled it and ran it another time using
Is this the latest? I think I fixed something after I sent the message.
Yes, it is the latest version.
Strange, that function is simple, and the only
Oh okay, it was my bad. I messed up a local copy a few days ago when I was trying to figure out if memory usage was related to the number of jobs. ... Now the jobs are running. I have not limited memory usage yet, just to check whether it works or hangs. I will then submit to a compute node and monitor memory usage there if all works well for the current run.
Great. The branch now passes Travis and your hang test, so any new failures should be recorded as tests.
The workflow no longer hangs, which is good. I still get this though:
so the workflow did not finish.
Got this at rerun:
It seems all the outputs are generated. This failed towards the end. Not sure if it can be reproduced. Doing it again now.
Missing means the task file disappears and the rerun confirms it. Rerun should perhaps recreate it; not sure why this happened.
Unfortunately it still hangs, I think at the end of the pipeline. I've waited for it overnight. But previously it hung at every step of the pipeline, so the current behavior is better. Also, when I check the number of processes, it seems correct. I see you've pushed another commit. I'll check it out and try again. If it is still problematic, I'll see if I can create an example for you to test on the cluster.
Yes, something is going on here, because Travis sometimes hangs but the offending test passes OK locally. Let me see if I can spot something.
Using current
But according to your post on another ticket, this should not happen now (there should only be the DAG problem)?
In theory the inter-locking problem should be resolved, but there might be bugs that prevent jobs from quitting (e.g. a socket is closed and no longer listens for results). I am still testing...
#1056
Right now our worker is executed like this:
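(A rough, hypothetical sketch of this pattern; the names below are illustrative, not the actual sos code.)

```python
# One process per worker: each worker sends a request (e.g. for a substep
# or a nested workflow) and then blocks until the runner answers it.
import multiprocessing as mp

def worker(request_q, reply_q):
    request_q.put('need: result of a substep')
    result = reply_q.get()        # the whole process sits idle here
    print('worker got:', result)

if __name__ == '__main__':
    request_q, reply_q = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(request_q, reply_q))
    p.start()
    print('runner received:', request_q.get())
    reply_q.put('substep result')  # satisfy the request
    p.join()
```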
The problem here is that there is potentially a large number of workers waiting for their requests to be satisfied.
This ticket proposes the use of Python coroutines/generators to resolve this problem. The technique is demonstrated in the following example:
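(The sketch below is hypothetical: names such as step_worker and runner are illustrative, not the project's actual API.)

```python
# Steps are written as generators that yield their requests; a single
# runner drives all of them, so pending requests no longer each tie up
# a blocked worker process.
def step_worker(name):
    # ask the runner for something (e.g. a substep or subworkflow result)
    # and pause until the answer is sent back
    result = yield f'{name} needs a substep result'
    yield f'{name} finished with: {result}'

def runner(steps):
    # collect the first request of every step in one process
    pending = [(step, next(step)) for step in steps]
    for step, request in pending:
        print('runner received:', request)
        # a real runner would dispatch the request and resume the
        # generator only once the answer is actually available
        print(step.send('substep result'))

if __name__ == '__main__':
    runner([step_worker('step_1'), step_worker('step_2')])
```

Here the runner answers each request immediately; the point is only that many paused steps can live inside a single process.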
That is to say, the workers yield the request to the runner.