Optionally allow clustermq workers to cache the results #531

Closed
wlandau opened this issue Oct 6, 2018 · 4 comments

Comments

wlandau commented Oct 6, 2018

@kendonB, you called it in #425 (comment).

As I've mentioned in another issue, caching on the master process is going to be slow for I/O-heavy jobs.

As with the future backend, we can activate the existing caching argument for the clustermq and clustermq_staged backends, e.g.

make(plan, parallelism = "clustermq", caching = "worker", jobs = 100)
make(plan, parallelism = "clustermq_staged", caching = "worker", jobs = 100)
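
To illustrate the difference between the two caching modes, here is a rough base-R sketch. It is not drake's or clustermq's implementation: it uses the parallel package as a stand-in for the workers, and build_target(), build_and_store(), and the two cache directories are hypothetical names made up for the example. With master-style caching, every value travels back to the master, which writes them one at a time. With worker-style caching, each worker writes its own result and only the target name comes back.

```r
library(parallel)

# Hypothetical stand-in for building one target; not part of drake.
build_target <- function(name) {
  seq_len(1000) * nchar(name)
}

# Hypothetical helper run on a worker: build the target, write it straight
# to the shared cache directory, and return only the target name.
build_and_store <- function(name, cache_dir, build) {
  saveRDS(build(name), file.path(cache_dir, paste0(name, ".rds")))
  name
}

targets <- c("small", "medium", "large_target")
master_cache <- tempfile("master_cache_")
worker_cache <- tempfile("worker_cache_")
dir.create(master_cache)
dir.create(worker_cache)

# Two local worker processes stand in for cluster workers.
cl <- makeCluster(2)

# Master-style caching: values are sent back, then the master
# writes them to the cache serially.
values <- parLapply(cl, targets, build_target)
for (i in seq_along(targets)) {
  saveRDS(values[[i]], file.path(master_cache, paste0(targets[i], ".rds")))
}

# Worker-style caching: each worker writes its own result in parallel,
# so only the short target names are sent back to the master.
done <- parLapply(
  cl, targets, build_and_store,
  cache_dir = worker_cache, build = build_target
)

stopCluster(cl)
```

Both directories end up holding the same files. The difference is which process does the writing, and that is where I/O-heavy jobs are expected to benefit from caching = "worker".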

This enhancement could make a major difference in the tools at my workplace, and I consider it the highest priority issue for drake right now. cc @huizhang-lilly.

wlandau commented Oct 6, 2018

It would be super nice to use ZeroMQ to send the results back to the head node and write to the cache in parallel, but I am not sure exactly how. After the issues mentioned in mschubert/clustermq#99 get sorted out, make(plan, parallelism = "clustermq_staged", caching = "master") might approximate this somehow. But anyway, I think the current issue is a good start.

wlandau commented Oct 6, 2018

Using the 531 branch. The implementation for "clustermq" parallelism is surprisingly easy. Still needs testing on a real cluster. Stay tuned for a PR.

wlandau commented Oct 6, 2018

From #532, this issue is now solved for "clustermq" parallelism. I still want to implement it for "clustermq_staged" parallelism even though staged parallelism is almost never as good as persistent workers.

wlandau commented Oct 6, 2018

Fixed via #532 and #533. I can't believe how easy that was. I guess the work on earlier backends paid off.

wlandau closed this as completed Oct 6, 2018