Proper clustermq parallelism #501

Merged
merged 33 commits into from
Aug 23, 2018

wlandau commented Aug 11, 2018

Summary

It is time for proper clustermq-based job scheduling. The bulk of the work is in R/clustermq.R. Requires clustermq >= 0.8.4.00.

With make(parallelism = "clustermq", jobs = 4), drake spins up a pool of 4 persistent clustermq workers, and a master process deploys each target as soon as a worker is free and the target's dependencies are ready. Reprex:

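# devtools::install_github("ropensci/drake", ref = "clustermq")
library(drake)
load_mtcars_example() # assumption: supplies my_plan, which the snippet otherwise leaves undefined
clean(destroy = TRUE) # careful of existing projects
options(clustermq.scheduler = "multicore") # run the workers as local multicore processes
make(my_plan, parallelism = "clustermq", jobs = 2)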
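
To illustrate the scheduling idea described above (a master handing targets to a fixed pool of workers once their dependencies are built), here is a minimal base-R simulation. It is only a sketch and not the code in R/clustermq.R: the dependency list `deps` and the function `schedule_targets()` are hypothetical, no real clustermq workers are started, and every target is assumed to take exactly one "tick" to build.

```r
# Hypothetical dependency graph: each target names the targets it depends on.
deps <- list(
  data    = character(0),
  model_a = "data",
  model_b = "data",
  report  = c("model_a", "model_b")
)

# Simulate a master process feeding a pool of `jobs` workers.
# Simplifying assumption: every target finishes one tick after it is deployed.
schedule_targets <- function(deps, jobs = 2) {
  done    <- character(0) # targets already built
  running <- character(0) # targets currently assigned to workers
  batches <- list()       # what the master deployed on each tick
  while (length(done) < length(deps)) {
    # Collect the results of the previous tick; all workers become idle.
    done    <- c(done, running)
    running <- character(0)
    # A target is ready when it is unbuilt and all its dependencies are built.
    pending <- setdiff(names(deps), done)
    ready   <- Filter(function(target) all(deps[[target]] %in% done), pending)
    if (length(pending) > 0 && length(ready) == 0) {
      stop("no target can be built: circular or missing dependency")
    }
    # Deploy at most one ready target per worker.
    running <- head(ready, jobs)
    if (length(running) > 0) {
      batches[[length(batches) + 1]] <- running
    }
  }
  batches
}

batches <- schedule_targets(deps, jobs = 2)
print(batches)
```

With two workers, `model_a` and `model_b` are deployed together once `data` is built; with `jobs = 1` they run one after the other, so the same plan takes one more tick.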

This PR is not ready to merge. For both the multicore and SGE backends, I get the following error a lot.

Error in rzmq::poll.socket(list(private$socket), list("read"), timeout = timeout) :
  Interrupted system call

I used to think I could trace it back to w$receive_data(), but it seems to crop up unpredictably in multiple places that I have not been able to precisely identify.
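
One generic way to locate and tolerate an intermittent error like this is to wrap suspect calls so that interruptions are logged with their call site and retried. The sketch below is not the fix adopted in this PR: `retry_on_interrupt()` and `flaky_poll()` are hypothetical names, and whether retrying is safe for rzmq::poll.socket() is an untested assumption.

```r
# Hypothetical helper: call f(), retrying when the error message reports an
# interrupted system call. Any other error is re-signaled unchanged.
retry_on_interrupt <- function(f, max_tries = 5) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) {
        msg <- conditionMessage(e)
        if (!grepl("Interrupted system call", msg, fixed = TRUE)) {
          stop(e) # not an interruption: pass the error along
        }
        # Log which call was interrupted to help trace where it comes from.
        message("attempt ", i, " interrupted in: ", deparse(conditionCall(e))[1])
        list(ok = FALSE)
      }
    )
    if (result$ok) {
      return(result$value)
    }
  }
  stop("still interrupted after ", max_tries, " tries")
}

# Stand-in for a flaky socket poll: fails twice, then succeeds.
state <- new.env()
state$calls <- 0
flaky_poll <- function() {
  state$calls <- state$calls + 1
  if (state$calls < 3) {
    stop("Interrupted system call")
  }
  "message received"
}

print(retry_on_interrupt(flaky_poll))
```

The logged call sites show which wrapped calls get interrupted, and the retry cap keeps a persistent failure from looping forever.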

cc @mschubert. Thank you again for all the help so far. Do you have ideas about where the error might be coming from?

Related GitHub issues and pull requests

Checklist

  • I have read drake's code of conduct, and I agree to follow its rules.
  • I have read the guidelines for contributing.
  • I have listed any substantial changes in the development news.
  • I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
  • I have tested this pull request locally with devtools::check().
  • This pull request is ready for review.
  • I think this pull request is ready to merge.

codecov-io commented Aug 22, 2018

Codecov Report

Merging #501 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #501    +/-   ##
======================================
  Coverage     100%   100%            
======================================
  Files          71     72     +1     
  Lines        6104   6222   +118     
======================================
+ Hits         6104   6222   +118

Impacted Files        Coverage Δ
R/clustermq.R         100% <100%> (ø)
R/parallel.R          100% <100%> (ø) ⬆️
R/mclapply.R          100% <100%> (ø) ⬆️
R/parallel_ui.R       100% <100%> (ø) ⬆️
R/staged.R            100% <100%> (ø) ⬆️
R/priority_queue.R    100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82ed556...9369d83.
