# Work hangs: `mclapply()` parallelism within clustermq jobs #103
Can you try this with `Q(..., template = list(log_file = "..."))`?
```r
options(
  clustermq.scheduler = "sge",
  clustermq.template = "sge_clustermq.tmpl"
)
library(clustermq)
f <- function(i){
  parallel::mclapply(1:4 + i, sqrt, mc.cores = 4)
}
Q(f, 1:8, n_jobs = 8, template = list(log_file = "log.txt"))
#> Submitting 8 worker jobs (ID: 6424) ...
#> Running 8 calculations (1 calls/chunk) ...
#> [===============================>--------------------] 62% (4/4 wrk) eta: 6s
```

At this point, the work hung, so I sent SIGINT with CTRL-C.

```
^CError in rzmq::poll.socket(list(private$socket), list("read"), timeout = msec) :
  The operation was interrupted by delivery of a signal before any events were available.
Calls: Q ... master -> <Anonymous> -> <Anonymous> -> <Anonymous>
^CExecution halted
```

Log file:

```
> clustermq:::worker("tcp://CLUSTER-LOGIN-NODE:6424")
Master: tcp://CLUSTER-LOGIN-NODE:6424
WORKER_UP to: tcp://CLUSTER-LOGIN-NODE:6424
> DO_SETUP (0.000s wait)
token from msg: ubust
> WORKER_STOP (0.000s wait)
shutting down worker
Total: 0 in 0.00s [user], 0.00s [system], 0.01s [elapsed]
```
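For the record, the shape of the failing workload can be sketched without a cluster. The snippet below is only a local illustration of the same nested pattern (an outer task that itself forks via `mclapply()`); run standalone it completes, whereas the report above shows it hanging inside a clustermq worker. It assumes a POSIX system, since `mc.cores > 1` relies on forking and is not supported on Windows.

```r
library(parallel)

# Outer task function with the same shape as the clustermq job above:
# each call forks its own children via mclapply().
f <- function(i) {
  unlist(mclapply(1:4 + i, sqrt, mc.cores = 2))
}

# Run the outer layer sequentially here; in the report above the outer
# layer was Q(f, 1:8, n_jobs = 8) on SGE.
res <- lapply(1:3, f)
str(res)
```

Each element of `res` holds the four square roots computed by one outer task's forked children.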
Thank you, I could reproduce this now: it was caused by …
mschubert added a commit that referenced this issue on Aug 31, 2018.
Fixed on my end. Thanks very much.
I suspect this is related to #99, but it is an important use case, so I thought I should post something for the record. Feel free to close if you think R-devel already fixed it.
The following little drake pipeline sends jobs to an SGE cluster, and each job uses `mclapply()` to parallelize its own work. It hangs when `mc.cores` is greater than 1, and it completes normally (and very quickly) when `mc.cores` equals 1. I am using ropensci/drake@c6395ee and ecfdb9d. Other session info is here. The template file makes sure each job gets 4 cores.
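For context, a template along these lines is what reserves the 4 cores per job. This is only a sketch: the exact `#$` directives, the parallel environment name (`smp` here), and the `{{ }}` placeholders depend on the site's SGE configuration and the clustermq version.

```sh
#$ -N {{ job_name }}     # job name, filled in by clustermq
#$ -o {{ log_file }}     # per-worker log, as passed via template = list(log_file = ...)
#$ -cwd
#$ -V
#$ -pe smp 4             # request 4 slots so mclapply(mc.cores = 4) has cores
R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```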
We get pretty far along in the workflow, but it hangs before starting `x_8`. `qstat` shows that some, but not all, of the workers are still running.