Multicore does not work properly (Warning in selectChildren) #242
Comments
FWIW, that warning was introduced in R 3.5.0 when an internal run-time sanity check was added. That check revealed some problems, which were fixed in R 3.5.1 patched. In other words, there was a bug in R (>= 3.5.0 & < 3.5.2) that produced this warning, and all we could do was suppress it. See futureverse/future#218 for the whole story. Now, that does not explain why it appears for you in R 3.6.1, but if you can reproduce it with plain 'parallel' code, e.g. some |
Thanks for the input @HenrikBengtsson! Up front: I think the warning is ok here. I.e., I think Based on it I checked with an old version of R (3.4.4) on another system. I think the issue is that in the newer R version the second call to
> mccollect = function(jobs, timeout = 0) {
+ parallel::mccollect(jobs, wait = FALSE, timeout = timeout)
+ }
>
> job <- parallel::mcparallel(1)
> Sys.sleep(0.1)
> mccollect(job$pid)
$`2428`
[1] 1
> mccollect(job$pid)
$`2428`
NULL
> mccollect(job$pid)
NULL
Here is the same with the new R version:
> mccollect = function(jobs, timeout = 0) {
+ parallel::mccollect(jobs, wait = FALSE, timeout = timeout)
+ }
>
> job <- parallel::mcparallel(1)
> Sys.sleep(0.1)
> mccollect(job$pid)
$`87015`
[1] 1
> mccollect(job$pid)
NULL
Warning message:
In selectChildren(jobs, timeout) :
cannot wait for child 87015 as it does not exist
> mccollect(job$pid)
NULL
Warning message:
In selectChildren(jobs, timeout) :
cannot wait for child 87015 as it does not exist
It produces the warnings (which are justified here I think?) but also the second call is just It also turns out my quick fix above does not work (it already seemed like a weird fix to me). I think the second call to |
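One way to keep the second call from ever reaching `selectChildren()` is to remember which PIDs have already been collected. The following is only a minimal sketch under that assumption, not the batchtools fix: `make_collector()` and `collect()` are hypothetical names, and it relies on forking, so it is limited to non-Windows systems.

```r
# Hypothetical helper (not part of batchtools or parallel): remembers which
# child PIDs were already collected so that parallel::mccollect() is not
# called again for a child that no longer exists.
make_collector <- function() {
  done <- integer(0)
  function(pids, timeout = 0) {
    pending <- setdiff(pids, done)
    # Everything was collected before: skip the call (and the warning).
    if (length(pending) == 0L) return(NULL)
    res <- parallel::mccollect(pending, wait = FALSE, timeout = timeout)
    # The result list is named by PID; mark those children as collected.
    if (!is.null(res)) done <<- c(done, as.integer(names(res)))
    res
  }
}

collect <- make_collector()
job <- parallel::mcparallel(1)

# Poll until the child has delivered its result (at most about 5 seconds).
first <- NULL
for (i in seq_len(50)) {
  first <- collect(job$pid)
  if (!is.null(first)) break
  Sys.sleep(0.1)
}

# The PID is now marked as done, so this returns NULL without warning.
second <- collect(job$pid)
```

The bookkeeping lives in a closure so that each collector has its own `done` vector. Whether this interacts well with the zombie-process handling mentioned in this thread is untested.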
I tried a few more things. However, I can't get it working properly. With my initial approach to fix it, memory usage keeps increasing and processes do not seem to be handled correctly anymore (although I am not sure why). In all cases (with my fix or not) the batchtools tests pass, though. My guess is that the issue arose with a fix they did in R 3.6.0: to prevent zombie processes they added an
if ((getRversion() < "3.3.2" | getRversion() >= "3.6.0") && .Platform$OS.type != "windows") {
# OLD: if (getRversion() < "3.3.2" && .Platform$OS.type != "windows") {
I did not test this fix at length yet, so it might not really work well. There are also a lot of the warnings mentioned earlier from |
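To see which R versions the changed condition selects, the predicate can be checked in isolation. This is a minimal sketch: `takes_branch()` is a hypothetical helper that only mirrors the condition quoted above, and what the guarded branch actually does is not shown here.

```r
# Hypothetical helper mirroring the modified condition; the version and
# OS type are parameters so that other setups can be simulated.
takes_branch <- function(version = getRversion(),
                         os_type = .Platform$OS.type) {
  v <- numeric_version(as.character(version))
  (v < "3.3.2" || v >= "3.6.0") && os_type != "windows"
}

takes_branch("3.3.1", "unix")     # old R: selected before and after the change
takes_branch("3.4.4", "unix")     # in between: not selected
takes_branch("3.6.1", "unix")     # R >= 3.6.0: newly selected by the change
takes_branch("3.6.1", "windows")  # Windows: never selected
```

Comparing a `numeric_version` object against a string compares the version components numerically, so "3.10.0" sorts after "3.6.0" rather than before it.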
I am having some trouble with the Multicore cluster for parallel computation. It seems that once there are more jobs than CPUs, submitJobs() keeps running but does not submit more jobs (i.e. no CPU usage). Here is a mini example. Once I stop the execution, there are multiple warnings like the ones described in #221 (see below). However, in my case jobs will not be completed, which makes it (at least for me) quite annoying.
I already found a fix, but I am not sure if it is a good one. I changed line 63 of 'clusterFunctionsMulticore.R' as follows.
With this change it seems to work again. I got the impression that the original version does not reduce the self$jobs table because the value for count seems to be 0 originally, so count + 1 is also <= 1. Thus, it gets stuck in the repeat loop. However, with a quick search I didn't find anything in your commits that explains this (although it worked back in the day for me). Here is some additional info on my setup.