
Support clustermq as backend for drake #86

Closed
wlandau opened this issue Jun 27, 2018 · 46 comments

@wlandau
Contributor

wlandau commented Jun 27, 2018

First of all, I am super excited about this package! I just tried it out, and it automatically detected my company's SGE scheduler and submitted jobs in a fraction of the time I am used to. Way cool!

I am thinking about a couple of clustermq backends for drake: one for persistent workers, and another for transient workers. The former is easiest to implement, but I do not expect to see performance gains there because persistent workers need to use the file system to save targets on the go.

However, clustermq-based transient workers could really shine, especially compared to drake's current future/batchtools-based transient workers (make(parallelism = "future")), which add noticeable overhead to every single target. But to implement transient workers with clustermq, I believe I will need a way to submit and monitor workers in a non-blocking way. With future, for example, the master process can submit a job, go check on other jobs, and come back to see whether the first job finished.

I suppose one potential alternative is to take advantage of the native pipeline functionality in the schedulers themselves. Are you considering this for clustermq?
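For concreteness, the non-blocking pattern described above looks roughly like this with the future package (a sketch, not drake's actual code; plan(multisession) is just one possible backend):

```r
library(future)
plan(multisession, workers = 2)

# Submit a job without blocking the master process.
f1 <- future({ Sys.sleep(0.5); "first target" })

# The master is free to submit and check other jobs in the meantime.
f2 <- future("second target")

# Come back later and poll whether the first job finished.
while (!resolved(f1)) Sys.sleep(0.1)
value(f1)
```

The key property is that future() returns immediately, so the master process can keep traversing the dependency graph between submissions.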

@mschubert
Owner

Thank you for your kind words, and please excuse my delay in getting back to you: this question is a bit harder to answer than the others.

We have discussed a non-blocking version of Q before in #23.

For drake support specifically:

One option would be to implement clustermq as a backend for future, which would give us drake support for free. There is an issue on future for the backend, but I'm afraid this is currently not high enough on my priority list.

Another option would be to use parallel foreach as a parallelism backend for drake. I don't know if this is possible already, or how hard it would be to implement. foreach support in clustermq is experimental and will be available to try starting with the next release.

In either case, I think this would use persistent (and reusable) workers and not actually need non-blocking Q, because each loop will be blocking but still be processed in parallel.

@wlandau
Contributor Author

wlandau commented Jun 28, 2018

Thanks for explaining, Michael. I think HenrikBengtsson/future#204 would be ideal, and I am optimistic about its eventual implementation. I will close this issue as a duplicate of #23.

@mschubert
Owner

I am reopening this for drake support specifically

@mschubert mschubert reopened this Jun 28, 2018
@mschubert mschubert changed the title Consider a non-blocking version of Q() Support clustermq as backend for drake Jun 28, 2018
@wlandau
Contributor Author

wlandau commented Jun 28, 2018

And I appreciate it. Really glad you're willing to help me speed up drake.

@mschubert
Owner

Let me expand on the previous point a bit:

(1) I'm pretty sure you do not want transient workers for drake on the cluster (as in, a worker per call)

Many of your function calls may be short. Submitting many short-running jobs in a short period of time puts stress on the scheduler. The lower limit usually quoted is 2 minutes, and snakemake was scolded by sysadmins for using this exact approach (which they later tried to mend using job groups).

(2) There are usually batches of function calls where some arguments are iterated but others are not

Often, function calls will work on the same data but e.g. different indices are supplied. In this case, you will not want the common part to be sent over the network for each call separately.
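clustermq's Q() already draws this distinction through its const argument, so the shared data travels to each worker once rather than once per call. A minimal sketch (the function, chunking, and n_jobs are illustrative; running it requires a configured scheduler):

```r
library(clustermq)

big_data <- data.frame(value = rnorm(1e5))

# `index` is iterated (one chunk per call); `data` is constant and is
# exported to each worker a single time.
results <- Q(
  function(index, data) mean(data$value[index]),
  index = list(1:100, 101:200, 201:300),
  const = list(data = big_data),
  n_jobs = 2
)
```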

I think the way to go here is to:

  • Let drake define a pool of workers that will be common to the workflow (I think you already do this, but also allow for a different number of workers during those steps)
  • Group tasks per function to call, organise iterated and constant arguments; if there are parallel chains, create a temporary function for the chain (I think you already do this or something similar)
  • Let clustermq work on each of these groups one after another (this is where the parallel foreach or the future backend comes in)

Q in this case would be handled like parLapply, which is blocking itself but processes tasks in parallel.

The downside of course is that each group of function calls needs to be completed before the next can start. As the calls themselves are load-balanced, I don't think this is a big deal.
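The parLapply analogy can be made concrete with base R's parallel package: the call blocks until every element is done, but the elements themselves are computed in parallel across the cluster nodes.

```r
library(parallel)

cl <- makeCluster(2)                          # two local worker processes
res <- parLapply(cl, 1:4, function(x) x * 2)  # blocks, but runs in parallel
stopCluster(cl)

unlist(res)  # 2 4 6 8
```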

@wlandau
Contributor Author

wlandau commented Jun 29, 2018

You bring up a good point re transient workers. I do eventually want to get to job groups, but it is much harder to implement than persistent and transient workers. All in good time.

I do like the idea of maintaining a pool of common workers and submitting jobs to them as targets become ready (i.e. when their dependencies are all checked and built). The only downside is that users may hit wall time limits fairly easily, but we can work around that later with job groups (or simply by refreshing workers periodically).

Unfortunately, it is not so simple to group tasks by function call. Users define arbitrary dependency networks of commands, and those commands are arbitrary chunks of R code (not necessarily neat function calls). I suppose some targets sometimes share all the same dependencies, but I do not think such groupings would efficiently generalize to larger workflows.

The downside of course is that each group of function calls needs to be completed before the next can start.

At first, I did not think this would be a big deal. What you describe is essentially what I have been calling "staged scheduling", and it was a good start. However, during its first several months of use, several projects lost considerable parallel efficiency: drake could not always jump ahead in the graph even when targets were ready. ropensci/drake#168 helped somewhat, but it was not nearly sufficient. I touched on the downsides of staged parallelism in a recent rOpenSci tech note, and drake no longer supports it for remote workers.

The most successful solutions so far have been

  1. Totally transient workers with the "future" and "Makefile" backends, and
  2. Totally persistent workers with the "mclapply", "parLapply", and "future_lapply" backends. Here, the workers cannot and do not block the master process that sends them instructions. I have found this asynchronicity to be essential for traversing the dependency graph in parallel as efficiently as possible. Whether through future or clustermq itself, I hope we can make it happen.

@wlandau
Contributor Author

wlandau commented Jun 29, 2018

I did try to use clustermq workers just like the current lapply-like persistent worker backends, but I found it defeated the purpose. Currently, drake uses the file system to communicate with its persistent workers, and those workers use the file system to cache the targets they build. While this approach produces the correct results (I took special precautions to avoid race conditions due to NFS file latency), the overhead is large. That's why I hope instructions and targets can be transmitted with ZeroMQ and the master process can take total responsibility for the file system.

@mschubert
Owner

I'm beginning to see the issues you are facing. It should be possible to use clustermq workers for individual function calls, while not using the main event loop.

This would be to use something like:

w = workers(2) # submit 2 worker jobs for testing
on.exit(w$cleanup())

msg = w$receive_data()
if (!is.null(msg$result)) {
    # handle result ...
}

# if there is work
# (this still needs some adjustments in clustermq for it to work)
w$set_common_data(data=list(fun, const, export, seed)) # set function to call, constant args
w$send_common_data()
w$send_job_data(msg=list(id="do_chunk", df)) # df is a data.frame with row=id and fun args

# if not
w$disconnect_worker(msg)

Processing the calls this way would be asynchronous, with the drake parallel backend responsible for the event loop.

Is that something you could work with?

@wlandau
Contributor Author

wlandau commented Jul 4, 2018

Thanks for meeting me half way! I think I get the general idea, and I want to understand more about the details.

From what I understand, the master process will loop over something like the above code to receive worker data and send jobs to workers.

After msg <- w$receive_data(), what are the contents of msg? Is it the output from the worker that finished soonest, results from all the workers, or something else?

In w$send_job_data(), would I be sending a single job to the workers collectively? Will clustermq's load balancing decide which worker gets assigned?

@mschubert
Owner

mschubert commented Jul 5, 2018

I've written about the specification here, hope that helps. Please note the proposed drake-specific extensions DO_CALL, which is DO_SETUP and DO_CHUNK in one command (otherwise there's no guarantee you're talking to the same worker).

From what I understand, the master process will loop over something like the above code to receive worker data and send jobs to workers.

Yes

After msg <- w$receive_data(), what are the contents of msg? Is it the output from the worker that finished soonest, results from all the workers, or something else?

It will be the result of a single chunk (in your case, likely a function call or expression on a single set of arguments).

The fields of msg will be: id, result, warnings, and errors (as documented above)

In w$send_job_data(), would I be sending a single job to the workers collectively? Will clustermq's load balancing decide which worker gets assigned?

This would be a single call to a single worker. You'd loop through the following way:

switch(msg$id,
    "WORKER_UP" = {
        if (there are calls left that you can do right now)
            w$do_call(...)
        if (there is a result in msg$result)
            # handle result here
    },
    "WORKER_ERROR" = {
        # something went wrong, clean up and print error
    }
)

Calls will be load-balanced, but you need to check the result ID to match the result to a call you sent. Otherwise, you'll have no idea which worker sent which result.
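One way to do that bookkeeping (a hypothetical sketch; the helper names are illustrative, not part of clustermq's API) is to keep a small table of in-flight call IDs and resolve each incoming result against it:

```r
# In-flight calls, keyed by an ID the master generates per submission.
pending <- new.env(parent = emptyenv())

submit_call <- function(target) {
  id <- paste0("call_", target)
  assign(id, target, envir = pending)  # remember what this ID stands for
  id                                   # this ID travels with the call
}

conclude_call <- function(id, result) {
  target <- get(id, envir = pending)
  rm(list = id, envir = pending)       # the call is no longer in flight
  list(target = target, result = result)
}

id <- submit_call("small_target")
out <- conclude_call(id, result = 42)  # match the result back to its call
```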

@wlandau
Contributor Author

wlandau commented Jul 7, 2018

Thanks for elaborating. I will experiment with this and get back to you.

@wlandau
Contributor Author

wlandau commented Jul 18, 2018

Michael, thank you for your patience. I know I was slow getting back to you. I wanted to make sure I had a good quality block of time to devote to this.

I've written about the specification here, hope that helps. Please note the proposed drake-specific extensions DO_CALL, which is DO_SETUP and DO_CHUNK in one command (otherwise there's no guarantee you're talking to the same worker).

The spec helped a lot, and I do see the necessity of DO_CALL. Thank you.

Regarding your follow-up code from #86 (comment): do you plan to implement w$do_call(...) as part of the DO_CALL message type? Maybe shorthand for w$send_job_data(msg = list(id = "DO_CALL"))?

I studied your code from #86 (comment), and the results of my experimentation are below, along with some embedded questions and comments. I am optimistic about interacting with clustermq this way, and I have decided to seriously pursue it for drake::make(parallelism = "clustermq"). I realize we still need DO_CALL, and I would also appreciate your help teaching me the rest.

library(clustermq)
#> * Option 'clustermq.scheduler' not set, defaulting to SGE
#> --- see: https://github.com/mschubert/clustermq/wiki#setting-up-the-scheduler
w <- workers(2)
#> Submitting 2 worker jobs (ID: 7628) ...
on.exit(w$finalize()) # See #91.
msg <- w$receive_data()
msg
#> $id
#> [1] "WORKER_UP"
#>
#> $pkgver
#> [1] 0.8.4.1
if (is.null(msg)){
  # Under what circumstances is the message NULL?
  # What should I check if it is?
}
# From https://github.com/mschubert/clustermq/blob/3c7fda0a7c6ddcc7f9b7ac557a6a2abfcc917e16/R/Q_rows.r#L24-L25
data <- list(
  fun = function(x){
    x * 2
  },
  const = list(),
  export = list(),
  rettype = "list",
  common_seed = 123
)
# From https://github.com/mschubert/clustermq/blob/3c7fda0a7c6ddcc7f9b7ac557a6a2abfcc917e16/R/Q_rows.r#L41
# Do we need `id = "DO_SETUP"` somewhere?
do.call(w$set_common_data, data)
w$send_common_data()
w$send_job_data(
  msg = list(
    id = "DO_CHUNK",       # Maybe eventually "DO_CALL"?
    df = data.frame(x = 1) # drake will deploy targets 1 at a time.
  )
)
#> Operation cannot be accomplished in current state
msg <- w$receive_data()
msg
#> $id
#> [1] "WORKER_ERROR"
#>
#> $token
#> [1] "agasx"
w$disconnect_worker(msg)

It seems I do not know how to set the common data properly. In a previous attempt, I tried to send data more like you did in #86 (comment), but I get a worker error there too.

w$set_common_data(data)
w$send_common_data()
w$send_job_data(
  msg = list(
    id = "DO_CHUNK",       # Maybe eventually "DO_CALL"?
    df = data.frame(x = 1) # drake will deploy targets 1 at a time.
  )
)
#> Operation cannot be accomplished in current state
msg <- w$receive_data()
msg
#> $id
#> [1] "WORKER_ERROR"
#>
#> $msg
#> [1] "wrong field names for DO_SETUP: data"

Like you said, maybe clustermq needs more development before this will work. How far can I get with the current development version?

mschubert added a commit that referenced this issue Aug 9, 2018
@mschubert
Owner

@wlandau Can you try the following using develop?

w = workers(1)

# worker connecting to master with id=WORKER_UP
w$receive_data()

# you should be able to send arbitrary expressions to evaluate on a worker
w$send_call(x*2, list(x=5))

# receive the result from any worker with id=WORKER_READY
w$receive_data()

w$cleanup()

@wlandau
Contributor Author

wlandau commented Aug 9, 2018

It worked exactly as I guessed.

> options(clustermq.scheduler = "sge", clustermq.template = "template.tmpl")
> library(clustermq)
> w = workers(1)
Submitting 1 worker jobs (ID: 7225) ...
> # worker connecting to master with id=WORKER_UP
> w$receive_data()
$id
[1] "WORKER_UP"

$pkgver
[1] ‘0.8.4.99’

> # you should be able to send arbitrary expressions to evaluate on a worker
> w$send_call(x*2, list(x=5))
> # receive the result from any worker with id=WORKER_READY
> w$receive_data()
$id
[1] "WORKER_READY"

$token
[1] NA

$expr
x * 2

$result
[1] 10

> w$cleanup()

And with a low enough timeout, it looks like I can keep calling w$receive_data() until a worker is ready to send a result. Multiple calls are giving me a little trouble at the moment, though (maybe I am getting a little too far ahead here).

> options(clustermq.scheduler = "sge", clustermq.template = "template.tmpl")
> library(clustermq)
> 
> w = workers(2)
Submitting 2 worker jobs (ID: 7321) ...
> 
> # worker connecting to master with id=WORKER_UP
> w$receive_data()
$id
[1] "WORKER_UP"

$pkgver
[1] ‘0.8.4.99’

> w$receive_data(timeout = 1e-9)
NULL
> 
> # you should be able to send arbitrary expressions to evaluate on a worker
> w$send_call(x*2, list(x=5))
> w$send_call(x*2, list(x=5))
Operation cannot be accomplished in current state
> 
> # receive the result from any worker with id=WORKER_READY
> w$receive_data()
$id
[1] "WORKER_READY"

$token
[1] NA

$expr
x * 2

$result
[1] 10

> 
> w$cleanup()
Master: [2.3s 1.5% CPU]; Worker: [avg NA% CPU, max NA Mb]

@mschubert
Owner

mschubert commented Aug 9, 2018

For your 2nd example you still need an event loop ;-)

while(work remaining or worker still running) {
    msg = w$receive_data() # wait until a worker is ready
    if (msg$id == "WORKER_READY") {
        if (work remaining)
            w$send_call(next work) # send it work if it is
        else
            w$send_shutdown_worker()
        # ...handle your result...
    } else
        # w$disconnect_worker() if WORKER_DONE, stop if WORKER_ERROR
}

The key here is that receive and send are called alternately, so that ZeroMQ can distribute the work in the background.

@wlandau
Contributor Author

wlandau commented Aug 10, 2018

I see. This is beginning to take shape.

Also, drake keeps track of a large configuration list internally (from drake_config()). If possible, I would like to send it to the workers just once at the beginning rather than every send_call().

Does this look sort of right to you?

w$send_job_data(config)
while(work remains or worker still running) {
  msg = w$receive_data()
  if (identical(msg$id, "WORKER_READY")) {
    store_result(msg)
    target <- choose_next_target(msg)
    if (!is.null(target)){ # There is work to be done.
      # Load the dependencies that the next worker needs to have in memory
      # and remove unnecessary objects
      # (depending on the user-defined memory strategy)
      prune_envir(config) # https://github.com/ropensci/drake/blob/16700f77700729511db103610f55c23978d7fef7/R/envir.R#L28
      # Build the target.
      w$send_call(
        build_target,
        list(target = target),
        const = list(dependencies = get_dependencies(config$envir))
      )
    }
  } else if (identical(msg$id, "WORKER_DONE")) {
    w$disconnect_worker()
  } else if (identical(msg$id, "WORKER_ERROR")) {
    stop("worker error")
  }
}

Eventually, I would like to be more selective about the data that gets sent to each worker. Many targets share dependencies, and depending what the assigned worker built previously, we could be repeating ourselves if we send all the dependency data over the socket. Of course, because of clustermq's automatic load balancing, we may not be able to identify the next worker beforehand to figure out what it already has in memory.

@mschubert
Owner

Looks good!

Some minor pointers:

If you want to send the config separately in the beginning, use w$set_common_data with your config object in export and dummy fields for fun, const, rettype, and common_seed.[1] In your loop, you'd check for WORKER_UP (respond with w$send_job_data), and WORKER_READY (do w$send_call) as you do now.

w$send_call takes an expression (expr) and all dependencies in a list (env); there is no const argument, because for a single call it doesn't make sense to distinguish between constant and iterated arguments.[2]


[1] I know requiring dummy fields is not ideal, and this is still subject to change.
[2] I'm thinking about generalizing token to refer to compute environments, so that workers can perform operations on multiple sets of available data (that were previously exported to that worker); this way you could also track what objects they have in memory.

@wlandau
Contributor Author

wlandau commented Aug 10, 2018

I am not sure I understand your comments about WORKER_UP and w$send_job_data(), especially since there is no DO_CHUNK message in the section of the message specification on evaluating custom expressions. What is the role of w$send_job_data() here? If w$send_call() already has an env argument, what additional value does w$send_job_data() provide?

Also, do we need a w$send_common_data() after w$set_common_data()? Do we need calls to receive_data() after each?

Overall, I am having trouble understanding the functions in the QSys class and when to use them. I think it would be helpful to have a specification that distinguishes among them and walks through how they work together in practice.

@wlandau
Contributor Author

wlandau commented Aug 11, 2018

Update: the new backend has progressed enough for a PR: ropensci/drake#501. The only thing left is an unpredictable yet frequent error:

Error in rzmq::poll.socket(list(private$socket), list("read"), timeout = timeout) :
  Interrupted system call

Any ideas about what might be happening? I am running the reprex from #86 (comment) using ropensci/drake@8f84aa7.

@mschubert
Owner

I can process jobs=1 for the mtcars example without problems in 3.1 seconds. I see the same problem as you with jobs=2 (which requires me to install the txtq package, so there is some different handling of 2 vs 1 jobs).

rzmq::poll.socket will raise an error if an interrupt is received. Question is, what is sending the interrupt, and why? (it's not rzmq or clustermq)

Note that we had issues with terminal size changes sending interrupts before.

@wlandau
Contributor Author

wlandau commented Aug 12, 2018

txtq is not actually used for the work that drake does with clustermq. It is required for drake's mclapply backend, which is used by default to process the imports. make(jobs = c(imports = 1, targets = 2)) and make(parallelism = c(imports = "mclapply_staged", targets = "clustermq")) do not require txtq.

ropensci/rzmq#37 does provide context, thanks.

I am not sure what could be sending the interruption, but since the error message mentions private$socket, it seems like the interrupt happens around the call to receive_data(). I would like to be able to just try receive_data() again if there is an interruption, but try()/tryCatch() statements are ineffective, presumably because of the asynchronous nature of signal interruptions.

@mschubert
Owner

mschubert commented Aug 12, 2018

Yes, it's no surprise that you catch the interrupts with receive_data() because poll.socket() is what reacts to them (and you're spending most of your time there).

Using strace, I get:

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=160983,
si_uid=10282396, si_status=SIGTERM, si_utime=1, si_stime=1} ---

Are you running any multicore operations in the background?

@wlandau
Contributor Author

wlandau commented Aug 13, 2018

I was running some occasional mclapply() calls to manage dependencies in the execution environments, but no longer (ropensci/drake@7228b3b). Even afterwards, I still see that SIGCHLD/CLD_KILLED message from strace. Incidentally, this reminds me of the SIGCHLD/parallel issues callr is grappling with right now.

By the way, I am not sure this is relevant, but in the current loop on the master process, I never see a WORKER_DONE message. Just one WORKER_UP message for each worker at the beginning and then WORKER_READY messages from then on.

@wlandau
Contributor Author

wlandau commented Aug 13, 2018

Regarding WORKER_DONE, I realize that part is my fault. The relevant section of the loop is never reached, and I have been relying on w$cleanup().

In your pseudocode from #86 (comment), the while() loop checks if a worker is still running. What is the correct way to do that with a QSys object?

@wlandau
Contributor Author

wlandau commented Aug 13, 2018

Never mind: I saw you use w$workers_running internally, and it seems to work for drake as well. As of ropensci/drake@026cfb3, the "Interrupted system call" errors seem to have disappeared completely when I submit to SGE. The multicore backend still shows them when I am using at least 4 workers at a time, but only when drake starts shutting down workers.

@wlandau
Contributor Author

wlandau commented Aug 14, 2018

bb4477e seemed to help, but drake's clustermq tests give several warnings:

test-clustermq.R:16: warning: clustermq parallelism
cannot wait for child 14833 as it does not exist

Then, sometimes the task hangs, and other times the infamous "Interrupted system call" error appears.

@wlandau wlandau closed this as completed Aug 14, 2018
@wlandau wlandau reopened this Aug 14, 2018
@wlandau
Contributor Author

wlandau commented Aug 14, 2018

However, jobs on SGE run great, and real HPC schedulers should be drake's main focus when it comes to clustermq. drake already has several local parallel backends, so its only use for multicore clustermq is for unit testing. Is there another backend that could allow drake to just go through the motions locally? I would rather not merge ropensci/drake#501 unless there are proper unit tests that cover all the important parts of the code.

@mschubert
Owner

Please note that the latest version on develop no longer sends WORKER_UP, but WORKER_READY instead.

This will cause problems for drake/clustermq.R#L34; you will need to check that the token of the ready message is the same as the one you set in the common data.

Sorry for this, but that's the downside of using the bleeding edge version. I might still simplify the shutdown procedure before a release (eliminating the requirement for WORKER_DONE checks).

@wlandau
Contributor Author

wlandau commented Aug 21, 2018

Thanks for letting me know. Nothing to apologize for, I am excited to see clustermq develop.

In the new develop branch, would the workflow be something like this?

cmq_set_common_data <- function(config){
  export <- list()
  if (identical(config$envir, globalenv())){
    export <- as.list(config$envir, all.names = TRUE) # nocov
  }
  export$config <- config
  config$workers$set_common_data(
    export = export,
    fun = identity,
    const = list(),
    rettype = "list",
    common_seed = config$seed,
    token = "set_common_data_token"
  )
}

cmq_master <- function(config){
  while (cmq_work_remains(config) || config$workers$workers_running > 0){
    msg <- config$workers$receive_data()
    cmq_conclude_build(msg = msg, config = config)
    if (identical(msg$id, "WORKER_READY")) {
      if (identical(msg$token, "set_common_data_token")){
        config$workers$send_common_data()
      } else if (cmq_work_remains(config)){
        cmq_send_target(config)
      } else {
        config$workers$send_shutdown_worker()
      }
    } else if (identical(msg$id, "WORKER_DONE")) {
      config$workers$disconnect_worker(msg)
    } else if (identical(msg$id, "WORKER_ERROR")) {
      stop("clustermq worker error") # nocov
    }
  }
}

@wlandau
Contributor Author

wlandau commented Aug 21, 2018

Also, if/when you remove the requirement to check WORKER_DONE, what would the cleanup procedure look like? I am picturing something like the following.

cmq_master <- function(config){
  on.exit(config$workers$cleanup()) # Replaces WORKER_DONE checks?
  while (cmq_work_remains(config) || config$workers$workers_running > 0){
    msg <- config$workers$receive_data()
    cmq_conclude_build(msg = msg, config = config)
    if (identical(msg$id, "WORKER_READY")) {
      if (identical(msg$token, "set_common_data_token")){
        config$workers$send_common_data()
      } else if (cmq_work_remains(config)){
        cmq_send_target(config)
      } else {
        config$workers$send_shutdown_worker() # Do we still need this?
      }
    } else if (identical(msg$id, "WORKER_ERROR")) {
      stop("clustermq worker error") # nocov
    }
  }
}

@mschubert
Owner

The idea (implemented in clustermq@wapi) is to handle default events within workers$receive_data() and separate the worker API from the message identifiers:

cmq_master <- function(config){
  on.exit(config$workers$finalize())
  while (not all results are in){
    msg <- config$workers$receive_data()
    cmq_conclude_build(msg = msg, config = config)
    if (!identical(msg$token, "set_common_data_token")){
      config$workers$send_common_data()
    } else if (cmq_work_remains(config)){
      cmq_send_target(config)
    } else {
      config$workers$send_shutdown_worker()
    }
  }
  if (config$workers$cleanup())
    on.exit()
}

The downside I can see is that if you enter the loop after all results are finished, you will get a WORKER_DONE message ID, which you usually don't have to handle (and hence may confuse API users if they don't set their exit condition properly)

@wlandau
Contributor Author

wlandau commented Aug 21, 2018

I do like having a worker API that does not require checking message types most of the time. I have begun the adjustment in ropensci/drake@8ee84b0. Currently, drake hangs indefinitely here at receive_data() after the first send_common_data().

# devtools::install_github("ropensci/drake", ref = "clustermq")
library(drake)
load_mtcars_example()
clean(destroy = TRUE)
options(clustermq.scheduler = "multicore")
make(my_plan, parallelism = "clustermq", jobs = 2, verbose = 4)

@mschubert
Owner

You still seem to be using

while (cmq_work_remains(config) || config$workers$workers_running > 0){

and not

while (not all results are in){

The new API requires that you exit your main loop when the last result is in, not when the last worker is shut down (this is handled by cleanup())

I've added (2a4c3bf) the errors:

Error in config$workers$receive_data() :
  Trying to receive data without workers

and

Error in config$workers$receive_data() :
  Trying to receive data after work finished

to tell the user when they attempt to do that.

@wlandau
Contributor Author

wlandau commented Aug 22, 2018

Thanks. In ropensci/drake@0489ccf, the main loop now checks if all the results are in.

I just tried #86 (comment) again using ropensci/drake@0489ccf and 6af6dff, and I see the following.

  1. The first receive_data() succeeds and returns a WORKER_READY message.

     [1] "receiveing"
     $id
     [1] "WORKER_READY"

     $pkgver
     [1] 0.8.4.99

     $token
     [1] "not set"

  2. The first set_common_data() runs without error.
  3. The second receive_data() quits with the new "Trying to receive data without workers" error.

The results are the same whether I use 1, 2, or 8 workers.

@mschubert
Owner

The call to cleanup should go outside the loop (see #86 (comment)).

Now you shut down your workers after the first receive_data(), that's why you don't have workers ;-)

@wlandau
Contributor Author

wlandau commented Aug 22, 2018

Wow, I can't believe I missed that! Thanks!

@wlandau
Contributor Author

wlandau commented Aug 22, 2018

Initial testing is looking so much better! #99 aside, my last concern is that I am seeing some (but not all) of my SGE jobs still running even after cleanup. I submitted more workers than targets that time, so I wonder if these zombie workers never fully started up to begin with.

Anyway, is the new worker API likely to change significantly going forward? If not, we are close to being ready to merge ropensci/drake#501.

@wlandau
Contributor Author

wlandau commented Aug 23, 2018

drake::make(parallelism = "clustermq") and drake::make(parallelism = "clustermq_staged") appear to work beautifully with the clustermq develop branch. I sincerely appreciate your hard work on refactoring and generous help bringing me along. High-performance computing is one of drake's major promises, and thanks to you, it can really deliver.

@wlandau wlandau closed this as completed Aug 23, 2018
@mschubert
Owner

Good to see that it works & happy I could contribute a bit!

I still aim to solve #99 and have the rzmq patch merged before a CRAN release, so I estimate about 10 days for 0.8.5.

@wlandau
Contributor Author

wlandau commented Aug 24, 2018

Fantastic! I will keep an eye out for the next CRAN release.
