steps to speed up job submission? #36
Hi. It's only the first two steps that are handled by R, and those are what affect how long it takes before your jobs/tasks appear on the queue (i.e. before jobs are submitted). Based on this, if you observe roughly one minute per submitted job, then I would say it is Step 2 that takes most of that time. I would be surprised if the static-code analysis (Step 1) for identifying code dependencies (globals and packages) would take that long - anything more than 1-2 seconds for that step would be surprising, even for very, very large code bases.

Instead, I suspect that you might have large global objects being "exported" in Step 2, which could cause the slowdown. Since we're utilizing batchtools here, those globals are serialized to files first (as part of "the batchtools registry"). Writing large objects to file can take time, and if there are lots of processes writing at the same time (e.g. in a multi-tenant HPC environment), then the shared file system might be the bottleneck. Moreover, the scheduler might throttle how many jobs you can submit within a certain period of time. If that is the case, then batchtools will, I think (but I'm not 100% sure), keep retrying until each job submission is accepted. This could also explain the slow submission.

FYI, as soon as the jobs are on the queue, it's all up to the job scheduler to decide when they're allocated and started. When a job is launched, batchtools reads its registry into R, i.e. it loads the required R packages and the globals into the R worker's session and evaluates the future expression. When completed, the results are written back to file and eventually read by the main R session (all via the batchtools API).
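To make the "large globals" hypothesis concrete, here is a minimal sketch (the object `big` and its size are made up) of how one can estimate how much data Step 2 would have to serialize per job, using the globals package that the future framework relies on for Step 1:

```r
## Sketch: how large are the globals that would be exported with each job?
library(globals)

big <- rnorm(5e7)  # stand-in for a large global object (~400 MB)

g <- globalsOf(quote(mean(big)))  # the kind of static analysis done in Step 1
utils::object.size(g$big)         # roughly what Step 2 has to write to the registry
```

If this reports hundreds of megabytes, per-job serialization to the shared file system is a plausible explanation for the one-minute submissions.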
The former; batchtools, and therefore future.batchtools, assumes that the R package library on the worker's machine has the required packages installed, which is typically the case because we're assuming a shared file system where the packages live. The packages are loaded just as in a regular R session.
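A quick way to confirm this from the main R session is to ask a worker where it resolves packages from (a sketch; the SLURM template file name is assumed):

```r
## Sketch: the worker reports its own library paths and loaded namespaces,
## i.e. packages come from the worker's library, not from the master.
library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")  # assumed template file

f <- future(list(libPaths = .libPaths(), loaded = loadedNamespaces()))
value(f)
```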
Forgot to clarify that, from the perspective of future.batchtools, the following steps:

[…]

become:

[…]
It seems like there must be something more than SLURM throttling submissions going on. The plan is complete now and I don't really want to run it again unless I have to, but here were some observations:

[…]

This all seems consistent with your suggestion that some global object, which was gradually increasing in size with each submitted job, was getting serialized each time. This may actually be an issue on the […] side. I didn't set up any particular logging settings, and the .future directory now seems empty except for a […]
Thanks for the additional details. Even if they don't help immediately, they'll be helpful for others looking into this problem.

Note that the batchtools backend "communicates" all data between the main R process and the workers via the file system. That is by design. If the file system is the bottleneck, then, yes, in-memory backends such as […] would not suffer from it.

About "benchmark" stats etc.: […]
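To illustrate the file-system point (a sketch, not from this thread; timings depend entirely on the system, and the SLURM template is assumed), one can force one future per element and compare an in-memory backend with the file-based batchtools backend:

```r
## Sketch: per-future overhead of an in-memory backend vs. the batchtools backend.
library(future)
library(future.apply)

plan(multisession, workers = 4)     # data is passed in memory over local connections
system.time(future_lapply(1:100, identity, future.chunk.size = 1))

library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")  # each future gets an on-disk registry
system.time(future_lapply(1:100, identity, future.chunk.size = 1))
```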
Is it feasible to speed up the transmission of the files using the […]?
@kendonB, if so, then it would be something that needs to be implemented upstream, i.e. in the batchtools package.
@HenrikBengtsson Curious if this issue has anything to do with scheduler.latency in makeClusterFunctionsSlurm().
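For reference, `scheduler.latency` is an argument of `batchtools::makeClusterFunctionsSlurm()` that controls how long batchtools sleeps after interacting with the scheduler. A sketch of a `batchtools.conf.R` tweaking it (the template file name and the values are only examples):

```r
## Sketch of a batchtools.conf.R entry; values are illustrative.
cluster.functions <- batchtools::makeClusterFunctionsSlurm(
  template = "slurm.tmpl",   # assumed SLURM template file
  scheduler.latency = 1,     # seconds slept after interactions with the scheduler
  fs.latency = 65            # extra tolerance for a slow shared file system
)
```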
I don't know if this issue has been solved, but I observed the same problem. In a clean environment, I did the following (I've omitted the mandatory steps, like the registry, etc.) on a 10-node x 40-core cluster:

[…]

Then I tried the same with […]:

[…]

And this was abysmally slow! The same […]. My […]
I'm sure there is something fundamentally simple I'm missing in all of this, unless it's a real bug; I couldn't find any documentation on it either. Any idea?
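The original code is not preserved above, but a hypothetical sketch of that kind of comparison (direct batchtools vs. going through future.batchtools; not the commenter's actual code) could look like:

```r
## Hypothetical sketch: direct batchtools submission vs. future.batchtools.
library(batchtools)
reg <- makeRegistry(file.dir = NA)                                # temporary registry
reg$cluster.functions <- makeClusterFunctionsSlurm("slurm.tmpl")  # assumed template
system.time(btlapply(1:400, sqrt, reg = reg))                     # one registry for all jobs

library(future.batchtools)
library(future.apply)
plan(batchtools_slurm, template = "slurm.tmpl")
system.time(future_lapply(1:400, sqrt))                           # one registry per future
```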
I've come to the conclusion that the slowness for me was because my cluster is running the Lustre file system. Lustre has its advantages, but it has a high cost per file I/O operation. This is fine when you mostly read and write a few really large files in big chunks, but it seems to slow down the many small file operations that batchtools performs.
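A rough way to check whether per-file overhead is the culprit on a given cluster is to compare many small writes against one large write on the shared file system (a sketch; the directory name and sizes are arbitrary):

```r
## Sketch: many small writes vs. one large write on the cluster file system.
## If the loop is disproportionately slow, per-file overhead (as on Lustre)
## is likely what slows down job submission.
dir.create("fs_probe", showWarnings = FALSE)
x <- rnorm(1e6)

system.time(
  for (i in 1:100) saveRDS(x[1:1e4], file.path("fs_probe", sprintf("%03d.rds", i)))
)
system.time(saveRDS(x, file.path("fs_probe", "one_big.rds")))

unlink("fs_probe", recursive = TRUE)
```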
To help clarify the observed difference in time: the batchtools backend used by future.batchtools sets up a separate batchtools registry (a folder with several small files) for each future, whereas calling batchtools directly uses a single registry for all jobs. As @brendanf points out, if the file system on your HPC system is "slow"(*), this difference will be noticeable/significant.

Long-term roadmap: I'm working on an internal redesign at the lowest level of the Future API that is likely to allow map-reduce APIs such as future.apply to produce a single batchtools registry folder without even knowing about the batchtools package. I think it can work and even be introduced without breaking backward compatibility, but it will require lots of work, so it's unlikely it'll be implemented within the next 12 months. It'll be part of a bigger redesign that will make it possible to implement other feature requests.

(*) It's actually really hard to produce a fast file system that can serve hundreds of users and jobs in parallel. I've seen this on parallel BeeGFS systems where lots of jobs and processes doing lots of disk I/O can slow down the performance quite a bit. It'll never be as fast as a local disk.
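Until such a redesign lands, one practical mitigation is to reduce the number of futures, and hence the number of registry folders, by chunking many elements into each job via future.apply (a sketch; `slow_fun`, the sizes, and the template are made up):

```r
## Sketch: 10 futures/registries instead of 1000 by putting 100 elements per job.
library(future.apply)
library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")  # assumed template

xs <- 1:1000
res <- future_lapply(xs, slow_fun, future.chunk.size = 100)  # 'slow_fun' is hypothetical
```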
Hi, […]

However, exacerbated by large shared globals (but also due to the internals of the new backend), the difference between 1 registry folder vs […]

(Unrelated, […])
Cool that you're getting another batchtools backend going. I'm keen to hear about it when you get it published. Yeah, sorry, no update on this (but bits and pieces here and there in the future ecosystem have been updated towards it). Happy to hear you find matrixStats useful. Cheers.
Thanks for your amazingly fast response. One pretty common use-case I hit on is having pretty small local function state, except for one moderately large shared global. It seems a slight shame to re-compress it for every registry folder/rds export per job when using […].

One hacky solution I've been tossing around is to just take care of loading that one variable separately. So, save that large global out into its own rds file and pass just a file path to the worker function in place of the global. Then in each worker, check if the variable is a file path; if so, replace it (globally) with the object read back from that file.

At the risk of maintaining a fork of the repo, I can even do this transparently to the worker function by going through the globals list when it loads in the registry, and check for a […]
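A sketch of that workaround, written against the public future.apply/future.batchtools API rather than a fork (function and file names are illustrative, and the RDS file must live on the shared file system so the workers can see it):

```r
## Sketch of the "pass a path instead of the large global" workaround.
library(future.apply)
library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")   # assumed template

big_global <- rnorm(5e7)                          # stand-in for the large shared global
big_path <- file.path(getwd(), "big_global.rds")  # must be visible to the workers
saveRDS(big_global, big_path)

worker_fun <- function(x, big) {
  ## If the 'global' arrived as a path, load the real object once in this worker.
  if (is.character(big) && length(big) == 1 && file.exists(big)) {
    big <- readRDS(big)
  }
  ## ... the real work with 'big' goes here ...
  x + length(big)
}

## Only the short path string is serialized into each job's registry.
res <- future_lapply(1:10, worker_fun, big = big_path)
```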
I'm using future.batchtools via drake, and just got my first plan running on the cluster. It seems to take about one minute for each job submitted, and since I'm trying to submit several hundred jobs, that's not ideal (although it's not a deal-breaker, because I expect each job to take many hours to finish). I'm not sure what I might be able to change in order to speed this up. I haven't dived into the code, but my idea of what needs to happen to start a worker is:

[…]

Is this basically accurate? Does the worker load libraries already installed on its node, or are all libraries sent to the worker by the master? If the latter, then reducing library dependencies seems like a potential avenue to try.