
Search branch until finding something out of date in parallel #168

Closed
kendonB opened this issue Nov 23, 2017 · 10 comments
@kendonB
Contributor

kendonB commented Nov 23, 2017

I just ran into a case where a potential parallelisation wasn't realised.

The dependency structure was something like:

A -> B -> D -> E
A -> C -> F

A, B, C, and D are up to date. Drake checked A, then B and C, then D and F; it built F, then checked and built E.

I think the most efficient behavior would be:
A, B, C, and D are up to date. Drake checks A, then B and C, then D and F, then E, and then builds E and F in parallel.
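
For concreteness, here is a hypothetical drake_plan() that reproduces this dependency structure (the commands are placeholders, not the real project; drake infers the edges from which targets each command mentions):

```r
library(drake)

# Single-letter names just mirror the example; avoid `F`/`T` in real plans
# since they shadow R's shorthand for FALSE/TRUE.
plan <- drake_plan(
  A = 1,
  B = A + 1,  # A -> B
  C = A + 2,  # A -> C
  D = B + 1,  # B -> D
  E = D + 1,  # D -> E
  F = C + 1   # C -> F
)

# With staged scheduling, E waits for the whole D/F stage to finish.
make(plan, jobs = 2)
```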

@wlandau-lilly
Collaborator

I see your point, but I do not know how to solve this one. We would need a whole new parallel graph traversal algorithm, and then some of the most sensitive internals would need refactoring. It is a huge undertaking.

But it may be possible. Somehow, Make already knows how to do this, which means "Makefile" parallelism in drake should not have this problem. I submitted a Stack Overflow post to ask how exactly that works. In the meantime, if you can suggest a parallel graph traversal algorithm, that would help things along.
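
One family of candidates is Kahn's algorithm driven by worker completions: keep a count of unmet dependencies per target and hand a target to any idle worker the moment its count reaches zero. A rough sequential sketch of the bookkeeping (not drake code; all names here are made up):

```r
# `deps` maps each target to its direct dependencies.
ready_order <- function(deps) {
  deps_left <- vapply(deps, length, integer(1))
  dependents <- lapply(names(deps), function(target) {
    names(deps)[vapply(deps, function(d) target %in% d, logical(1))]
  })
  names(dependents) <- names(deps)
  queue <- names(deps_left)[deps_left == 0]
  build_order <- character(0)
  while (length(queue) > 0) {
    current <- queue[1]  # in a parallel version, an idle worker would claim this target
    queue <- queue[-1]
    build_order <- c(build_order, current)
    for (d in dependents[[current]]) {
      deps_left[[d]] <- deps_left[[d]] - 1
      if (deps_left[[d]] == 0) queue <- c(queue, d)
    }
  }
  build_order
}

deps <- list(A = character(0), B = "A", C = "A", D = "B", E = "D", F = "C")
ready_order(deps)
# "A" "B" "C" "D" "F" "E": E and F become ready independently,
# so with two workers they never have to share a stage.
```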

@wlandau-lilly
Collaborator

wlandau-lilly commented Nov 24, 2017

A new parallel graph traversal algorithm will solve, accelerate, clean up, and simplify a lot, including this issue. I sketched one out at https://github.com/wlandau-lilly/drake/tree/parallel_algo (see parallel_worker()). There is a fixed number of persistent workers that stay alive for the entire make(). Each worker completes its current target and then searches for the next unclaimed target whose dependencies are complete (a simplified sketch of this loop follows the list below). Issues:

  1. Race conditions. I try to make use of the "progress" storr namespace to implement locking, but sometimes I get errors or hanging jobs. I do not yet know why.
  2. Memory management. I have not decided on an alternative to prune_envir(). This might be addressed by #167 (Consider passing target arguments as promises?).
  3. Having the same persistent workers for the entire make() means long wall times. Users of resource managers such as SLURM may not be able to request wall times long enough, depending on the user privileges granted by their respective systems. On the other hand, reusing existing workers may remove much of the annoying overhead of starting and stopping jobs. The TORQUE example might even work on my botched local installation.
  4. "Makefile" parallelism: should it still spawn one job per target, or should it reuse the persistent workers? For the sake of the wall time issue, I lean towards the former.

@kendonB
Contributor Author

kendonB commented Nov 24, 2017

I don't think assuming persistent workers is a problem. Users can control wall times by building subsets of myplan$targets, as I have been doing, switching memory and wall-time configurations for different sets of targets.

However, you might have a longer-term ambition to build target-level future.batchtools configurations into myplan so the user does not have to make multiple calls to make(). I wouldn't try that feature until the dependent packages are a bit more mature.
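
For illustration, the manual pattern looks roughly like this (the plan name, target groupings, and resource choices are made up):

```r
library(drake)

# Hypothetical target groupings; memory and wall-time requests are changed
# in the HPC backend between calls (with a future.batchtools setup, one
# would also switch future::plan() here).
cheap  <- c("A", "B", "C")
costly <- c("D", "E", "F")

# First pass: short wall time, little memory, many workers.
make(my_plan, targets = cheap, jobs = 8)

# Reconfigure the scheduler/template, then build the expensive targets.
make(my_plan, targets = costly, jobs = 2)
```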

@kendonB
Contributor Author

kendonB commented Nov 24, 2017

I just had a closer look at this, and I think that building on your original approach is the way to go. I'm imagining running a job with hundreds of workers all fumbling to get going on the same targets. It'll be a mess. File locking surely isn't a perfect strategy.

I can't imagine that this approach can work with many workers :/

Trimming up-to-date targets from the graph at each parallel stage seems like it could work and wouldn't be too complicated to implement?
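
A rough igraph sketch of the trimming idea (not drake internals): remove the up-to-date targets before computing the next stage, so only outdated work defines the stages.

```r
library(igraph)

next_stage <- function(graph, up_to_date) {
  pruned <- delete_vertices(graph, intersect(up_to_date, V(graph)$name))
  # A target is ready when none of its remaining dependencies are left in the graph.
  V(pruned)$name[degree(pruned, mode = "in") == 0]
}

g <- graph_from_literal(A -+ B, A -+ C, B -+ D, D -+ E, C -+ F)
next_stage(g, up_to_date = c("A", "B", "C", "D"))
# "E" "F": both land in the same stage after trimming.
```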

@wlandau-lilly
Collaborator

wlandau-lilly commented Nov 24, 2017

I hope you are right. It is definitely easier to stick with parallel stages and one job per target, and in the short term, it is much safer. Plus, with a fixed set of permanent workers, your idea of target-level hpc configurations would be impossible.

I just feel bad that drake's graph traversal algorithm is less aggressive than that of Make itself. With lapply()-like backends, drake needs to finish all the jobs in a stage before starting any of the next targets. Make, on the other hand, can just fork new jobs on the fly whenever a worker finishes and the graph opens up. It may be that drake's non-"Makefile" backends are just more constrained than Make and there is nothing we can do about it. But it could also be that I naively dove into the first algorithm that came to mind without actively searching for enough other possibilities. Maybe future without future_lapply() could be more aggressive; I am not sure.
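
For example, a hedged sketch of what bare future() scheduling could look like (not a proposal for drake's internals; deps and build() are stand-ins): each target launches as its own future the moment its dependencies resolve, which mimics Make's on-the-fly forking.

```r
library(future)
plan(multisession, workers = 2)

deps  <- list(A = character(0), B = "A", C = "A", D = "B", E = "D", F = "C")
build <- function(target) { Sys.sleep(1); target }  # placeholder for a real command

futures <- list()
done <- character(0)
while (length(done) < length(deps)) {
  # Launch every target whose dependencies are done and which is not yet running.
  ready <- names(deps)[vapply(deps, function(d) all(d %in% done), logical(1))]
  for (tgt in setdiff(ready, c(done, names(futures)))) {
    futures[[tgt]] <- future(build(tgt))
  }
  # Harvest finished futures; each completion opens up downstream targets.
  for (tgt in names(futures)) {
    if (resolved(futures[[tgt]])) {
      value(futures[[tgt]])
      done <- c(done, tgt)
      futures[[tgt]] <- NULL
    }
  }
  Sys.sleep(0.1)  # a real scheduler would block more gracefully
}
```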

So we can stick to the original intent of this issue. My company's hpc systems will be down this weekend, and I am spending most of my time with family right now anyway, so my own work will be more delayed than usual.

@kendonB
Contributor Author

kendonB commented Nov 24, 2017

future without future_lapply can likely achieve aggressive parallelism like you describe. I'm still wrapping my head around that package, but it seems like this is exactly the use case they were thinking of. However, many users (at least me) would want to be able to switch plans between sets of targets. All the best to you and your family over the holiday period, Will! Stop working!

wlandau-lilly added a commit that referenced this issue Nov 24, 2017
wlandau-lilly added a commit that referenced this issue Nov 25, 2017
This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
@wlandau-lilly
Collaborator

Thanks for the encouragement, Kendon! I am taking a lot of time off.

Part of the solution to this issue will be to always process all the imports before all the targets, no matter which parallel backend is used. This will simplify, fortify, and standardize the parallel execution routines, and it solves a related problem: with local parallelism, imported functions are sometimes lined up with true targets, which detracts from the parallelism of the targets themselves.

Also, this issue requires a deep update to max_useful_jobs() and rate_limiting_times(), which eventually led me to #170.

@kendonB
Contributor Author

kendonB commented Nov 25, 2017 via email

@wlandau-lilly
Collaborator

Absolutely. In fact, unnecessary checking is removed elsewhere so drake relies more heavily on last-minute checking in each stage.

@wlandau-lilly
Collaborator

Fixed, unit tested, and mentioned in the vignettes and README.
