Search branch until finding something out of date in parallel #168
Comments
I see your point, but I do not know how to solve this one. We would need a whole new parallel graph traversal algorithm, and then some of the most sensitive internals would need refactoring. It is a huge undertaking. But it may be possible. Somehow, Make already knows how to do this, which means…
A new parallel graph traversal algorithm will solve, accelerate, clean, and simplify a lot, including this issue. I sketched one out at https://github.com/wlandau-lilly/drake/tree/parallel_algo (see parallel_worker()). There is a fixed number of persistent workers that stay going for the entire…
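For illustration, here is a minimal serial sketch of that persistent-worker idea, assuming a toy dependency list and a made-up `build_target()`; it is not the code in the branch, just the shape of workers pulling whichever target becomes ready instead of waiting for a whole stage:

```r
# Toy dependency graph: each entry lists the dependencies of that target.
deps <- list(A = character(0), B = "A", C = "A", D = "B", E = "D", F = "C")
done <- character(0)
queue <- names(deps)

build_target <- function(target) {
  message("building ", target)  # stand-in for the real build step
}

# Return one target whose dependencies are all finished, or NA if none is ready.
next_ready <- function() {
  ready <- queue[vapply(queue, function(t) all(deps[[t]] %in% done), logical(1))]
  if (length(ready)) ready[[1]] else NA_character_
}

# Each persistent worker would run this loop concurrently; here it is serial.
while (length(queue)) {
  target <- next_ready()
  if (is.na(target)) break  # a real worker would wait for new completions instead
  build_target(target)
  done <- c(done, target)
  queue <- setdiff(queue, target)
}
```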
I don't think assuming persistent workers is a problem. Users can control wall times by building subsets of myplan$targets, as I have been doing when switching memory and wall-time configurations for different sets of targets. However, you might have a future ambition to build target-level future.batchtools configurations into myplan so the user does not have to make multiple calls to make(). I wouldn't try the latter feature until the dependent packages are a bit more mature.
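For reference, the subset approach looks roughly like this (the target names are hypothetical and `myplan` is assumed to already exist):

```r
library(drake)
# One make() call per wall-time / memory configuration, until target-level
# future.batchtools settings can live inside the plan itself.
make(myplan, targets = c("cheap_summary", "cheap_plot"))  # short-queue settings
make(myplan, targets = "expensive_simulation")            # long-queue settings
```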
I just had a closer look at this, and I think that building from your original approach is the way to go. I'm imagining running a job with hundreds of workers all fumbling to get going on the same targets. It'll be a mess. File locking surely isn't a perfect strategy, and I can't imagine that this approach can work with many workers :/ Trimming graphs of up-to-date targets at each parallel_stage seems like it could work and wouldn't be too complicated to implement?
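A rough sketch of that trimming with igraph, with the up-to-date set hard-coded for illustration (drake's real bookkeeping is more involved):

```r
library(igraph)

# Dependency edges, written as dependency -> target.
edges <- matrix(c(
  "A", "B",
  "A", "C",
  "B", "D",
  "C", "F",
  "D", "E"
), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edges, directed = TRUE)

# Drop everything that is already up to date before computing the next stage.
up_to_date <- c("A", "B", "C", "D")
g_trimmed <- delete_vertices(g, up_to_date)

# Vertices with no remaining dependencies can be built together in parallel.
next_stage <- names(which(degree(g_trimmed, mode = "in") == 0))
next_stage  # "F" "E"
```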
I hope you are right. It is definitely easier to stick with parallel stages and one job per target, and in the short term, it is much safer. Plus, with a fixed set of permanent workers, your idea of target-level HPC configurations would be impossible. I just feel bad that… So we can stick to the original intent of this issue. My company's HPC systems will be down this weekend, and I am spending most of my time with family right now anyway, so my own work will be more delayed than usual.
This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
Thanks for the encouragement, Kendon! I am taking a lot of time off. Part of the solution to this issue will be to always process all the imports before all the targets, no matter what parallel backend is used. This will simplify, fortify, and standardize the parallel execution routines, and it solves a similar problem: with local parallelism, imported functions are sometimes lined up in the same stage as true targets, which detracts from the parallelism of the targets themselves. Also, this issue requires a deep update to…
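A toy illustration of that ordering, with made-up names (in the package itself the import/target split comes from the dependency graph, not a hand-written vector):

```r
# Classify each vertex of the dependency graph as an import or a target.
kind <- c(read_data = "import", helper_fun = "import",
          clean_data = "target", fit_model = "target")

imports <- names(kind)[kind == "import"]
targets <- names(kind)[kind == "target"]

# Process every import first (cheap hashing), then hand the full pool of
# parallel workers to the targets, so imports never compete with real builds.
invisible(lapply(imports, function(v) message("processing import ", v)))
invisible(lapply(targets, function(v) message("building target ", v)))
```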
Will this still allow drake to check targets at each parallel stage? You would want to check targets right before they get built, in case rebuilding a dependency does not actually change its value.
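A hedged sketch of that kind of check, using the digest package; `maybe_build()` and `stored_hash` are made up, and drake's real fingerprinting is more elaborate:

```r
library(digest)

# Rebuild a target only if the fingerprint of its dependencies changed, so an
# upstream rebuild that produces identical output does not cascade downstream.
maybe_build <- function(target, dep_values, stored_hash) {
  new_hash <- digest(dep_values)
  if (identical(new_hash, stored_hash)) {
    message(target, " is still up to date; skipping")
  } else {
    message("building ", target)
  }
  new_hash
}

stored_hash <- digest(list(D = 1:10))          # hash recorded when E was last built
maybe_build("E", list(D = 1:10), stored_hash)  # D was rebuilt but unchanged: skip E
```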
Absolutely. In fact, unnecessary checking is removed elsewhere, so…
Fixed, unit tested, and mentioned in the vignettes and README.
I just ran into a case where a potential parallelisation wasn't realised.
The dependency structure was something like:
A -> B -> D -> E
A -> C -> F
A, B, C, and D are up to date. Drake checked A, then B and C, then D and F, built F, and then checked and built E.
I think the most efficient behavior would be:
A, B, C, and D are up to date. Drake checks A, then B and C, then D and F, then E, and then builds E and F in parallel.
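For concreteness, a toy plan with this shape (assuming the current drake_plan()/make() interface; the commands are placeholders):

```r
library(drake)

plan <- drake_plan(
  A = 1,
  B = A + 1,
  C = A + 2,
  D = B + 1,
  E = D + 1,
  F = C + 1
)

make(plan, jobs = 2)  # everything builds the first time
# Once A, B, C, and D are up to date and only E and F are outdated, the hope is
# that drake finishes all the checking first and then builds E and F in the same
# parallel stage, rather than building F in one stage and E in a later one.
```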