Parallel graph walk #531
@@ -255,6 +255,7 @@ def _launch_stack(self, stack, **kwargs):

            return FailedStatus(reason)
        elif self.provider.is_stack_completed(provider_stack):
            self.provider.set_outputs(stack.fqn, provider_stack)
This is a simple change to ensure that output lookups are parallelized. I'm not sure this is really the best implementation, which is why there's no comment explaining it yet.
In a nutshell, when a stack is "UPDATE_COMPLETE", this stores the stack's outputs on the provider (which it already handles). When another stack references an output from a dependency, this ensures there are no sequential, blocking DescribeStacks calls to look up outputs from dependencies, since they're already cached, and it also makes outputs thread safe.
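For anyone skimming the diff, here is a minimal sketch of the caching idea described above; the class shape, the lock, and the `get_outputs` fallback are assumptions for illustration, not the actual provider code:

```python
import threading


class CachingProvider(object):
    """Illustrative provider that caches stack outputs in memory."""

    def __init__(self, cloudformation):
        self.cloudformation = cloudformation  # boto3 CloudFormation client
        self._outputs = {}
        self._lock = threading.Lock()

    def set_outputs(self, fqn, provider_stack):
        # Store the outputs that were already returned alongside the stack
        # status, so dependents never need their own DescribeStacks call.
        outputs = {
            o["OutputKey"]: o["OutputValue"]
            for o in provider_stack.get("Outputs", [])
        }
        with self._lock:
            self._outputs[fqn] = outputs

    def get_outputs(self, fqn):
        with self._lock:
            if fqn in self._outputs:
                return self._outputs[fqn]
        # Cache miss: fall back to a blocking, throttle-prone lookup.
        resp = self.cloudformation.describe_stacks(StackName=fqn)
        stack = resp["Stacks"][0]
        self.set_outputs(fqn, stack)
        return self._outputs[fqn]
```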
stacker/dag/__init__.py (outdated)
logger.debug("cancelling %s. " | ||
"Some dependencies " | ||
"were not satisfied", n) | ||
return False |
Note to self: this should just continue to walk the graph instead of stopping. Conditional execution of the node is handled further up the call stack, which is a better place to abort.
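As an illustration of the distinction (not the actual `stacker.dag` code), a sequential walk that skips a node whose dependencies failed, but keeps going, might look like this:

```python
def walk(nodes, dependencies, walk_func):
    """Illustrative sequential walk over a dependency mapping.

    ``nodes`` is assumed to already be in topological order and
    ``dependencies[n]`` is the set of nodes that must complete before
    ``n`` runs.  A failed dependency skips the dependent node, but the
    walk itself keeps going so unrelated branches still execute.
    """
    failed = set()
    for node in nodes:
        if dependencies.get(node, set()) & failed:
            # Don't abort the whole walk -- record the skip and move on.
            failed.add(node)
            continue
        if not walk_func(node):
            failed.add(node)
    return not failed
```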
Alright, this is getting awesome. I've added a few commits.
Here's an example of two stacks requesting changes in interactive mode, and an example of ^C (SIGINT/SIGTERM) handling. I'll get us dogfooding this internally, but implementation-wise, this is looking pretty solid.
Codecov Report
@@            Coverage Diff             @@
##           master     #531      +/-   ##
==========================================
- Coverage   87.72%   87.54%    -0.18%
==========================================
  Files          93       94        +1
  Lines        6003     6072       +69
==========================================
+ Hits         5266     5316       +50
- Misses        737      756       +19
Continue to review full report at Codecov.
stacker/commands/stacker/base.py (outdated)
    return cancel


def build_semaphore(concurrency):
Might be cleaner to change this to `build_walker` and return an object for walking the graph (`stacker.dag.walk` / `stacker.dag.walk_threaded`), with the semaphore built in. This way, when `--max-parallel=1`, we can disable threaded execution entirely, which may be useful for debugging in some cases.
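A rough sketch of what that `build_walker` factory could look like; the `UnlimitedSemaphore` class and the `semaphore=` keyword on `dag.walk_threaded` are assumptions, and only the `stacker.dag.walk` / `walk_threaded` names come from the suggestion above:

```python
import threading
from functools import partial

from stacker import dag


class UnlimitedSemaphore(object):
    """No-op semaphore for when parallelism is unbounded."""

    def acquire(self, *args, **kwargs):
        pass

    def release(self):
        pass


def build_walker(max_parallel):
    """Return a callable that walks the graph.

    ``max_parallel == 1`` disables threading entirely (useful for
    debugging), ``0`` means unbounded threads, and anything else bounds
    concurrency with a semaphore.
    """
    if max_parallel == 1:
        return dag.walk  # plain, single-threaded walk

    if max_parallel == 0:
        semaphore = UnlimitedSemaphore()
    else:
        semaphore = threading.Semaphore(max_parallel)

    return partial(dag.walk_threaded, semaphore=semaphore)
```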
@@ -31,6 +31,12 @@ def add_arguments(self, parser):
                            "dependencies. Can be specified more than "
                            "once. If not specified then stacker will "
                            "work on all stacks in the config file.")
        parser.add_argument("-j", "--max-parallel", action="store", type=int,
                            default=0,
                            help="The maximum number of stacks to execute in "
Should expand the docs on this.
This looks great - a lot simpler than I thought it would end up being. Love the UI module/class. I'm good merging this - I'll let you hit merge when you feel up to it!
A couple of people here have hit throttling while using this. I think there are two things that can be done:
Added two additional commits to help with throttling:
With these two changes, throttling should be 1) less likely to be hit on DescribeStacks, and 2) handled more gracefully by stacker when it is hit.
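The commits themselves aren't reproduced here, but as an illustration of the kind of graceful fallback being described, a DescribeStacks wrapper with exponential backoff could look roughly like this (a sketch, not the actual change):

```python
import random
import time

from botocore.exceptions import ClientError


def describe_stack_with_backoff(cloudformation, stack_name,
                                max_attempts=5, base_delay=1.0):
    """Call DescribeStacks, backing off and retrying when throttled."""
    for attempt in range(max_attempts):
        try:
            resp = cloudformation.describe_stacks(StackName=stack_name)
            return resp["Stacks"][0]
        except ClientError as e:
            code = e.response["Error"]["Code"]
            if code != "Throttling" or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter before the next attempt.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```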
This is working well for us internally. I'm going to merge this into master and plan on doing a release candidate tomorrow to get more external people testing it.
Parallel graph walk
Fixes #279
Closes #357
Now that the DAG has been merged, the single biggest performance improvement we can make to `stacker build` is to switch the graph walk to a parallel walk.
This PR isn't quite ready to merge (though it works well; I've used it internally to make changes), so I'm primarily opening it up to start talking about the implementation and the possible changes we need to make. I don't think we should include this in the `1.2` release of stacker, which gives us time to polish it and test it internally first.
This is a multi-threaded implementation of the graph walk, which walks the graph as fast as the graph allows. I think multi-threading is ultimately easier than multi-processing, since there's actually not very much we need to make thread safe within stacker itself, because of the nature of the graph. There was some talk in the past about using async I/O, which would be more resource efficient, but I think it would complicate the implementation; if someone wants to give that a try, be my guest.
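To make the threading model concrete, here is a stripped-down sketch of a threaded walk over a dependency mapping; the names and structure are illustrative and only approximate what this PR actually implements:

```python
import threading


def walk_threaded(dependencies, walk_func):
    """Walk a dependency graph, running each node in its own thread.

    ``dependencies[n]`` is the set of nodes that must finish before ``n``
    starts (every node must appear as a key).  Each node blocks only on
    its own dependencies, so independent branches run fully in parallel.
    """
    done = {node: threading.Event() for node in dependencies}

    def run(node):
        for dep in dependencies[node]:
            done[dep].wait()  # block until this dependency finishes
        try:
            walk_func(node)
        finally:
            done[node].set()  # unblock anything that depends on us

    threads = [threading.Thread(target=run, args=(node,))
               for node in dependencies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# e.g. walk_threaded({"vpc": set(), "app": {"vpc"}, "db": {"vpc"}}, launch)
# would launch "app" and "db" concurrently once "vpc" completes.
```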
Note that I'm basing this branch on another branch that removes the loop logger and moves to a simple sequential logger. That makes parallelism a lot easier to deal with.
Perf
I tested this against our internal stacker config (153 stacks) and it drops execution time from ~8 minutes to ~2 minutes. I think there's still room for a lot of optimization here.
before
after
FWIW, I have not yet run into throttling on DescribeStacks after #529 (which this PR includes) and the change to `set_outputs`.
Prerequisites
TODO
- `--max-parallel` flag to specify the maximum allowed parallelism. This would just control a semaphore that wraps a `build`/`destroy` (see the sketch below).
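A minimal sketch of what wrapping the per-stack step with such a semaphore could look like; `bounded_step` and `launch_stack` are illustrative names, not the PR's actual helpers:

```python
def bounded_step(semaphore, launch_stack):
    """Wrap the per-stack step so a semaphore bounds parallelism.

    ``semaphore`` would be built from ``--max-parallel``;
    ``launch_stack`` is whatever actually builds or destroys a stack.
    """
    def step(stack):
        semaphore.acquire()
        try:
            return launch_stack(stack)
        finally:
            semaphore.release()
    return step


# With --max-parallel=4, roughly:
#   step = bounded_step(threading.Semaphore(4), launch_stack)
```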