-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make zone load balancing logic depend on local_cluster host distribution. #174
Conversation
|
||
// If number of hosts in a local zone big enough then route all requests to the same zone. | ||
if (local_zone_healthy_hosts.size() * number_of_zones >= host_set_.healthyHosts().size()) { | ||
std::vector<uint64_t> local_percentage = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a stack allocated array with the right size, rework the functions above to take the array and populate it if you want to do that. You can't allocate things like vectors in this code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just do the pre-calculation work now. I would actually prefer you just do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i made stack alloc for now, i want make sure things works good in prod
// At this point we should route cross zone as we cannot route to the local zone. | ||
stats_.zone_routing_no_sampled_.inc(); | ||
|
||
std::vector<uint64_t> capacity_left; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack allocated array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
stats_.zone_routing_no_sampled_.inc(); | ||
|
||
std::vector<uint64_t> capacity_left; | ||
// Local zone does not have additional capacity (we already routed what we could), but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More comments. Don't understand what is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
…ts." This reverts commit ab35059. Conflicts: test/common/upstream/load_balancer_simulation_test.cc
if (total_hosts != 0) { | ||
size_t i = 0; | ||
for (const auto& zone_hosts : hosts_per_zone) { | ||
ret[i++] = 10000ULL * zone_hosts.size() / total_hosts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment explaining the 10,000ULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, good point, will add more details into load_balancer_impl.h. Done.
// Now we need to figure out how much traffic we can route cross zone and to which exact zone | ||
// we should route. Percentage of requests routed cross zone to a specific zone should be | ||
// proportional to the residual capacity upstream zone has. | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -192,10 +192,12 @@ class HostSet { | |||
COUNTER(update_success) \ | |||
COUNTER(update_failure) \ | |||
COUNTER(zone_cluster_too_small) \ | |||
COUNTER(zone_over_percentage) \ | |||
COUNTER(zone_routing_all_directly) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you prefix all of the lb ones with "lb_" and then reorder this list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, will do for consistency with panic threshold stats
stats_.zone_over_percentage_.inc(); | ||
return local_zone_healthy_hosts; | ||
uint64_t local_percentage[number_of_zones]; | ||
calculateZonePercentage(local_host_set_->healthyHostsPerZone(), local_percentage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASSERT here also that local_host_set_->healthyHostsPerZone().size() == host_set_.healthyHostsPerZone().size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
* In this case we'll route requests to hosts no matter if they are healthy or not. | ||
*/ | ||
bool isGlobalPanic(const HostSet& host_set); | ||
const std::vector<HostPtr>& tryChooseLocalZoneHosts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add comment or newline before this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put comments.
// Local zone (index 0) does not have residual capacity as we have routed all we could. | ||
residual_capacity[0] = 0; | ||
for (size_t i = 1; i < number_of_zones; ++i) { | ||
// Only route to the zones that have additional capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still having trouble quickly understanding this logic. More comments, and potentially an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments added
// Not zone routed requests will be distributed between all hosts and hence | ||
// we need to route only fraction of req_percent_to_route to the local zone. | ||
double actual_routing_ratio = (ratio_to_route - zone_host_ratio) / (1 - zone_host_ratio); | ||
// Random sampling to select specific zone for cross zone traffic based on the additional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments. What type of number does residual_capacity[number_of_zones - 1] have in it. What range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments added
// sampled value is, we accumulate values in residual capacity. This is what it will look like: | ||
// residual_capacity: 0 10000 15000 | ||
// Now to find zone to route (bucket) we could simply iterate over residual_capacity searching | ||
// where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you fix line wrapping in this comment block. Keep to 100 col flow unless you need a new paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:( fix format broke this... fixing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lg after nit
* Add Dockerfile for the C++ SDK. Signed-off-by: John Plevyak <jplevyak@gmail.com>
* Add Dockerfile for the C++ SDK. Signed-off-by: John Plevyak <jplevyak@gmail.com>
…#114) (envoyproxy#174) Previously, the update callback was called only when the secret was received for the first time or when its value changed. This meant that if the same secret (e.g. trusted CA) was used in multiple resources, then resources using it but configured after the secret was already received, remained unconfigured until the secret's value changed. The missing callback should have resulted in transport factories stuck in the "not ready" state, however, because of an incorrect code, the available secret was processed like inlined validation context, and only rules from the "secret" part of the validation context were applied, leading to a complete bypass of rules from the "default" part. Signed-off-by: Piotr Sikora <piotrsikora@google.com> Co-authored-by: Oliver Liu <yonggangl@google.com> Co-authored-by: Oliver Liu <yonggangl@google.com>
…verprovisioning_factor zh-translation: /intro/arch_overview/upstream/load_balancing/overprovisioning.rst
Documentation around runtime control/stats/general overview are going to be on a separate commit.
I was running simulation tests from load_balancer_simulation_test.cc, results look very good.