-- with an older value in the current worker. Theoretically it's possible that
-- we are stuck with the same old EWMA value but for this to happen all the workers
-- must be sending request to the same upstream in a way that they always read old value
-- and override one another's EWMA value with the old one.
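The overwrite described in this comment can be sketched as a small deterministic simulation. This is an illustration only, not the balancer's code: the exponential decay formula, the `decayTime` constant, and the function names are assumptions, and both samples use the same elapsed time `td` to keep the arithmetic simple.

```go
package main

import (
	"fmt"
	"math"
)

// decayTime is a hypothetical smoothing window in seconds.
const decayTime = 10.0

// decay folds one RTT sample into an EWMA value that was last touched td seconds ago.
func decay(ewma, rtt, td float64) float64 {
	w := math.Exp(-td / decayTime)
	return ewma*w + rtt*(1-w)
}

// serialized applies both samples in turn, so the second update sees the first one's write.
func serialized(start, rttA, rttB, td float64) float64 {
	afterA := decay(start, rttA, td)
	return decay(afterA, rttB, td)
}

// lostUpdate models two workers that both read the stored value before either
// of them writes. The last write wins and worker A's sample is lost.
func lostUpdate(start, rttA, rttB, td float64) float64 {
	store := map[string]float64{"upstream": start}
	readA := store["upstream"]
	readB := store["upstream"]                 // stale once worker A writes
	store["upstream"] = decay(readA, rttA, td) // worker A writes
	store["upstream"] = decay(readB, rttB, td) // worker B overwrites using its stale read
	return store["upstream"]
}

func main() {
	fmt.Println("serialized: ", serialized(0.1, 1.0, 0.1, 1))
	fmt.Println("lost update:", lostUpdate(0.1, 1.0, 0.1, 1))
}
```

In the lost-update case the slow 1.0 s sample from worker A leaves no trace in the stored value, which is the kind of staleness the comment says would have to repeat on every request for the EWMA to stay stuck.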
I think this could have some other unexpected side effects, but I can't tell exactly what at the moment.
I'm also not sure how harmful that could be.
I think that if we go with this, we need to emit some statsd metrics to get a better view of what is happening.
In parallel, we can dig into the solution we were talking about on Slack: asynchronously updating EWMA from worker 0, based on information sent by the workers using the shared dict [lr]push.
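A rough sketch of that single-writer idea, with a bounded Go channel standing in for a shared dict list. Everything here is hypothetical (the `sample` and `aggregator` types, the `push` and `drain` names, the decay formula, the choice to drop samples when the queue is full); it only shows the shape of the design, in which workers enqueue observations and one owner applies them.

```go
package main

import (
	"fmt"
	"math"
)

// decayTime is a hypothetical smoothing window in seconds.
const decayTime = 10.0

// sample is one observation reported by a worker.
type sample struct {
	upstream string
	rtt      float64 // observed round-trip time in seconds
	td       float64 // seconds since the upstream's EWMA was last touched
}

// aggregator owns the EWMA values; only drain ever writes to them.
type aggregator struct {
	queue chan sample
	ewma  map[string]float64
}

func newAggregator(capacity int) *aggregator {
	return &aggregator{
		queue: make(chan sample, capacity),
		ewma:  map[string]float64{},
	}
}

// push is what any worker would call. It never blocks: when the queue is
// full the sample is dropped and false is returned.
func (a *aggregator) push(s sample) bool {
	select {
	case a.queue <- s:
		return true
	default:
		return false
	}
}

// drain is what the single owner (the "worker 0" role) would run periodically.
// It applies every queued sample in order and returns how many it applied.
func (a *aggregator) drain() int {
	n := 0
	for {
		select {
		case s := <-a.queue:
			w := math.Exp(-s.td / decayTime)
			a.ewma[s.upstream] = a.ewma[s.upstream]*w + s.rtt*(1-w)
			n++
		default:
			return n
		}
	}
}

func main() {
	a := newAggregator(16)
	a.push(sample{upstream: "10.0.0.1:80", rtt: 0.2, td: 1})
	a.push(sample{upstream: "10.0.0.1:80", rtt: 0.4, td: 1})
	fmt.Println("applied:", a.drain(), "ewma:", a.ewma["10.0.0.1:80"])
}
```

Because only the owner writes, no lock is needed around the EWMA values; the trade-off is that balancing decisions see values that lag by up to one drain interval.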
If you have some very strong opinion that this is safe, please move forward with this in a limited environment.
self.ewma_last_touched_at = {}

ngx.shared.balancer_ewma:flush_all()
ngx.shared.balancer_ewma_last_touched_at:flush_all()
In a subsequent PR I'll change this to not flush all data and to ensure slow start for new upstreams. I'll also set a specific ngx.var to the EWMA value corresponding to the currently picked upstream/endpoint.
f.UpdateNginxConfigMapData("load-balance", "ewma")
})

It("does not fail requests", func() {
I tried to assert on request distribution but it's too flaky.
end
local ewma = ngx.shared.balancer_ewma:get(upstream) or 0
if lock_err ~= nil then
  return ewma, lock_err
I'd like to note that lock_err gets ignored. In our DC implementation we also ignore this, but the difference is that we emit a statsd metric when an error happens.
I assume it's ignored because this happens a lot.
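The pattern of ignoring the error for the balancing decision while still counting it could look roughly like this. The `metrics` type, the `ewma.lock_error` metric name, and `getOrUpdate` are hypothetical stand-ins; a real setup would forward the counter to a statsd client rather than keep it in a map.

```go
package main

import (
	"errors"
	"fmt"
)

// metrics is a hypothetical in-memory stand-in for a statsd client.
type metrics struct {
	counts map[string]int
}

func (m *metrics) incr(name string) { m.counts[name]++ }

var errLockTimeout = errors.New("lock timeout")

// getOrUpdate mimics a function that returns (ewma, lock_err): when the lock
// cannot be taken it still returns the last stored value alongside the error.
func getOrUpdate(store map[string]float64, upstream string, lockFails bool) (float64, error) {
	if lockFails {
		return store[upstream], errLockTimeout
	}
	return store[upstream], nil
}

// score uses the returned value whether or not the lock failed, but records
// the failure so that its frequency stays visible.
func score(m *metrics, store map[string]float64, upstream string, lockFails bool) float64 {
	ewma, err := getOrUpdate(store, upstream, lockFails)
	if err != nil {
		m.incr("ewma.lock_error")
	}
	return ewma
}

func main() {
	m := &metrics{counts: map[string]int{}}
	store := map[string]float64{"10.0.0.1:80": 0.25}
	fmt.Println(score(m, store, "10.0.0.1:80", true), m.counts["ewma.lock_error"])
}
```

A counter like this would also help confirm or refute the assumption that lock errors happen often.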
What this PR does / why we need it:
This is a revert of kubernetes#3295. Additionally, the PR adds an e2e test for EWMA and a guard that errors when the resty lock cannot be instantiated.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Special notes for your reviewer: