share ewma stats among workers #220

ElvinEfendi · 2019-07-02T15:16:17Z

What this PR does / why we need it:

This is a revert of kubernetes#3295. The PR also adds an e2e test for EWMA and a guard that returns an error when the resty lock cannot be instantiated.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

wayt · 2019-07-02T17:55:29Z

rootfs/etc/nginx/lua/balancer/ewma.lua

+  -- with an older value in the current worker. Theoretically it's possible that
+  -- we are stuck with the same old EWMA value, but for this to happen all the workers
+  -- must be sending requests to the same upstream in a way that they always read the old value
+  -- and overwrite one another's EWMA value with the old one.


I think this could have other unexpected side effects, but I can't tell exactly what they are at the moment.
I'm also not sure how harmful they could be.
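To make the scenario in the quoted comment concrete, here is a small deterministic Go sketch of the lost-update problem: when workers compute their updates from the same stale read, later writes overwrite earlier ones and samples disappear. The fixed smoothing factor `alpha` and the function names are hypothetical simplifications, not how the balancer actually weighs samples.

```go
package main

import "fmt"

// alpha is a hypothetical fixed smoothing factor; the balancer itself uses a
// time-based decay, but a constant keeps the arithmetic easy to follow.
const alpha = 0.5

func blend(old, sample float64) float64 {
	return old*(1-alpha) + sample*alpha
}

// staleRound models workers that all read the shared value before any of them
// writes back. Each write is computed from the same stale read, so every write
// overwrites the previous one and only the last worker's sample survives.
func staleRound(shared float64, samples []float64) float64 {
	reads := make([]float64, len(samples))
	for i := range samples {
		reads[i] = shared // every worker sees the same old value
	}
	for i, s := range samples {
		shared = blend(reads[i], s) // last writer wins
	}
	return shared
}

// serializedRound models the same workers taking turns under a lock: each one
// reads the value the previous worker wrote, so no sample is lost.
func serializedRound(shared float64, samples []float64) float64 {
	for _, s := range samples {
		shared = blend(shared, s)
	}
	return shared
}

func main() {
	samples := []float64{300, 50} // two workers, two response times
	fmt.Println("stale reads:", staleRound(100, samples))
	fmt.Println("serialized:", serializedRound(100, samples))
}
```

In the stale round the 300 ms sample never influences the final value, which is the kind of silent drift that metrics would help surface.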

I think if we go with this, we need to emit some statsd metrics to get a better view of what is happening.

In parallel, we can dig into the solution we discussed on Slack: asynchronously update EWMA from worker 0 based on information sent by the workers using shared dict [lr]push.

If you have a very strong opinion that this is safe, please move forward with it in a limited environment.

ElvinEfendi · 2019-07-03T04:57:30Z

rootfs/etc/nginx/lua/balancer/ewma.lua

-  self.ewma_last_touched_at = {}
+
+  ngx.shared.balancer_ewma:flush_all()
+  ngx.shared.balancer_ewma_last_touched_at:flush_all()


In a subsequent PR I'll change this so that it does not flush all data, and ensure slow start for new upstreams. I'll also set a specific ngx.var to the EWMA value of the currently picked upstream/endpoint.
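One possible shape for that follow-up, sketched in Go under assumptions: instead of flushing everything, reconcile the stored stats against the current endpoint list and drop only the removed endpoints. Seeding new endpoints with the highest surviving EWMA is a hypothetical slow-start heuristic chosen for illustration, not necessarily what the subsequent PR does, and `syncEndpoints` is a made-up name. In the Lua code the equivalent would use per-key `delete` on the shared dicts rather than `flush_all`.

```go
package main

import (
	"fmt"
	"sort"
)

// syncEndpoints updates per-endpoint EWMA stats in place when the endpoint list
// changes, instead of flushing everything:
//   - stats for endpoints that still exist are kept,
//   - stats for removed endpoints are deleted,
//   - new endpoints are seeded with the highest surviving EWMA. This seeding is a
//     hypothetical slow-start heuristic: an endpoint with no history would
//     otherwise look like the fastest one and attract a burst of traffic.
func syncEndpoints(stats map[string]float64, current []string) {
	keep := make(map[string]bool, len(current))
	for _, ep := range current {
		keep[ep] = true
	}
	for ep := range stats {
		if !keep[ep] {
			delete(stats, ep) // deleting during range is safe in Go
		}
	}
	seed := 0.0
	for _, v := range stats {
		if v > seed {
			seed = v
		}
	}
	for _, ep := range current {
		if _, ok := stats[ep]; !ok {
			stats[ep] = seed
		}
	}
}

func main() {
	stats := map[string]float64{"10.0.0.1:80": 20, "10.0.0.2:80": 50, "10.0.0.3:80": 90}
	syncEndpoints(stats, []string{"10.0.0.1:80", "10.0.0.2:80", "10.0.0.4:80"})

	keys := make([]string, 0, len(stats))
	for k := range stats {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, stats[k])
	}
}
```

If no endpoint survives, the seed falls back to 0, so this heuristic only helps when at least some history remains.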

ElvinEfendi · 2019-07-03T13:49:58Z

test/e2e/loadbalance/ewma.go

+		f.UpdateNginxConfigMapData("load-balance", "ewma")
+	})
+
+	It("does not fail requests", func() {


I tried asserting on the request distribution, but it's too flaky.

ElvinEfendi · 2019-07-03T13:52:39Z

rootfs/etc/nginx/lua/balancer/ewma.lua

+  end
+  local ewma = ngx.shared.balancer_ewma:get(upstream) or 0
+  if lock_err ~= nil then
+    return ewma, lock_err


I'd like to note that lock_err gets ignored. In our DC implementation we also ignore it, but the difference is that we emit a statsd metric when an error happens.

I assume it's ignored because this happens a lot.

ElvinEfendi force-pushed the ewma-improvements-1 branch 2 times, most recently from d8d875e to 41607c5, July 2, 2019 17:16

wayt reviewed Jul 2, 2019

ElvinEfendi force-pushed the ewma-improvements-1 branch 2 times, most recently from cd9dccd to 4981283, July 3, 2019 04:51

Commit 45c6030: share ewma stats among workers

ElvinEfendi force-pushed the ewma-improvements-1 branch from 4981283 to c96a721, July 3, 2019 04:52

ElvinEfendi requested review from wayt and csfrancis July 3, 2019 04:53

ElvinEfendi commented Jul 3, 2019

Commit ce382f6: e2e test for ewma

ElvinEfendi force-pushed the ewma-improvements-1 branch from c96a721 to ce382f6, July 3, 2019 13:48

ElvinEfendi commented Jul 3, 2019

wayt approved these changes Jul 3, 2019

ElvinEfendi merged commit 6ba947c into master Jul 3, 2019