Slow network / local rate limit HTTP filter #5942

Closed · mattklein123 opened this issue Feb 13, 2019 · 10 comments
Labels: enhancement (Feature requests. Not bugs or questions.), no stalebot (Disables stalebot from closing an issue)

@mattklein123 (Member) commented Feb 13, 2019

At Lyft we have a desire to build a slow network simulation system. This would allow debug versions of our applications to request various slow network profiles, without requiring attachment to a special access point, emulator, etc.

Our plan is to do this via a new HTTP filter that has roughly the following capabilities:

  1. Allow latency injection in both the downstream and upstream direction.
  2. Allow throughput limiting in both the downstream and upstream direction.
  3. Allow variable throughput/latency stability via randomness (to simulate spotty networks).

We would like to allow this via fixed configuration / xDS, but also, optionally, via request headers along the lines of:

x-envoy-throttle-response-throughput: int (kilobytes)
x-envoy-throttle-request-throughput: int (kilobytes)
x-envoy-throttle-response-throughput-stability: int (percentage between 0 and 100)
x-envoy-throttle-request-throughput-stability: int (percentage between 0 and 100)
x-envoy-throttle-response-latency: int (milliseconds)
x-envoy-throttle-request-latency: int (milliseconds)
x-envoy-throttle-network-stability: int (percentage between 0 and 100)
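
For example (values purely illustrative, using the proposed header names above), a debug build simulating a spotty 3G-like link might send:

x-envoy-throttle-response-throughput: 96
x-envoy-throttle-response-latency: 300
x-envoy-throttle-network-stability: 75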

Potentially we would also offer pre-canned profiles, configurable via both static config and request headers such as:

x-envoy-throttle-profile: {“no-connection”, “EDGE”, “3G”, “LTE”}

In general, this code will not be difficult to write. There are some additional safety knobs we will need around max concurrent throttled requests, runtime disable, etc.
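
A minimal sketch of the concurrency safety knob (all names hypothetical; per the commits referenced below, the real implementation pairs the limit with a stat tracking active faults and an overflow stat):

  // Hypothetical sketch: gate new throttled requests on a shared counter so
  // that 100% fault injection cannot exhaust resources.
  #include <atomic>
  #include <cstdint>

  class ActiveFaultGuard {
  public:
    explicit ActiveFaultGuard(uint64_t max_active) : max_active_(max_active) {}

    // Returns true if a new fault may start; the caller must call release()
    // when the throttled request completes.
    bool tryAcquire() {
      uint64_t current = active_.load();
      while (current < max_active_) {
        // compare_exchange_weak reloads `current` on failure, so the limit
        // is re-checked on every retry.
        if (active_.compare_exchange_weak(current, current + 1)) {
          return true;
        }
      }
      return false; // Overflow: bump an overflow stat and skip injection.
    }

    void release() { active_--; }

  private:
    const uint64_t max_active_;
    std::atomic<uint64_t> active_{0};
  };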

One thing to consider is whether this should be a new filter, or if we should build this functionality into the existing HTTP fault filter. My inclination is to build a new filter, but there is enough overlap that it might make sense to build this into the existing HTTP fault filter, so I'm curious to hear everyone's opinion.

@envoyproxy/maintainers @rshriram @lita @Reflejo @goaway

@mattklein123 mattklein123 added the design proposal Needs design doc/proposal before implementation label Feb 13, 2019
@mattklein123 mattklein123 added this to the 1.10.0 milestone Feb 13, 2019
@mattklein123 mattklein123 self-assigned this Feb 13, 2019
@alyssawilk (Contributor)

No opinion on which filter this goes in, but one thought: this seems like an L7 filter doing L4 work. Are we planning on doing any enforcement if one stream on a connection asks for 3G semantics and another requests LTE? Or do we just assume that, since this is mainly for test purposes, it's on the operator to have proper client enforcement, and let the last update take precedence?

@mattklein123 (Member, Author)

> Are we planning on doing any enforcement if one stream on a connection asks for 3G semantics and another requests LTE? Or do we just assume that, since this is mainly for test purposes, it's on the operator to have proper client enforcement, and let the last update take precedence?

My feeling initially is that it's up to the operator. Really the only reason that we propose doing this at the HTTP layer is so that the client can drive the test profile via headers. There isn't any good way to do this at L4 unless we build a proprietary prefix protocol (send some proto before any other data that is then stripped) which I suspect would be substantially more effort for most client implementations.

@rshriram (Member) commented Feb 14, 2019 via email

@mattklein123 (Member, Author)

@rshriram we are definitely going to build all this functionality (I agree that in many cases, for a message-oriented exchange, latency is all that matters, but for a progressive download, even over HTTP, it might not always be the case). The question is where to build it. I think you are saying you would like this as part of the fault filter? If so that's fine with me.

@ryancox (Contributor) commented Feb 21, 2019

It seems like x-envoy-throttle-latency: int (milliseconds) should be controlled independently for upstream/downstream. This is more consistent with the other related parameters and allows for a little more flexibility.

@mattklein123 (Member, Author)

> It seems like x-envoy-throttle-latency: int (milliseconds) should be controlled independently for upstream/downstream. This is more consistent with the other related parameters and allows for a little more flexibility.

ACK, I agree we should split this (and all other settings) into independent upstream/downstream options.

@mattklein123 (Member, Author)

Note that I think we also want response headers that are set when a throttling profile is in place. Thus, if the throttling is user-driven, the client can adjust UI, etc. accordingly. I will think more about this and update the spec.
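
For example (header name hypothetical, mirroring the request-header scheme above), Envoy could echo the active profile back to the client:

x-envoy-throttle-applied-profile: 3G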

mattklein123 added a commit that referenced this issue Mar 5, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of #5942.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 8, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of #5942.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 8, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of envoyproxy/envoy#5942.

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 191c8b02b4908f212f800ed0185f6ee689ba8126
@mattklein123 mattklein123 added the enhancement Feature requests. Not bugs or questions. label Mar 10, 2019
mattklein123 added a commit that referenced this issue Mar 10, 2019
1) Add partial consumption
2) Fix ceiling math for next wakeup time
3) Minor cleanups

Needed for #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
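
The ceiling fix matters because truncating integer math can schedule the next wakeup one tick early, before a whole token has actually accrued. A hedged sketch of both ideas (names and representation hypothetical, not Envoy's exact TokenBucketImpl):

  // Hypothetical token bucket illustrating partial consumption and ceiling
  // math for the next wakeup time.
  #include <algorithm>
  #include <chrono>
  #include <cmath>
  #include <cstdint>

  class TokenBucket {
  public:
    TokenBucket(double max_tokens, double fill_rate_per_sec)
        : max_tokens_(max_tokens), fill_rate_(fill_rate_per_sec),
          tokens_(max_tokens), last_fill_(Clock::now()) {}

    // Consume up to `tokens`; with allow_partial, take whatever is available.
    // Returns the number of tokens actually consumed.
    uint64_t consume(uint64_t tokens, bool allow_partial) {
      refill();
      if (tokens_ < tokens) {
        if (!allow_partial) {
          return 0;
        }
        tokens = static_cast<uint64_t>(tokens_);
      }
      tokens_ -= tokens;
      return tokens;
    }

    // Time until a whole token is available, rounded *up* so the caller
    // never wakes before the token has actually accrued.
    std::chrono::milliseconds nextTokenAvailable() {
      refill();
      if (tokens_ >= 1) {
        return std::chrono::milliseconds(0);
      }
      const double missing_ms = (1.0 - tokens_) * 1000.0 / fill_rate_;
      return std::chrono::milliseconds(
          static_cast<uint64_t>(std::ceil(missing_ms)));
    }

  private:
    using Clock = std::chrono::steady_clock;

    void refill() {
      const auto now = Clock::now();
      const double elapsed_s =
          std::chrono::duration<double>(now - last_fill_).count();
      tokens_ = std::min(max_tokens_, tokens_ + elapsed_s * fill_rate_);
      last_fill_ = now;
    }

    const double max_tokens_;
    const double fill_rate_; // tokens per second
    double tokens_;
    Clock::time_point last_fill_;
  };
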
mattklein123 added a commit that referenced this issue Mar 11, 2019
This PR adds new decode/encodeData() callbacks which allow
filters direct control over sending data to subsequent
filters, circumventing any HCM buffering. This is the simplest
and least invasive change I could come up with to support this
functionality (or others like it).

Fixes #6140
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
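
A rough sketch of how a throttling filter might use the new callbacks (this is a fragment against Envoy's filter API; class members and timer wiring are hypothetical, and the injectDecodedDataToFilterChain() name comes from the discussion further down this thread):

  // Hypothetical decoder-filter fragment: hold data ourselves instead of
  // letting the HCM buffer it, then trickle it out on a timer.
  Http::FilterDataStatus ThrottleFilter::decodeData(Buffer::Instance& data,
                                                    bool end_stream) {
    buffer_.move(data);
    saw_end_stream_ = end_stream;
    token_timer_->enableTimer(std::chrono::milliseconds(50)); // illustrative
    return Http::FilterDataStatus::StopIterationNoBuffer;
  }

  void ThrottleFilter::onTokenTimer() {
    // Release only as many bytes as the rate limit allows this interval.
    const uint64_t allowed = token_bucket_->consume(buffer_.length(), true);
    if (allowed > 0) {
      Buffer::OwnedImpl chunk;
      chunk.move(buffer_, allowed);
      const bool done = saw_end_stream_ && buffer_.length() == 0;
      decoder_callbacks_->injectDecodedDataToFilterChain(chunk, done);
      if (done) {
        return;
      }
    }
    token_timer_->enableTimer(token_bucket_->nextTokenAvailable());
  }
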
htuch pushed a commit that referenced this issue Mar 11, 2019
1) Add partial consumption
2) Fix ceiling math for next wakeup time
3) Minor cleanups

Needed for #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 12, 2019
This PR adds new decode/encodeData() callbacks which allow
filters direct control over sending data to subsequent
filters, circumventing any HCM buffering. This is the simplest
and least invasive change I could come up with to support this
functionality (or others like it).

Fixes #6140
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 12, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 15, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 15, 2019
Part of envoyproxy/envoy#5942

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 628d1668d7dc9244e3a8fa3d3fbabca23e92e23d
mattklein123 added a commit that referenced this issue Mar 19, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 mattklein123 added no stalebot Disables stalebot from closing an issue and removed design proposal Needs design doc/proposal before implementation labels Mar 20, 2019
@mattklein123 mattklein123 changed the title from "RFC: Slow network / local rate limit HTTP filter" to "Slow network / local rate limit HTTP filter" Mar 20, 2019
@mattklein123 mattklein123 modified the milestones: 1.10.0, 1.11.0 Mar 20, 2019
mattklein123 added a commit that referenced this issue Mar 26, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 26, 2019
Part of envoyproxy/envoy#5942

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 805683f835bd63e4b7b9d89059aa0d3783924a93
@yskopets (Member)

@mattklein123 Hi Matt!

I've noticed that in order to support rate limiting at the HTTP level you added new methods to StreamDecoderFilterCallbacks/StreamEncoderFilterCallbacks, namely:

  • StreamDecoderFilterCallbacks::injectDecodedDataToFilterChain(data, end_stream)
  • StreamEncoderFilterCallbacks::injectEncodedDataToFilterChain(data, end_stream).

Do I understand correctly that adding rate limiting at L4 would require similar changes to ReadFilterCallbacks/WriteFilterCallbacks? Let's say:

  • ReadFilterCallbacks::continueReading(data, end_stream)
  • WriteFilterCallbacks::continueWriting(data, end_stream)

Do you have any plans to extend ReadFilterCallbacks/WriteFilterCallbacks this way?

Is it a good place to contribute? Or do you see any blockers for that?

@mattklein123 (Member, Author)

@yskopets I think an L4 rate limiting filter would be much simpler to build, since the network filter manager is much simpler than the HTTP one. I'm pretty sure it could be built with the existing APIs.
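
For instance (a hedged sketch, not an actual Envoy filter; only the onData()/continueReading() API surface is existing, everything else is illustrative), an L4 throttle could stop iteration and resume from a timer:

  // Hypothetical read-filter fragment: pause the filter chain, then resume
  // once the injected delay elapses. continueReading() replays the
  // connection's buffered read data through the chain.
  Network::FilterStatus ThrottleReadFilter::onData(Buffer::Instance&, bool) {
    if (!delay_armed_) {
      delay_armed_ = true;
      delay_timer_->enableTimer(std::chrono::milliseconds(100)); // illustrative
      return Network::FilterStatus::StopIteration;
    }
    return Network::FilterStatus::Continue;
  }

  void ThrottleReadFilter::onDelayTimer() {
    delay_armed_ = false;
    read_callbacks_->continueReading();
  }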

@mattklein123 (Member, Author)

Going to close this out. We can track enhancements as new feature requests.
