Slow network / local rate limit HTTP filter #5942
No opinion on which filter this goes in, but one thought: this seems like an L7 filter doing L4 work. Are we planning on doing any enforcement if one stream on a connection asks for 3G semantics and another requests LTE? Or, since this is mainly for test purposes, do we assume it's on the operator to have proper client enforcement and just have the last update take precedence?
My initial feeling is that it's up to the operator. Really, the only reason we propose doing this at the HTTP layer is so that the client can drive the test profile via headers. There isn't any good way to do this at L4 unless we build a proprietary prefix protocol (send some proto before any other data that is then stripped), which I suspect would be substantially more effort for most client implementations.
1 is useful, though I believe Roman set up Lyft's fault filters in the downstream path using the ingress listener.
On 2/3: one of the reasons I had initially dumbed down the HTTP fault filter to discrete delays, instead of a throughput-constrained model, is that at the application layer (Python requests, a Java lib, etc.) the two are indistinguishable. HTTP is a message-oriented protocol. A message that arrives late is the same as a message that slowly trickles in, because the app only processes things at message boundaries.
It would make sense to build a controller in the fault filter that dynamically adapted the delays injected per outbound request, to simulate things like long fat pipes, etc. (assuming the app is sending multiple requests per upstream, instead of a ping-pong model). Essentially, a stateful fault injection filter that kept track of a pseudo-session and injected variable (or distribution-driven) delay into requests in that session; a sketch of that idea follows below.
If you are talking about the TCP proxy, then 2/3 would be great.
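A minimal standalone sketch of the distribution-driven idea above (the class name, the log-normal distribution, and all parameters are illustrative assumptions, not something settled in this thread):

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <random>
#include <string>
#include <unordered_map>

// Illustrative: a per-pseudo-session delay source that draws each request's
// injected delay from a log-normal distribution, so delays vary request to
// request but follow the session's configured profile.
class SessionDelayInjector {
public:
  SessionDelayInjector(double median_ms, double sigma)
      : dist_(std::log(median_ms), sigma) {}

  // Delay to inject before forwarding the next request in this session.
  std::chrono::milliseconds nextDelay() {
    return std::chrono::milliseconds(static_cast<int64_t>(dist_(rng_)));
  }

private:
  std::mt19937 rng_{std::random_device{}()};
  std::lognormal_distribution<double> dist_;
};

int main() {
  // One injector per pseudo-session, keyed by whatever identifies the session.
  std::unordered_map<std::string, SessionDelayInjector> sessions;
  sessions.emplace("session-a", SessionDelayInjector(/*median_ms=*/200.0, /*sigma=*/0.5));

  for (int i = 0; i < 5; i++) {
    std::cout << "request " << i << ": inject "
              << sessions.at("session-a").nextDelay().count() << " ms\n";
  }
  return 0;
}
```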
…On Wed, Feb 13, 2019 at 1:36 PM Matt Klein ***@***.***> wrote:
At Lyft we have a desire to build a slow network simulation system. This would allow debug versions of our applications to request various slow network profiles, without requiring attachment to a special access point, emulator, etc.
Our plan is to do this via a new HTTP filter that has roughly the following capabilities:
1. Allow latency injection in both the downstream and upstream direction.
2. Allow throughput limiting in both the downstream and upstream direction.
3. Allow variable throughput/latency stability via randomness (to simulate spotty networks).
We would like to allow this via fixed configuration / xDS, but also, optionally, via request headers along the lines of:
x-envoy-throttle-upstream-throughput: int (kilobytes)
x-envoy-throttle-downstream-throughput: int (kilobytes)
x-envoy-throttle-network-stability: int (percentage between 0 and 100)
x-envoy-throttle-latency: int (milliseconds)
x-envoy-throttle-throughput-stability: int (percentage between 0 and 100)
Potentially we would also offer pre-canned profiles, configurable via both static config and request headers such as:
x-envoy-throttle-profile: {“no-connection”, “EDGE”, “3G”, “LTE”}
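To make the header-driven configuration concrete, here is a minimal standalone sketch of parsing these headers into a settings struct (the struct and helper are illustrative, not part of the proposal; a real filter would read Envoy's header map and validate inputs):

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Illustrative settings struct mirroring the proposed headers.
struct ThrottleSettings {
  std::optional<uint64_t> upstream_kbps;          // x-envoy-throttle-upstream-throughput
  std::optional<uint64_t> downstream_kbps;        // x-envoy-throttle-downstream-throughput
  std::optional<uint64_t> latency_ms;             // x-envoy-throttle-latency
  std::optional<uint64_t> network_stability_pct;  // 0-100
};

// Parse the throttle headers out of a generic header map. In a real filter
// this would read Envoy's RequestHeaderMap; a std::map stands in here.
ThrottleSettings parseThrottleHeaders(const std::map<std::string, std::string>& headers) {
  ThrottleSettings settings;
  auto get = [&](const std::string& name) -> std::optional<uint64_t> {
    auto it = headers.find(name);
    if (it == headers.end()) {
      return std::nullopt;
    }
    return std::stoull(it->second);  // production code would validate, not throw
  };
  settings.upstream_kbps = get("x-envoy-throttle-upstream-throughput");
  settings.downstream_kbps = get("x-envoy-throttle-downstream-throughput");
  settings.latency_ms = get("x-envoy-throttle-latency");
  settings.network_stability_pct = get("x-envoy-throttle-network-stability");
  return settings;
}
```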
In general, this code will not be difficult to write. There are some additional safety knobs we will need around max concurrent throttled requests, runtime disable, etc.
One thing to consider is whether this should be a new filter, or if we should build this functionality into the existing HTTP fault filter. My inclination is to build a new filter, but there is enough overlap that it might make sense to build this into the existing HTTP fault filter, so I'm curious to hear everyone's opinion.
@envoyproxy/maintainers <https://github.com/orgs/envoyproxy/teams/maintainers> @rshriram <https://github.com/rshriram> @lita <https://github.com/lita> @Reflejo <https://github.com/Reflejo> @goaway <https://github.com/goaway>
@rshriram we are definitely going to build all this functionality (I agree that in many cases, for a message-oriented exchange, latency is all that matters, but for a progressive download, even over HTTP, that might not always be the case). The question is where to build it. I think you are saying you would like this as part of the fault filter? If so, that's fine with me.
It seems like …
ACK, I agree we should split this (and all other settings) into independent upstream/downstream options.
Note that I think we also want response headers that are set when a throttling profile is in place. Thus, if the throttling is user-driven, the client can adjust its UI, etc. accordingly. I will think more about this and update the spec.
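For example, a hedged fragment of what that could look like in an encoder filter pass (not standalone; the header name and the members are hypothetical, since the spec was never finalized here):

```cpp
// Fragment only: echo the applied throttle profile on the response so a
// user-driven client can adapt its UI. "x-envoy-throttle-active-profile"
// is a hypothetical header name, not one settled in this issue.
Http::FilterHeadersStatus ThrottleFilter::encodeHeaders(Http::ResponseHeaderMap& headers, bool) {
  if (active_profile_.has_value()) {
    headers.addCopy(Http::LowerCaseString("x-envoy-throttle-active-profile"), *active_profile_);
  }
  return Http::FilterHeadersStatus::Continue;
}
```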
1) Add stat to track number of active injected faults 2) Add config/runtime control over how many concurrent faults can be injected. This is useful in cases where we want to allow 100% fault injection, but want to protect against too many concurrent requests using too many resources. 3) Add stat for faults that overflowed. 4) Misc code cleanup / modernization. Part of #5942. Signed-off-by: Matt Klein <mklein@lyft.com>
1) Add partial consumption 2) Fix ceiling math for next wakeup time 3) Minor cleanups Needed for #5942 Signed-off-by: Matt Klein <mklein@lyft.com>
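A standalone sketch of what "ceiling math for next wakeup time" plausibly refers to (names are illustrative): with plain integer division, rounding down would schedule the wakeup one fill interval too early.

```cpp
#include <cstdint>
#include <iostream>

// Given a deficit of tokens and a fill rate, compute how many fill
// intervals to sleep before enough tokens will be available. Floor
// division would wake up one interval early, before the bucket has
// refilled enough; ceiling division avoids the spurious wakeup.
uint64_t intervalsUntilTokens(uint64_t tokens_needed, uint64_t tokens_per_interval) {
  return (tokens_needed + tokens_per_interval - 1) / tokens_per_interval;
}

int main() {
  // Need 10 more tokens, bucket refills 4 tokens per interval:
  // floor(10 / 4) = 2 intervals -> only 8 tokens (too early).
  // ceil(10 / 4)  = 3 intervals -> 12 tokens (enough).
  std::cout << intervalsUntilTokens(10, 4) << " intervals\n";  // prints 3
  return 0;
}
```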
This PR adds new decode/encodeData() callbacks which allow filters direct control over sending data to subsequent filters, circumventing any HCM buffering. This is the simplest and least invasive change I could come up with to support this functionality (or others like it). Fixes #6140 Part of #5942 Signed-off-by: Matt Klein <mklein@lyft.com>
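A hedged fragment (not standalone; it assumes Envoy's filter API plus timer and buffer members on the filter) showing the shape of what the new injection callbacks enable: consume data in the filter, then trickle it to the rest of the chain on a timer, without the connection manager buffering it.

```cpp
// Fragment, not a complete filter: trickle data downstream on a timer
// using the injection callback this PR introduces.
Http::FilterDataStatus ThrottleFilter::decodeData(Buffer::Instance& data, bool end_stream) {
  buffered_.move(data);  // take ownership of the data ourselves
  saw_end_stream_ = end_stream;
  token_timer_->enableTimer(std::chrono::milliseconds(50));  // assumed timer member
  // Tell the filter manager we consumed the data and it should not buffer.
  return Http::FilterDataStatus::StopIterationNoBuffer;
}

void ThrottleFilter::onTokenTimer() {
  Buffer::OwnedImpl chunk;
  chunk.move(buffered_, std::min<uint64_t>(buffered_.length(), bytes_per_tick_));
  const bool done = buffered_.length() == 0 && saw_end_stream_;
  // New callback from this PR: send data to the rest of the filter chain
  // directly, bypassing any connection-manager buffering.
  decoder_callbacks_->injectDecodedDataToFilterChain(chunk, done);
  if (!done) {
    token_timer_->enableTimer(std::chrono::milliseconds(50));
  }
}
```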
Part of #5942 Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 Hi Matt! I've noticed that in order to support rate limiting at the HTTP level you added new methods to …
Do I understand correctly that adding rate limiting at L4 would require similar changes to …?
Do you have any plans to extend …? Is it a good place to contribute? Or do you see any blockers for that?
@yskopets I think doing an L4 rate limiting filter would be a lot simpler since the filter manager is a lot simpler. I'm pretty sure it could be built with the existing APIs.
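A minimal sketch of that shape (class and member names are illustrative; it leans on Envoy's Network::ReadFilter interface and Connection::readDisable(), plus an illustrative TokenBucket type, not Envoy's):

```cpp
// Sketch of an L4 read filter that rate limits by toggling socket reads;
// assumes Envoy's Network::ReadFilter interface, names are illustrative.
class BandwidthLimitFilter : public Network::ReadFilter {
public:
  Network::FilterStatus onData(Buffer::Instance& data, bool) override {
    if (!bucket_.consume(data.length())) {
      // Out of tokens: stop reading from the socket until the bucket refills.
      read_callbacks_->connection().readDisable(true);
      refill_timer_->enableTimer(fill_interval_);
    }
    return Network::FilterStatus::Continue;
  }

  Network::FilterStatus onNewConnection() override {
    return Network::FilterStatus::Continue;
  }

  void initializeReadFilterCallbacks(Network::ReadFilterCallbacks& callbacks) override {
    read_callbacks_ = &callbacks;
  }

private:
  void onRefill() {
    bucket_.refill();
    read_callbacks_->connection().readDisable(false);  // resume reads
  }

  Network::ReadFilterCallbacks* read_callbacks_{};
  Event::TimerPtr refill_timer_;  // created from the connection's dispatcher
  std::chrono::milliseconds fill_interval_{50};
  TokenBucket bucket_;            // illustrative token bucket type
};
```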
Going to close this out. We can track enhancements as new feature requests.