Slow network / local rate limit HTTP filter #5942

Closed · mattklein123 opened this issue Feb 13, 2019 · 10 comments
Labels: enhancement (Feature requests. Not bugs or questions.), no stalebot (Disables stalebot from closing an issue)

@mattklein123 (Member) commented Feb 13, 2019

At Lyft we have a desire to build a slow network simulation system. This would allow debug versions of our applications to request various slow network profiles, without requiring attachment to a special access point, emulator, etc.

Our plan is to do this via a new HTTP filter that has roughly the following capabilities:

  1. Allow latency injection in both the downstream and upstream direction.
  2. Allow throughput limiting in both the downstream and upstream direction.
  3. Allow variable throughput/latency stability via randomness (to simulate spotty networks).

We would like to allow this via fixed configuration / xDS, but also, optionally, via request headers along the lines of:

x-envoy-throttle-response-throughput: int (kilobytes)
x-envoy-throttle-request-throughput: int (kilobytes)
x-envoy-throttle-response-throughput-stability: int (percentage between 0 and 100)
x-envoy-throttle-request-throughput-stability: int (percentage between 0 and 100)
x-envoy-throttle-response-latency: int (milliseconds)
x-envoy-throttle-request-latency: int (milliseconds)
x-envoy-throttle-network-stability: int (percentage between 0 and 100)
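
For example (values purely illustrative, using the proposed header names above), a debug build simulating a spotty 3G-like link might send:

x-envoy-throttle-response-throughput: 96
x-envoy-throttle-response-latency: 300
x-envoy-throttle-network-stability: 75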

Potentially we would also offer pre-canned profiles, configurable via both static config and request headers such as:

x-envoy-throttle-profile: {“no-connection”, “EDGE”, “3G”, “LTE”}

In general, this code will not be difficult to write. There are some additional safety knobs we will need around max concurrent throttled requests, runtime disable, etc.
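
A minimal sketch of the concurrency safety knob (all names hypothetical; per the commits referenced below, the real implementation pairs the limit with a stat tracking active faults and an overflow stat):

  // Hypothetical sketch: gate new throttled requests on a shared counter so
  // that 100% fault injection cannot exhaust resources.
  #include <atomic>
  #include <cstdint>

  class ActiveFaultGuard {
  public:
    explicit ActiveFaultGuard(uint64_t max_active) : max_active_(max_active) {}

    // Returns true if a new fault may start; the caller must call release()
    // when the throttled request completes.
    bool tryAcquire() {
      uint64_t current = active_.load();
      while (current < max_active_) {
        // compare_exchange_weak reloads `current` on failure, so the limit
        // is re-checked on every retry.
        if (active_.compare_exchange_weak(current, current + 1)) {
          return true;
        }
      }
      return false; // Overflow: bump an overflow stat and skip injection.
    }

    void release() { active_--; }

  private:
    const uint64_t max_active_;
    std::atomic<uint64_t> active_{0};
  };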

One thing to consider is whether this should be a new filter, or if we should build this functionality into the existing HTTP fault filter. My inclination is to build a new filter, but there is enough overlap that it might make sense to build this into the existing HTTP fault filter, so I'm curious to hear everyone's opinion.

@envoyproxy/maintainers @rshriram @lita @Reflejo @goaway

@mattklein123 mattklein123 added the design proposal Needs design doc/proposal before implementation label Feb 13, 2019
@mattklein123 mattklein123 added this to the 1.10.0 milestone Feb 13, 2019
@mattklein123 mattklein123 self-assigned this Feb 13, 2019
@alyssawilk (Contributor)

No opinion on which filter this goes in, but one thought: this seems like an L7 filter doing L4 work. Are we planning on doing any enforcement if one stream on a connection asks for 3G semantics and another requests LTE? Or do we just assume that, since this is mainly for test purposes, it's on the operator to have proper client enforcement, and let the last update take precedence?

@mattklein123 (Member, Author)

> Are we planning on doing any enforcement if one stream on a connection asks for 3G semantics and another requests LTE? Or do we just assume that, since this is mainly for test purposes, it's on the operator to have proper client enforcement, and let the last update take precedence?

My feeling initially is that it's up to the operator. Really the only reason that we propose doing this at the HTTP layer is so that the client can drive the test profile via headers. There isn't any good way to do this at L4 unless we build a proprietary prefix protocol (send some proto before any other data that is then stripped) which I suspect would be substantially more effort for most client implementations.

@rshriram (Member) commented Feb 14, 2019 via email

@mattklein123 (Member, Author)

@rshriram we are definitely going to build all this functionality (I agree that in many cases, for a message-oriented exchange, latency is all that matters, but for a progressive download, even over HTTP, it might not always be the case). The question is where to build it. I think you are saying you would like this as part of the fault filter? If so that's fine with me.

@ryancox (Contributor) commented Feb 21, 2019

It seems like x-envoy-throttle-latency: int (milliseconds) should be controlled independently for upstream/downstream. This is more consistent with the other related parameters and allows for a little more flexibility.

@mattklein123 (Member, Author)

> It seems like x-envoy-throttle-latency: int (milliseconds) should be controlled independently for upstream/downstream. This is more consistent with the other related parameters and allows for a little more flexibility.

ACK, I agree we should split this (and all other settings) into independent upstream/downstream options.

@mattklein123 (Member, Author)

Note that I think we also want response headers that are set when a throttling profile is in place. Thus, if the throttling is user-driven, the client can adjust UI, etc. accordingly. I will think more about this and update the spec.
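
For example (header name hypothetical, mirroring the request-header scheme above), Envoy could echo the active profile back to the client:

x-envoy-throttle-applied-profile: 3G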

mattklein123 added a commit that referenced this issue Mar 5, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of #5942.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 8, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of #5942.

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 8, 2019
1) Add stat to track number of active injected faults
2) Add config/runtime control over how many concurrent
   faults can be injected. This is useful in cases where
   we want to allow 100% fault injection, but want to
   protect against too many concurrent requests using too
   many resources.
3) Add stat for faults that overflowed.
4) Misc code cleanup / modernization.

Part of envoyproxy/envoy#5942.

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 191c8b02b4908f212f800ed0185f6ee689ba8126
@mattklein123 mattklein123 added the enhancement Feature requests. Not bugs or questions. label Mar 10, 2019
mattklein123 added a commit that referenced this issue Mar 10, 2019
1) Add partial consumption
2) Fix ceiling math for next wakeup time
3) Minor cleanups

Needed for #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
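
The ceiling fix matters because truncating integer math can schedule the next wakeup one tick early, before a whole token has actually accrued. A hedged sketch of both ideas (names and representation hypothetical, not Envoy's exact TokenBucketImpl):

  // Hypothetical token bucket illustrating partial consumption and ceiling
  // math for the next wakeup time.
  #include <algorithm>
  #include <chrono>
  #include <cmath>
  #include <cstdint>

  class TokenBucket {
  public:
    TokenBucket(double max_tokens, double fill_rate_per_sec)
        : max_tokens_(max_tokens), fill_rate_(fill_rate_per_sec),
          tokens_(max_tokens), last_fill_(Clock::now()) {}

    // Consume up to `tokens`; with allow_partial, take whatever is available.
    // Returns the number of tokens actually consumed.
    uint64_t consume(uint64_t tokens, bool allow_partial) {
      refill();
      if (tokens_ < tokens) {
        if (!allow_partial) {
          return 0;
        }
        tokens = static_cast<uint64_t>(tokens_);
      }
      tokens_ -= tokens;
      return tokens;
    }

    // Time until a whole token is available, rounded *up* so the caller
    // never wakes before the token has actually accrued.
    std::chrono::milliseconds nextTokenAvailable() {
      refill();
      if (tokens_ >= 1) {
        return std::chrono::milliseconds(0);
      }
      const double missing_ms = (1.0 - tokens_) * 1000.0 / fill_rate_;
      return std::chrono::milliseconds(
          static_cast<uint64_t>(std::ceil(missing_ms)));
    }

  private:
    using Clock = std::chrono::steady_clock;

    void refill() {
      const auto now = Clock::now();
      const double elapsed_s =
          std::chrono::duration<double>(now - last_fill_).count();
      tokens_ = std::min(max_tokens_, tokens_ + elapsed_s * fill_rate_);
      last_fill_ = now;
    }

    const double max_tokens_;
    const double fill_rate_; // tokens per second
    double tokens_;
    Clock::time_point last_fill_;
  };
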
mattklein123 added a commit that referenced this issue Mar 11, 2019
This PR adds new decode/encodeData() callbacks which allow
filters direct control over sending data to subsequent
filters, circumventing any HCM buffering. This is the simplest
and least invasive change I could come up with to support this
functionality (or others like it).

Fixes #6140
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
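
A rough sketch of how a throttling filter might use the new callbacks (this is a fragment against Envoy's filter API; class members and timer wiring are hypothetical, and the injectDecodedDataToFilterChain() name comes from the discussion further down this thread):

  // Hypothetical decoder-filter fragment: hold data ourselves instead of
  // letting the HCM buffer it, then trickle it out on a timer.
  Http::FilterDataStatus ThrottleFilter::decodeData(Buffer::Instance& data,
                                                    bool end_stream) {
    buffer_.move(data);
    saw_end_stream_ = end_stream;
    token_timer_->enableTimer(std::chrono::milliseconds(50)); // illustrative
    return Http::FilterDataStatus::StopIterationNoBuffer;
  }

  void ThrottleFilter::onTokenTimer() {
    // Release only as many bytes as the rate limit allows this interval.
    const uint64_t allowed = token_bucket_->consume(buffer_.length(), true);
    if (allowed > 0) {
      Buffer::OwnedImpl chunk;
      chunk.move(buffer_, allowed);
      const bool done = saw_end_stream_ && buffer_.length() == 0;
      decoder_callbacks_->injectDecodedDataToFilterChain(chunk, done);
      if (done) {
        return;
      }
    }
    token_timer_->enableTimer(token_bucket_->nextTokenAvailable());
  }
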
htuch pushed a commit that referenced this issue Mar 11, 2019
1) Add partial consumption
2) Fix ceiling math for next wakeup time
3) Minor cleanups

Needed for #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 12, 2019
This PR adds new decode/encodeData() callbacks which allow
filters direct control over sending data to subsequent
filters, circumventing any HCM buffering. This is the simplest
and least invasive change I could come up with to support this
functionality (or others like it).

Fixes #6140
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 12, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 added a commit that referenced this issue Mar 15, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 15, 2019
Part of envoyproxy/envoy#5942

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 628d1668d7dc9244e3a8fa3d3fbabca23e92e23d
mattklein123 added a commit that referenced this issue Mar 19, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
@mattklein123 mattklein123 added no stalebot Disables stalebot from closing an issue and removed design proposal Needs design doc/proposal before implementation labels Mar 20, 2019
@mattklein123 mattklein123 changed the title from "RFC: Slow network / local rate limit HTTP filter" to "Slow network / local rate limit HTTP filter" Mar 20, 2019
@mattklein123 mattklein123 modified the milestones: 1.10.0, 1.11.0 Mar 20, 2019
mattklein123 added a commit that referenced this issue Mar 26, 2019
Part of #5942

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 26, 2019
Part of envoyproxy/envoy#5942

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 805683f835bd63e4b7b9d89059aa0d3783924a93
@yskopets (Member)

@mattklein123 Hi Matt!

I've noticed that in order to support rate limiting at the HTTP level you added new methods to StreamDecoderFilterCallbacks/StreamEncoderFilterCallbacks, namely:

  • StreamDecoderFilterCallbacks::injectDecodedDataToFilterChain(data, end_stream)
  • StreamEncoderFilterCallbacks::injectEncodedDataToFilterChain(data, end_stream).

Do I understand correctly that adding rate limiting at L4 would require similar changes to ReadFilterCallbacks/WriteFilterCallbacks? Let's say:

  • ReadFilterCallbacks::continueReading(data, end_stream)
  • WriteFilterCallbacks::continueWriting(data, end_stream)

Do you have any plans to extend ReadFilterCallbacks/WriteFilterCallbacks this way?

Is it a good place to contribute? Or do you see any blockers for that?

@mattklein123 (Member, Author)

@yskopets I think an L4 rate limiting filter would be much simpler to build, since the network filter manager is much simpler than the HTTP one. I'm pretty sure it could be built with the existing APIs.
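
For instance (a hedged sketch, not an actual Envoy filter; only the onData()/continueReading() API surface is existing, everything else is illustrative), an L4 throttle could stop iteration and resume from a timer:

  // Hypothetical read-filter fragment: pause the filter chain, then resume
  // once the injected delay elapses. continueReading() replays the
  // connection's buffered read data through the chain.
  Network::FilterStatus ThrottleReadFilter::onData(Buffer::Instance&, bool) {
    if (!delay_armed_) {
      delay_armed_ = true;
      delay_timer_->enableTimer(std::chrono::milliseconds(100)); // illustrative
      return Network::FilterStatus::StopIteration;
    }
    return Network::FilterStatus::Continue;
  }

  void ThrottleReadFilter::onDelayTimer() {
    delay_armed_ = false;
    read_callbacks_->continueReading();
  }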

@mattklein123 (Member, Author)

Going to close this out. We can track enhancements as new feature requests.
