Admission Control Filter #10985
Conversation
Signed-off-by: Tony Allen <tony@allen.gg>
Signed-off-by: Tony Allen <tallen@lyft.com>
Signed-off-by: Lizan Zhou <lizan@tetrate.io>
This reverts commit 1976891. Signed-off-by: Tony Allen <tony@allen.gg>
/wait-any
Windows CI passed. Removing wait.
@snowp it came up in conversation that you might be interested in taking a look at this. No expectation, just FYI.
Sorry for the delay. Flushing out some comments. I think this can be split into multiple PRs: can you do the controller and unit tests, the evaluators, etc. separately? It would be good to flesh out the TLS/global controller question first though. Thank you, this is awesome!
/wait
// indicate a failed request must be explicitly specified if not relying on the default
// values.
message DefaultEvaluationCriteria {
  // If HTTP statuses are unspecified, defaults to 5xx.
Can you comment on valid ranges here and make sure we have code checks for them?
  // If HTTP statuses are unspecified, defaults to 5xx.
  repeated type.v3.Int32Range http_status = 1;

  // GRPC status codes to consider as request successes. If unspecified, defaults to "OK".
Is there any link that we can have for the code to uint mapping?
// The time window over which the success rate is calculated. The window is rounded to the nearest
// second. Defaults to 120s.
Can you clarify that this is a sliding window?
@@ -317,8 +317,8 @@ void ConnectionImpl::StreamImpl::saveHeader(HeaderString&& name, HeaderString&&
 void ConnectionImpl::StreamImpl::submitTrailers(const HeaderMap& trailers) {
   std::vector<nghttp2_nv> final_headers;
   buildHeaders(final_headers, trailers);
-  int rc =
-      nghttp2_submit_trailer(parent_.session_, stream_id_, &final_headers[0], final_headers.size());
+  int rc = nghttp2_submit_trailer(parent_.session_, stream_id_, final_headers.data(),
Merge master? Unrelated?
This was failing in CI because of some scenario where final_headers had a size of 0, so it was causing undefined behavior. Calling vector::data() gives us the same information, but circumvents the undefined behavior in the empty-vector case.
I have no idea how my change could have caused a regression here, since I don't really touch anything unrelated to the new filter, so I just assumed it would be fine to do this since it's a better way of getting that pointer.
Merge master. @antoniovicente already fixed this.
virtual ~AdmissionControlFilterConfig() = default;

virtual ThreadLocalController& getController() const {
  return tls_->getTyped<ThreadLocalControllerImpl>();
I don't remember if/when we discussed this, but what's the tradeoff for a global controller vs. per worker? Would it be better to go with global for better accuracy similar to what we do for adaptive concurrency?
Adaptive concurrency required global control because of the way it took measurements. We needed high accuracy for the percentile calculations; otherwise we'd be in danger of missing tail latencies. We also needed to globally limit the concurrency across all worker threads, since limiting concurrency on one thread would no longer mean decreased load on the local host; it would be almost the same as using adaptive concurrency on the egress. There are also complications around synchronization of the measurement windows, etc. Global just made more sense.
For this filter, the measurements aren't as finicky, and making it thread-local simply decreases the number of samples. This is fine, since there isn't really a control loop that other threads can interfere with (we're not making a change and then measuring its effect). The decision-making here can be decentralized and we still get roughly the same results.
The benefit of thread-local controllers is that we don't need to take any locks on the deques/structures that store the sampling history. The drawbacks are that it's a bit more complicated to implement and we need a copy of the sampling history for each thread. I decided the memory overhead is minimal, so it's worth making it thread-local. In the cases where the memory utilization might raise an eyebrow, a global controller would be at high risk of lock contention anyway.
That's fine. Add some comments.
I think one potential drawback with TLS is that incoming traffic isn't necessarily evenly distributed between the workers, so it could result in cases where, overall, we're rejecting some percentage of traffic, but individual clients connected to workers that haven't seen much traffic might pass through without any rejections.
You could imagine 3 clients: clients 1 and 2 are connected to worker A and client 3 to worker B. Client 1 starts sending traffic that's failing, triggering throttling. Clients 2 and 3 are both sending identical traffic, but only client 2 sees throttling, because it happened to be handled by the worker that is also handling client 1.
That said I think TLS is fine assuming we document the behavior.
// To give a new value to the Cleanup() variable, we must cancel it first to avoid triggering
// the function it wraps.
deferred_record_failure_->cancel();
How can this ever happen?
It's been quite a while since I wrote that piece, but I recall that it was happening in the unit tests. It couldn't happen in practice.
OK let's remove.
// Status code must be sent in trailers.
ASSERT(grpc_status.has_value());
No guarantee the upstream does this. Don't use ASSERT here and handle errors (and test).
My reason for this was that I read this part in the gRPC over HTTP2 protocol (emphasis mine):
Response-Headers & Trailers-Only are each delivered in a single HTTP2 HEADERS frame block. Most responses are expected to have both headers and trailers but Trailers-Only is permitted for calls that produce an immediate error. Status must be sent in Trailers even if the status code is OK.
I interpreted that to mean that we can always expect a status. This should definitely be handled differently; I'm just unsure of what to do in this case. Maybe return an INVALID_ARGUMENT status?
I would synthesize a failure value and just increment it (or something like that). It doesn't really matter.
deferred_record_failure_ =
    std::make_unique<Cleanup>([this]() { config_->getController().recordFailure(); });
Perf nit: this really shouldn't need a heap allocation. Since this is per-filter, I'm pretty sure a Cleanup can't already exist here, so can you just allocate the Cleanup inline as a member in the constructor?
    const envoy::extensions::filters::http::admission_control::v3alpha::AdmissionControl& config,
    const std::string& stats_prefix, Server::Configuration::FactoryContext& context) {

  std::string prefix = stats_prefix + "admission_control.";
nit: const
  response_evaluator = std::make_unique<DefaultResponseEvaluator>(config.default_eval_criteria());
  break;
case AdmissionControlProto::EvaluationCriteriaCase::EVALUATION_CRITERIA_NOT_SET:
default:
I don't think you need a default?
I don't, but this catches a scenario where a new evaluator is implemented and configured, but not handled here. Otherwise, we'd catch it by dereferencing a null pointer somewhere.
You will only allow the compiler to catch it if you remove default, which is what you want.
Signed-off-by: Tony Allen <tony@allen.gg>
/wait-any
Signed-off-by: Tony Allen <tony@allen.gg>
/wait-any
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Thanks, this looks pretty good to me. A few comments from looking over the implementation.
// We default to all 5xx codes as request failures.
http_success_fns_.emplace_back([](uint64_t status) { return status < 500; });
These two lines together got me really confused: the comment talks about what we consider request failures, but the next line defines the success condition instead. Maybe reword the comment?
    grpc_success_codes_.emplace_back(status);
  }
} else {
  grpc_success_codes_ = decltype(grpc_success_codes_)({
I would think you can elide the type here? AFAIK it would resolve to the initializer_list ctor.
The docs say you default to OK; this has much more than that.
private:
  std::vector<std::function<bool(uint64_t)>> http_success_fns_;
  std::vector<uint64_t> grpc_success_codes_;
You probably want this to be uint32_t; both enumToInt and the extracted status are 32-bit.
    public ThreadLocal::ThreadLocalObject {
public:
  ThreadLocalControllerImpl(TimeSource& time_source, std::chrono::seconds sampling_window);
  ~ThreadLocalControllerImpl() override = default;
not necessary
void ThreadLocalControllerImpl::maybeUpdateHistoricalData() {
  // Purge stale samples.
  while (!historical_data_.empty() && ageOfOldestSample() >= sampling_window_) {
If you want to save some cycles here, I think you can compute the cutoff timestamp once and compare each sample with that, instead of computing the diff between sampling time and current time for each sample.
It might not matter; it depends on how many samples are ejected at a time.
// It's possible we purged stale samples from the history and are left with nothing, so it's
// necessary to add an empty entry. We will also need to roll over into a new entry in the
// historical data if we've exceeded the time specified by the granularity.
It's not obvious to me how this granularity comes into play; maybe add some detail?
@mattklein123 requested this PR be split, so I'm closing this one out and submitting another that incorporates the current comments and omits the thread-local controller.
This is an implementation of success-rate based admission control via an HTTP filter. Based on the success rate, the filter will probabilistically reject outbound requests. Motivation and some background on this filter's calculations can be found in #9658.
The filter tracks the request success rate on each worker thread and uses that information to probabilistically reject requests, with a probability dictated by the success rate over a rolling window, before forwarding to the upstream. This removes the need for any locks such as those found in the local ratelimit filter and the adaptive concurrency filter.
To avoid taking any locks, the success rate is calculated per thread. We track the total request count and the total number of successes for use in the following rejection probability:
These values are accumulated each second and inserted into a deque that contains the per-second accumulated values for the entire rolling window. This allows us to efficiently phase out stale success-rate data as it leaves the sliding window.
Progress on #9658 and a resurfacing of the work in #10230.