1. rq_success is inc()ed upon seeing "non-5xx" response headers.
2. rq_active_ is dec()ed at the stream's deferred deletion time.

If the load reporter latch happens between 1. and 2., the request is counted both as a success and as active, so active requests are double counted. This is more visible when stream destruction involves time-consuming operations, e.g. logging to external services.
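To make the window concrete, here is a minimal sketch of the sequence described above. It is a simplified model, not Envoy's code: `LatchedCounter`, `HostStats`, `Report`, `latchReport` and `reportBetweenSteps` are hypothetical names, and the only behavior assumed is that a latch returns the increments seen since the previous latch while rq_active is read as a plain current value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a latched counter: latch() returns the
// increments seen since the previous latch and resets that delta.
struct LatchedCounter {
  uint64_t pending = 0;
  void inc() { ++pending; }
  uint64_t latch() {
    uint64_t v = pending;
    pending = 0;
    return v;
  }
};

// Simplified per-host stats; rq_active is read as-is, not latched.
struct HostStats {
  LatchedCounter rq_total, rq_success, rq_error;
  uint64_t rq_active = 0;
};

struct Report {
  uint64_t success, error, active, issued;
};

// Models one load report: latch the counters, read the active value.
Report latchReport(HostStats& s) {
  return Report{s.rq_success.latch(), s.rq_error.latch(), s.rq_active,
                s.rq_total.latch()};
}

// One request whose report lands between step 1 and step 2.
Report reportBetweenSteps() {
  HostStats s;
  s.rq_total.inc();  // request issued
  ++s.rq_active;
  s.rq_success.inc();         // step 1: "non-5xx" headers seen
  Report r = latchReport(s);  // latch happens before deferred deletion
  assert(s.rq_active > 0);
  --s.rq_active;  // step 2: stream deferred deletion
  return r;
}
```

In this model the single request shows up once in `success` and once in `active`, so `success + error + active` is 2 while `issued` is 1.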
It's also noteworthy that the if statement checks rq_success+rq_error+rq_active; in most happy cases, this sum should match rq_total.
There is an error case that leads to load reporting failure, though: with gRPC, if the gRPC server fails to send trailers, the rq_success_.inc() or rq_error_.inc() in Router::Filter::onUpstreamTrailers() is never called:
rq_active_ will dec() as the stream destructs, but rq_[success/error]_ will not inc().
We should probably fix the rq_[success/error]_ inc issue in another PR, but in this issue I hope we can solve the problem on the load_stats_reporter side, specifically:
change the if statement to check rq_issued.
This way the LRS server gets a chance to infer the "load" entirely from rq_total.
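As a rough illustration of the proposal, the sketch below replays the missing-trailers case against both checks. It rests on assumptions: the current check is modeled as "the sum is non-zero", rq_issued is taken to be the latched value of rq_total, and `Latched`, `currentGuard`, `proposedGuard` and `grpcStreamWithoutTrailers` are hypothetical names rather than Envoy identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Values a single load report would see for one host (simplified model).
struct Latched {
  uint64_t success, error, active, issued;
};

// Assumed shape of the current check: report only if the sum is non-zero.
bool currentGuard(const Latched& l) {
  return l.success + l.error + l.active != 0;
}

// Proposed check: report whenever any request was issued.
bool proposedGuard(const Latched& l) { return l.issued != 0; }

// A gRPC request whose upstream never sends trailers.
Latched grpcStreamWithoutTrailers() {
  uint64_t total = 0, success = 0, error = 0, active = 0;
  ++total;  // request issued
  ++active;
  // No trailers arrive, so neither success nor error is incremented.
  assert(active > 0);
  --active;  // stream destructs
  return Latched{success, error, active, total};
}
```

Under these assumptions the request passes `proposedGuard` but not `currentGuard`, which matches the failure described above: the request was issued, yet none of the three summed counters reflects it.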
Title: use rq_total in load_stats_reporter, current math might be double counting
Description:
In the current load_stats_reporter implementation, there is a chance that we double count requests:
https://github.com/envoyproxy/envoy/blob/main/source/common/upstream/load_stats_reporter.cc#L79-L111