grpc success/failure stat should count more than 0 as success #1125
Comments
It depends on how you define "success". However, if you want to count FailedPrecondition as a success when calculating SR, you have to do that yourself. That isn't feasible as of today, because we only record either success or failure. I'm going to create a PR to add stats for each gRPC status code, so you can use them to calculate an SR that matches your business logic.
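As a rough illustration of what per-code stats would allow, here is a minimal sketch that computes SR from per-status-code counts. The counter values and the `success_rate` helper are hypothetical and are not Envoy's stat names or API; only the numeric gRPC status codes are standard.

```python
from collections import Counter

# Standard gRPC status code numbers.
OK = 0
FAILED_PRECONDITION = 9
UNAVAILABLE = 14


def success_rate(per_code_counts, success_codes):
    """Return the fraction of requests whose status is in success_codes.

    per_code_counts maps a gRPC status code to a request count.
    Returns None when there were no requests at all.
    """
    total = sum(per_code_counts.values())
    if total == 0:
        return None
    good = sum(n for code, n in per_code_counts.items() if code in success_codes)
    return good / total


# Hypothetical counts, as they might be read from per-code stats.
counts = Counter({OK: 90, FAILED_PRECONDITION: 8, UNAVAILABLE: 2})

# Only status 0 counts as success (the existing success/failure split).
strict = success_rate(counts, {OK})

# Business-specific definition that also treats FailedPrecondition as success.
lenient = success_rate(counts, {OK, FAILED_PRECONDITION})
```

Which codes belong in the success set is a per-service decision, which is why the set is passed in rather than fixed.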
@richunger yeah, it's a little unclear to me how FailedPrecondition is not an app error here, so it does seem to me like the stats are correct for gross success/fail. In any case, as @fengli79 says, we will soon have per-code stats like we do for HTTP, so you can do custom stuff as needed.
Yeah, per-code stats seem like the best path overall.
@richunger I agree with you that from the app's perspective, it's an error. From the server's perspective, it's not. Let's get the per-code stats in place, then we can figure out how to change our internal alarms to be better.
@richunger calling this fixed via #1125. Will get deployed next week. We can discuss alarm/graph changes internally.
Description: Hardcoding these isn't ideal, but it works for now. Down the road, we should map dynamically at runtime.
Risk Level: Low
Testing: Local with app filters
Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
https://github.com/lyft/envoy/blob/3e62653863125c07fc78fcc4bd967a10f026114b/source/common/grpc/common.cc#L34
We use this success stat to calculate our services' success rate (SR). In the REST world, I've generally seen success rate calculated as (2xx + 3xx + 4xx) / total_count. What Envoy does now for gRPC seems like the equivalent of having only 2xx in the numerator. For example, FailedPrecondition should not count as a failure when calculating SR, right?
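To make the comparison concrete, here is a small sketch of the two REST-style calculations described above. The response counts are invented for illustration, and the function names are hypothetical.

```python
def rest_success_rate(by_class):
    """SR with 2xx, 3xx and 4xx in the numerator; None if there was no traffic."""
    total = sum(by_class.values())
    if total == 0:
        return None
    good = sum(by_class.get(cls, 0) for cls in ("2xx", "3xx", "4xx"))
    return good / total


def only_2xx_success_rate(by_class):
    """SR with only 2xx in the numerator, analogous to counting only gRPC status 0."""
    total = sum(by_class.values())
    if total == 0:
        return None
    return by_class.get("2xx", 0) / total


# Invented response counts per HTTP status class.
responses = {"2xx": 900, "3xx": 20, "4xx": 60, "5xx": 20}

broad = rest_success_rate(responses)       # (900 + 20 + 60) / 1000
narrow = only_2xx_success_rate(responses)  # 900 / 1000
```

With these numbers the same traffic reads as 98% or 90% successful depending on the definition, which is the gap the issue is pointing at.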