transport: Pass Header metadata to tap handle. #6652
Conversation
May I ask your use case for the tap handle? The other gRPC languages don't have this construct (and seem to be doing just fine without it), and we recently noticed that it's excluded from metrics & tracing, which is not very desirable from an observability perspective. We may not want to introduce metrics callouts specifically for it either, given its sensitive location. So I was actually considering just calling it deprecated and not recommending anyone use it.
Hi. Our use case is to drop requests early, before spending too many resources on them. We use this when the server is already unable to accept more messages (e.g. too many in-flight requests). We have tried to use interceptors before, but that did not provide adequate protection. From what I understand, gRPC-go processes a request in the following steps (discussing the unary case):

1. The server transport reads the request headers.
2. The tap handle (if configured) is called.
3. A new goroutine is spawned to handle the stream.
4. The message payload is read into a bytes buffer.
5. The payload is unmarshaled into the request message.
6. Interceptors and the method handler are called.
As can be seen in this description, by the time interceptors run, quite a lot of resources have already been used: possibly a new goroutine spawned, and the request held in memory twice (once as a bytes buffer, once as the unmarshaled message). We want to avoid all of that when we know the server can't possibly handle any more requests at the moment. When we tried to use interceptors, we couldn't implement the protection effectively and could still crash the server under load, for example by sending enough big messages, in which case the server simply OOMs.
We've also noticed the exclusion from metrics, because our metrics get updated in the stats handler and in interceptors. However, that's easy to fix: we can increment the relevant metric when rejecting a request in the tap handle. As for tracing, in our setup the client would log the returned gRPC error to its trace span. (But we use gRPC only for internal communication.) I think that even with these limitations, the tap handle is a useful feature and gives us much-needed protection against overloading the server with too many messages.
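For readers unfamiliar with the feature under discussion, here is a minimal sketch of the kind of load-shedding tap handle described above, built on the existing grpc.InTapHandle option and tap.Info.FullMethodName. The limit value, the health-check exemption, and the metric placement are illustrative assumptions, not code from this PR.

```go
package main

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

const maxInflight = 1000 // illustrative capacity limit

// Incremented here; the decrement would live elsewhere, e.g. in a stats
// handler when the RPC ends, since the tap handle has no completion callback.
var inflight atomic.Int64

// loadSheddingTap rejects new RPCs before the payload is read or a handler
// goroutine is spawned, once the server is at capacity.
func loadSheddingTap(ctx context.Context, info *tap.Info) (context.Context, error) {
	// Never shed health checks (per-method decisions are exactly why the
	// tap handle needs more request details).
	if info.FullMethodName == "/grpc.health.v1.Health/Check" {
		return ctx, nil
	}
	if inflight.Load() >= maxInflight {
		// The "increment the relevant metric on rejection" mentioned above
		// would go here.
		return nil, status.Error(codes.ResourceExhausted, "server overloaded, try again later")
	}
	inflight.Add(1)
	return ctx, nil
}

func main() {
	s := grpc.NewServer(grpc.InTapHandle(loadSheddingTap))
	_ = s // register services and call Serve as usual
}
```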
Thanks for all the detailed info - it was really helpful! Do you happen to work with gRPC in other languages besides Go (on the server side)?

I'm not sure, but I believe the recommended way of dealing with these kinds of issues would be to use a LimitingListener and MaxConcurrentStreams instead. Have you tried these, by any chance?

It's also possible the other languages tend to provide interceptors at a lower level. Especially if you're looking at the message payload size in unary RPCs, I believe the other languages provide interceptors that are more like our streaming interceptors, but for all RPCs, that do not require the payload to be received and decoded before calling them, as our unary interceptors do. Unary interceptors can be nice for usability, but otherwise have some undesirable properties (e.g. they require implementing things twice if you have both unary and streaming RPCs that want the same interceptor behavior). To that end, we do have #1805 open to provide a streaming-style interceptor even for unary RPCs. It hasn't been prioritized, but might allow for sufficiently early termination of RPCs - it would cut out steps 4 and 5, saving 2 copies of the message payload. Could something like that potentially work for you in place of the tap handle feature?
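For concreteness, a minimal sketch of the two knobs mentioned here, assuming "LimitingListener" refers to golang.org/x/net/netutil.LimitListener; the numeric limits are placeholders.

```go
package main

import (
	"log"
	"net"

	"golang.org/x/net/netutil"
	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// Cap the number of simultaneously accepted connections.
	lis = netutil.LimitListener(lis, 100)

	// Cap the number of concurrent streams (RPCs) per connection.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(10))

	// Register services here, then serve.
	log.Fatal(srv.Serve(lis))
}
```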
Thank you for the reply!
Not at the moment.
We have experimented with MaxConcurrentStreams. One problem with this approach is how to find the correct values. For example, let's say we know that we can handle 1000 concurrent requests before OOMing. Our number of clients sending requests is dynamic, and we autoscale them with the load, so we would need to keep adjusting the per-connection limit as the number of clients changes. Furthermore, changing MaxConcurrentStreams requires recreating the server, since it is a server option set at construction time.

Another problem is that this approach doesn't allow us to distinguish between various methods. E.g. we never want to limit health checks. This is also the reason for this PR: we want more details to distinguish requests 😄
I think it would be a step in the right direction, although I'm worried that step 3 -- creation of a goroutine -- may still be a problem (if many clients send requests all at once and create so many goroutines that the server crashes with OOM). To give you a definitive "Yes", we would need to load-test it. The goroutines problem could perhaps be solved by implementing a hard limit on the number of workers, although once again it would probably be global to the gRPC server, and not per-method.
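As a side note on the goroutine concern: grpc-go does expose an experimental grpc.NumStreamWorkers server option that reuses a fixed pool of goroutines for incoming streams. As far as I understand it is not a strict cap (the server can still fall back to spawning a goroutine per stream when all workers are busy), so it would only mitigate, not prevent, the scenario described above. A minimal sketch:

```go
package main

import (
	"runtime"

	"google.golang.org/grpc"
)

func main() {
	// Experimental option: handle incoming streams on a fixed pool of worker
	// goroutines instead of one new goroutine per stream. The setting is
	// global to the server (not per-method), and the server may still spawn
	// extra goroutines when all workers are busy.
	srv := grpc.NewServer(grpc.NumStreamWorkers(uint32(runtime.NumCPU())))
	_ = srv // register services and call Serve as usual
}
```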
I think we can take this PR, but be aware that use of the tap handler feature is not recommended.
Also, can you please add a test for this? There is an existing TestTap in test/end2end_test.go that you can probably pretty easily extend (func testTap(t *testing.T, e env), line 2111 at fd9ef72).
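Below is a standalone sketch (not the end2end harness used by testTap) of the kind of assertion the extended test needs to make. It assumes the field added by this PR is named Header, per the PR title, and uses the bundled health-check proto only as a convenient request type; the health service is deliberately not registered.

```go
package tap_test

import (
	"context"
	"net"
	"testing"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

func TestTapSeesHeaderMetadata(t *testing.T) {
	gotHeader := make(chan metadata.MD, 1)
	tapHandle := func(ctx context.Context, info *tap.Info) (context.Context, error) {
		select {
		case gotHeader <- info.Header: // Header: the field added by this PR (assumed name)
		default:
		}
		return ctx, nil
	}

	lis, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		t.Fatalf("failed to listen: %v", err)
	}
	srv := grpc.NewServer(grpc.InTapHandle(tapHandle))
	defer srv.Stop()
	go srv.Serve(lis)

	conn, err := grpc.Dial(lis.Addr().String(), grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatalf("failed to dial: %v", err)
	}
	defer conn.Close()

	// The health service is not registered, so the RPC fails with
	// Unimplemented -- but the tap handle still runs before that.
	ctx := metadata.AppendToOutgoingContext(context.Background(), "test-key", "test-value")
	if _, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{}); status.Code(err) != codes.Unimplemented {
		t.Fatalf("unexpected RPC error: %v", err)
	}

	md := <-gotHeader
	if got := md.Get("test-key"); len(got) != 1 || got[0] != "test-value" {
		t.Fatalf("tap handle saw header %v, want test-key=test-value", md)
	}
}
```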
👍 Understood. The best I can hope for is that it won't be removed anytime soon (before a usable alternative exists). Thank you!
Thank you for the pointer. I was wondering how to add a test for this change and had missed the end2end test!
Force-pushed from 35fa329 to 9f727c4.
I've extended the existing test to cover this change. Please take a look again.
Remove unwanted changes.
+zasweq for a second look
LGTM.
Thank you!
Apologies, this was a fantasy that entered my mind during intensive debugging.
This PR extends the tap.Info struct with metadata, and modifies http2_server.go to pass the available metadata to the ServerInHandle function. This gives the tap handle more details that it can use to reject requests early.
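To illustrate what this enables, here is a hedged sketch of a tap handle using the newly passed metadata; the field name Header (per the PR title) and the "x-tenant" key are assumptions for illustration only.

```go
package main

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/grpc/tap"
)

// tenantAwareTap rejects requests early, before the payload is read, based on
// the request metadata now available in tap.Info (assumed field name: Header).
func tenantAwareTap(ctx context.Context, info *tap.Info) (context.Context, error) {
	if len(info.Header.Get("x-tenant")) == 0 {
		return nil, status.Error(codes.PermissionDenied, "missing x-tenant header")
	}
	// Per-tenant rate or concurrency checks could go here.
	return ctx, nil
}

func main() {
	srv := grpc.NewServer(grpc.InTapHandle(tenantAwareTap))
	_ = srv // register services and call Serve as usual
}
```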
RELEASE NOTES: