Timing issue with /v3alpha/lease/keepalive that is trivially reproducible with telnet #8237
Comments
Looking into this further, it seems that etcd/grpc-gateway may have a timing/buffering issue with streamed requests. This is very easy to reproduce with telnet. If I paste 10 keepalive messages at once (as chunked, newline-delimited JSON) as a request into the telnet session, I get the expected 10 replies:
Now retry the experiment, but instead of pasting the entire request at once, paste the individual chunks/messages into the telnet session one by one, for example one chunk per second. Now the reply only acknowledges 2 of the messages, even though 10 messages were sent.
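For anyone reproducing this by hand, the chunk framing can be sketched as below. This is a minimal illustration, not code from etcd or grpc-gateway: `chunk` and `last_chunk` are hypothetical helper names, and the payload is assumed to be plain ASCII JSON.

```shell
#!/bin/bash
# Frame one message as an HTTP/1.1 chunk: <hex length> CRLF <payload> CRLF.
# chunk and last_chunk are hypothetical helpers used only for illustration.
chunk() {
  local LC_ALL=C   # intended to make ${#msg} count bytes rather than characters
  local msg=$1
  printf '%x\r\n%s\r\n' "${#msg}" "$msg"
}

# A zero-length chunk followed by an empty line terminates the chunked body.
last_chunk() {
  printf '0\r\n\r\n'
}
```

Pasting the output of `chunk '{"ID":1}'` into the telnet session once per second, followed by `last_chunk` at the end, corresponds to the one-by-one experiment described above.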
Thanks, |
Can repro with this script:
#!/bin/bash
set -euo pipefail
# Send ten keepalive messages for lease ID $1 as separate HTTP chunks.
function keepalive {
  echo -e -n "POST /v3alpha/lease/keepalive HTTP/1.1\x0d\x0a"
  echo -e -n "Host: 127.0.0.1\x0d\x0a"
  echo -e -n "Transfer-Encoding: chunked\x0d\x0a"
  echo -e -n "\x0d\x0a"
  msg="{\"ID\":$1}"
  # Chunk size is the payload length in bytes, written in hex.
  len=$(printf "%x" $(echo -e -n "$msg" | wc -c))
  for a in $(seq 1 10); do
    echo -e -n "$len\x0d\x0a"
    echo -e -n "$msg\x0d\x0a"
    sleep 0.1s
  done
  # A zero-length chunk terminates the request body.
  echo -e -n "0\x0d\x0a"
  echo -e -n "\x0d\x0a"
}
resp=`curl -s -l http://localhost:2379/v3alpha/lease/grant -XPOST -d'{"TTL": 120}'`
id=`echo $resp | tr ',' '\n' | grep ID | cut -f2 -d':'`
echo "got id $id..."
keepalive $id | nc localhost 2379
The output is:
So a context is getting canceled somewhere in the server and prematurely closing the stream. |
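To compare the number of keepalives sent with the number acknowledged, the raw response can be piped through a small counter. This is only a sketch: `count_replies` is a hypothetical helper, and it assumes each reply arrives as a JSON object starting at the beginning of a line, so that status lines, headers, and hex chunk sizes are not counted.

```shell
#!/bin/bash
# Count reply messages in a raw HTTP response read from stdin.
# Assumption: every reply is a JSON object that starts at the beginning of a line.
count_replies() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps "set -e" scripts alive.
  grep -c '^{' || true
}
```

With the script above, `keepalive $id | nc localhost 2379 | count_replies` would be expected to print 10 if every message were acknowledged.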
@jamesyonan, @heyitsanthony I would be interested in taking a stab at resolving this unless you were planning on it. |
I think I know the cause. grpc-gateway doesn't support real bidirectional streams, as mentioned in its README. In the etcd process, you can see log lines like the ones below, which complain about closed bodies, when you run the script provided by @heyitsanthony above.
This error is raised at: https://github.com/coreos/etcd/blob/master/etcdserver/etcdserverpb/gw/rpc.pb.gw.go#L189 The problem is that this code tries to read the request body multiple times, even after the handler has started sending a response. However, the documentation of the http package states that this is not supported (https://golang.org/pkg/net/http/#ResponseWriter):
So if you The first Flush happens here: Once the first I removed all https://gist.github.com/yudai/fa4d66dd1ae66caf65e8c09eeb2be6b9 |
The error itself is defined here:
// ErrBodyReadAfterClose is returned when reading a Request or Response
// Body after the body has been closed. This typically happens when the body is
// read after an HTTP Handler calls WriteHeader or Write on its
// ResponseWriter.
var ErrBodyReadAfterClose = errors.New("http: invalid Read on closed Body")
And here is an issue about that:
We could use https://github.com/tmc/grpc-websocket-proxy mentioned in grpc-ecosystem/grpc-gateway#170 (comment) |
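One way to check whether a given run hit this error is to search the etcd output for the message string. The sketch below assumes the etcd log has been captured to a file whose path the caller supplies. `count_closed_body_errors` is a hypothetical helper, not an etcd tool.

```shell
#!/bin/bash
# Count log lines that contain the closed-body error message.
# Assumption: etcd's log output was redirected to the file given as $1.
count_closed_body_errors() {
  # -F treats the pattern as a fixed string; "|| true" tolerates a count of 0.
  grep -cF 'http: invalid Read on closed Body' "$1" || true
}
```

A non-zero count after running the repro script would be consistent with the body being read after the handler started writing the response.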
The websocket proxy looks like an OK fix since it's all opt-in. |
Created a PR to give it a try. https://github.com/lafikl/telsocket is an easy way to test websockets.
#!/bin/bash
set -euo pipefail
function keepalive {
  msg="{\"ID\":$1}"
  for a in `seq 1 10`; do
    echo -e -n "$msg\x0d\x0a"
    sleep 0.1s
  done
}
resp=`curl -s -l http://localhost:2379/v3alpha/lease/grant -XPOST -d'{"TTL": 120}'`
id=`echo $resp | tr ',' '\n' | grep ID | cut -f2 -d':'`
echo "got id $id..."
keepalive "$id" | telsocket -url ws://127.0.0.1:2379/v3alpha/lease/keepalive
It seems to work fine. |
I'm still not sure whether users actually want to use websockets, because they require non-traditional HTTP requests. So I'd love to hear opinions from @hexfusion as well. |
@yudai thank you for the details here, great stuff! FWIW I would like to try to support this directly via grpc-gateway if possible as well. When you removed the manual Flush calls, it fell back to the internal "flush after chunk" handling of "net/http", which, as you noted, works in this circumstance. But these manual Flush calls are required to allow, for example, Watch streams to work. So as an opt-in mechanism I was thinking of passing a custom header key to disable the manual Flush. See details in grpc-ecosystem/grpc-gateway#435. The upside would be no code changes to etcd if the idea were accepted. |
@hexfusion Hi, thanks for the comment. |
@yudai yes you are right it could be much more complicated or much more simple. I need to test this more completely but look what happens when you pass
I ran 1000 iterations without error |
@hexfusion Guessing if you change |
@yudai, @hexfusion, thanks for looking into this. @yudai, the websocket streaming support looks promising, I've moved over to using it instead of long-polling standard HTTP requests/replies -- seems like a cleaner approach that offers true bidirectional streaming. Hope to see this merged. |
@yudai yeah this is a good example where the grpc-gateway might not be a viable, complete solution. |
I'll update the PR when I have time, hopefully next week. |
@yudai I ran this for 10000 iterations without failure. Did your testing cause a failure? I am going to try a more robust test.
using etcd master |
Just FYI, the 80k loop test went fine; now going to randomize the sleep for 100k, then add this to the documentation with an e2e test. It's not pretty but could be worse, only requiring a single extra header in curl. This is not meant to take away from websocket streaming, which is a great idea on its own, but rather to offer a workaround for grpc-gateway that is already implemented. |
@hexfusion The script above did not fail in my environment either. |
@yudai -- hope that you can merge the websocket patch. Still working well for us. Thanks, |
@yudai what is the latest progress on this one? Hopefully we can get this issue resolved and promote the HTTP gateway proxy from alpha to beta. |
Sorry for the delay. I updated the PR and it should be ready to merge (besides tests, as mentioned in the PR). |
fixed by #8257 |
I'm trying to stream lease keepalives via grpc-gateway. When I send a full HTTP header before each request, it works fine. However, if I try to stream the keepalive messages inside a single HTTP request by sending a newline-delimited JSON message every few seconds inside an HTTP chunk, etcd silently ignores the streamed messages (after the first one in the request) and expires the lease.
Not sure if I'm doing this right, though I did see a line in the docs indicating that grpc-gateway supports newline-delimited JSON streaming. I had no problem streaming replies (for example with /v3alpha/watch) but can't seem to figure out how to stream requests.
Thanks for any guidance.
etcd Version: 3.2.0+git
Thanks,
James