Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple sync #148

Merged
merged 4 commits into from
Nov 12, 2024
Merged

Simple sync #148

merged 4 commits into from
Nov 12, 2024

Conversation

kazu-yamamoto
Copy link
Owner

@edsko @FinleyMcIlwaine This PR simplifies the syncing way between workers and the sender.
A worker must take care of window for its stream.
So, the sender need not to take care of window for streams and can concentrates on window for the connection.
You don't have to understand the code in detail.
I would like you to test this PR with your stress testings.

@FinleyMcIlwaine
Copy link
Collaborator

I tested this with the grapesy test suite and it is resulting in some unexpected exceptions in one of our tests and causing another test to hang. I might be able to look into this a little deeper tomorrow!

@edsko
Copy link
Collaborator

edsko commented Oct 20, 2024

@kazu-yamamoto FYI, it is unlikely that @FinleyMcIlwaine will have a chance to look at this again, as he is unfortunately no longer working with us on the grapesy project (he has moved to greener pastures 😞 oh the young, who do not realize the grass isn't greener on the other side 😬 )

@kazu-yamamoto
Copy link
Owner Author

@edsko Could you run the same test by yourself?

@edsko
Copy link
Collaborator

edsko commented Oct 23, 2024

Yes I can :) I can confirm that our test suite does not pass with this PR (using main/2d4725c16f7dc887c5792d46fae5b67a0907ca6d of http2-tls). Most tests pass, but some tests fail, it seems mostly to do with early client termination.

As per your request, I didn't try to understand the details of this PR, but I did run this with wireshark. One of the simplest tests in our test suite is a client that waits to receive a message from the server, but then cancels the request (using outBodyCancel, see #142). With the released version of http2, this looks something like this in Wireshark:

image

but with this PR there is an additional GOAWAY frame, and our test hangs:

image

So something to do with the interpretation of RST_STREAM frames perhaps? I haven't dug more than this yet.

@kazu-yamamoto
Copy link
Owner Author

I think that RST_STEAM has a debug message in its body.
If you told me the message, I could identify where it is generated.

@kazu-yamamoto
Copy link
Owner Author

I made a mistake.

What I would like to know is where GOAWAY comes from.
If you told me the message in GOAWAY, I could identify where it is generated.

@edsko
Copy link
Collaborator

edsko commented Oct 24, 2024

Ah, yes, I could have thought of including that myself, sorry :) The message is graceful closing.

@kazu-yamamoto
Copy link
Owner Author

Ah, yes, I could have thought of including that myself, sorry :) The message is graceful closing.

Sorry but I don't understand this at all.

@edsko
Copy link
Collaborator

edsko commented Oct 24, 2024

I can try to dig deeper into the details of the PR if you like?

@kazu-yamamoto
Copy link
Owner Author

Yes, please.
I cannot reproduce this bug.

@edsko
Copy link
Collaborator

edsko commented Oct 25, 2024

Ok, I will try to find time for this next week.

@edsko
Copy link
Collaborator

edsko commented Oct 30, 2024

The Wireshark thing is a red-herring. I don't yet know what's causing the trouble. I will continue digging and let you know.

@edsko
Copy link
Collaborator

edsko commented Nov 1, 2024

@kazu-yamamoto Ok, this was a bit of a head-scratcher, but once I realized what the problem was, I was able to fix the issue our side (well-typed/grapesy#256); so I'm okay with this PR as-is (tested against 620687bea826b76436db9475c8566775592b4f18, which does not include your most recent commit).

The problem we were having was that you made sendResponse (which ultimately becomes the respond argument in a server handler) asynchronous; it forks a thread and returns immediately. Not a big deal, I can work with that; the only thing I'd request is that we make it part of the definition of a server whether respond is synchronous or asynchronous, because if this changes again in a later http2 release it would become difficult to deal with. It probably also deserves a major version increase.

@kazu-yamamoto
Copy link
Owner Author

@edsko Thanks.
I agree with the major version up.

Unfortunately, CI tells me there is a bug.
I need to dig it.

@edsko
Copy link
Collaborator

edsko commented Nov 3, 2024

FWIW, CI was green prior to your most recent commit. Not sure what conclusion to derive from that though :)

@kazu-yamamoto
Copy link
Owner Author

The last commit is not related since it is a fix of comment.
I can reproduce this on my local machine randomly.
ServerSpec provides a test for trailers.
A client send the contents of test/inputFile to an echo server.
The echo server send back the contents.
In some cases, the first chunk which the client receives from the echo server is missing.
I wonder why.

@kazu-yamamoto kazu-yamamoto force-pushed the simple-sync branch 3 times, most recently from 7f040e3 to 3b829f1 Compare November 9, 2024 05:30
@kazu-yamamoto
Copy link
Owner Author

kazu-yamamoto commented Nov 9, 2024

I divided the hot commit into two:

  • cb2ce0f makes clients true full-duplex. I think @edsko suffered from this part
  • 3b829f1 introduces a new sync mechanism with the sender and workers

If we run spec --match "/HTTP2.Server/server/handles normal cases/" on 3b829f1 for multiple time, the error occurs.
But I cannot reproduce this error on cb2ce0f.

@kazu-yamamoto
Copy link
Owner Author

kazu-yamamoto commented Nov 9, 2024

  • spec --match "/HTTP2.Server/server/handles normal cases/ is an echo test. A client sends streaming data reading test/inputFile. The server sends back the streaming data. The trailer contains a SHA-1 hash over the entire streaming data. The error tells that the actual SHA-1 value is different from the expected one.
  • The incorrect SHA-1 value is generated because the server application does not receive the first chunk of the streaming.
  • According to packet dump, a client sends all data correctly but the server misses the first chunk. So, this is an issue in the server side.
  • The receiver thread of the server actually receives the first chunk and calls writeTQueue.
  • The echo application cannot receive the first chunk via getRequestBodyChunk which is essentially readTQueue.
  • This error can reproduce with some versions of GHCs. So, I don't think this is a bug of TQueue.
  • 3b829f1 does not modify the receiver at all.

So, this is a very strange bug. I don't know what is the source of this problem at this moment.

@kazu-yamamoto
Copy link
Owner Author

Talking with @khibino, it appeared that the timing to call adjustRxWindow is the source of this issue.
I should consider how to fix it.

@kazu-yamamoto
Copy link
Owner Author

kazu-yamamoto commented Nov 11, 2024

The essential reason for this bug is a thread is spawn for sendResponse.
So, server returns quickly and a race happens with adjustRxWindow.
My conclusion is forkManaged MUST NOT be used for sendResponse.

@kazu-yamamoto kazu-yamamoto merged commit 7b2d6b5 into main Nov 12, 2024
10 checks passed
@kazu-yamamoto kazu-yamamoto deleted the simple-sync branch November 12, 2024 04:25
@kazu-yamamoto
Copy link
Owner Author

Merged.
Thank you for the discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants