Mid-stream error semantics? #895

Closed
tigt opened this issue Jul 7, 2021 · 3 comments
Comments

@tigt

tigt commented Jul 7, 2021

The problem

I have a website that uses HTTP/1.1 chunked transfer-encoding to incrementally show results from asynchronous backend API calls in a streamed HTML response. This response routes through an NGINX reverse proxy, a CDN, and then an unknown number of gateways, middleboxes, and inspectors (like antivirus programs) before reaching the requesting user-agent.

Sometimes, a backend API call fails: the server’s connection to it closes unexpectedly, the backend emits an error of its own, or any other of the myriad ways computers and networks attack. By that time, I’ve already sent an HTTP status code and headers, but I really want the ability to tell any consuming clients that the stream encountered an error and the response should now be considered invalid, otherwise:

  • An HTTP cache may store the erroneous content and reuse it, showing users the error for longer than they would otherwise see it
  • Search engines will index the erroneous content, since they received no sign they should try again or discard the response as invalid
  • HTTP-level tools (debuggers, monitoring, curl, spiders, etc.) will report the response as successful, even though it wasn’t

Research/prior art

HTTP/0.9, /1.0

No mid-stream error signaling is possible, meaning prematurely terminated responses are indistinguishable from a response that ends normally by closing the connection. This limitation presumably informed the later requirement that bodies carry a length indicator: either content-length or transfer-encoding: chunked.

HTTP/1.1

IETF Draft: HTTP/1.1 Messaging §8 Handling Incomplete Messages

In theory, HTTP/1.1 offered a way to convey more error information via chunk extensions, but no standard extensions ever emerged, and they were dropped for HTTP/2 and beyond.

If a chunked response doesn’t terminate with the zero-length end chunk, the client must assume that the response was incomplete, which at the very least means a cache should double-check with the server before reusing the stored incomplete response. There are two ways to emit such an incomplete response:

  • Closing the TCP connection before any zero-length end chunk. This can be hard to convey to the user-agent, since connections and their associated state are assumed to be hop-by-hop. Additionally, it can have undesirable performance implications when proxying through gateways by tearing down warmed-up persistent connections, and it precludes adding HTTP-level debugging info in trailers, which seem the natural place to include it.
  • Writing invalid transfer-encoding framing, such as missing or incorrect hex-encoded chunk lengths. Understandably, middleboxes will also truncate or attempt to repair such invalid responses, so the user-agent runs into the aforementioned problems anyway.
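
To make the framing rule concrete, here is a minimal Python sketch of chunked encoding and of how a recipient could tell a complete body from an incomplete one. The function names `encode_chunked` and `decode_chunked` are hypothetical, the decoder ignores trailer fields, and real parsers are stricter than this; treat it as an illustration of the zero-length end chunk rule rather than a reference implementation.

```python
from typing import Iterable, Tuple

HEX_DIGITS = b"0123456789abcdefABCDEF"


def encode_chunked(parts: Iterable[bytes], terminate: bool = True) -> bytes:
    """Frame body parts as HTTP/1.1 chunks.

    With terminate=False the zero-length end chunk is omitted, which is
    what a recipient sees when the connection closes mid-stream.
    """
    out = bytearray()
    for part in parts:
        if not part:
            continue  # an empty part would look like the end chunk
        out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    if terminate:
        out += b"0\r\n\r\n"  # end chunk followed by an empty trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> Tuple[bytes, bool]:
    """Return (body, complete).

    complete is True only when the zero-length end chunk and the final
    CRLF were both seen. Trailer fields are not handled in this sketch.
    """
    body = bytearray()
    pos = 0
    while True:
        eol = data.find(b"\r\n", pos)
        if eol < 0:
            return bytes(body), False  # chunk-size line never arrived
        # Drop any chunk extension after ";" and validate the hex size.
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            return bytes(body), False  # invalid framing
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            return bytes(body), data[pos:pos + 2] == b"\r\n"
        chunk = data[pos:pos + size]
        body += chunk
        if len(chunk) < size or data[pos + size:pos + size + 2] != b"\r\n":
            return bytes(body), False  # chunk data cut short or unterminated
        pos += size + 2
```

Decoding the output of `encode_chunked(parts, terminate=False)` reports the body as incomplete, and so does a size line that is not valid hex. Both ways of emitting an incomplete response therefore look the same to the next hop.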

HTTP/2
RFC 7540 §5.4.2 Stream Error Handling

An HTTP/2 stream can signal an application error by sending a RST_STREAM frame with an error code of 0x2 INTERNAL_ERROR… I think.

The following subsection, §5.4.3 Connection Termination, also suggests that prematurely closing the TCP connection can signal an error, which is straightforward to translate from HTTP/1.1 but inherits the same issues.
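
For reference, this is roughly what such a frame looks like on the wire. The sketch below follows the frame layout in RFC 7540 (a 9-octet frame header, then a 4-octet error code for RST_STREAM, frame type 0x3); the helper name `rst_stream_frame` is hypothetical, and in practice an HTTP/2 library would build this frame for you.

```python
import struct

RST_STREAM = 0x3      # frame type, RFC 7540 §6.4
INTERNAL_ERROR = 0x2  # error code, RFC 7540 §7


def rst_stream_frame(stream_id: int, error_code: int = INTERNAL_ERROR) -> bytes:
    """Serialize an HTTP/2 RST_STREAM frame for the given stream."""
    if not 0 < stream_id < 2**31:
        raise ValueError("RST_STREAM needs a non-zero 31-bit stream identifier")
    payload = struct.pack("!I", error_code)  # 32-bit error code
    length = len(payload)                    # always 4 for RST_STREAM
    header = struct.pack(
        "!BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit payload length
        RST_STREAM,                     # 8-bit type
        0,                              # 8-bit flags (none defined)
        stream_id,                      # reserved bit (0) + 31-bit stream id
    )
    return header + payload
```

Resetting stream 1 this way produces 13 octets: `00 00 04 03 00 00 00 00 01` for the header and `00 00 00 02` for the error code. Only that one stream is affected, so the connection stays open.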

SPDY

I would love to not have to think about SPDY at all, but many CDNs and similar gateways will transparently downgrade to SPDY for older user-agents. Luckily, SPDY’s semantics more or less map to HTTP/2’s (see IETF draft: SPDY Protocol §2.4.2, Stream error handling), but the code for INTERNAL_ERROR might be 0x6 instead.

HTTP/3

HTTP/3 §8 Error Handling seems to leave exact implementation open for experimentation, which is good overall but makes it harder to find a recommendation for my case. “H3_INTERNAL_ERROR (0x0102)” seems ideal, but its description as an error somewhere “in the HTTP stack” makes me wonder if it’s suitable for application-level use?

Gateways translating from earlier versions of HTTP might reasonably choose to surface the previous signaling methods, such as malformed chunks, as either “H3_FRAME_ERROR (0x0106)” or “H3_MESSAGE_ERROR (0x010e)”. Should either of those be used in that scenario? The mapping between h2/h3 errors seems mostly concerned with mapping transport-level semantics.

I’m having a hard time understanding how QUIC would convey the same error information as prematurely closed TCP connections when translating that signal from earlier HTTP versions. It does mention “the QUIC transport could indicate to the application layer that the connection has terminated”, but “could” does not suggest I can rely on that behavior.
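
Pulling together the codes cited in the sections above, a gateway-side lookup might look like the Python sketch below. The table name, the protocol keys, and the function `internal_error_code` are all hypothetical. The values are the "internal error" codes each spec defines; whether they are appropriate for application-level mid-stream errors is exactly the open question here.

```python
from typing import Optional

# "Internal error" codes as cited above. HTTP/1.x has no error code at
# all: the only signal is an incomplete message (missing end chunk or
# closed connection), so it maps to None.
INTERNAL_ERROR_CODES = {
    "http/1.1": None,
    "spdy": 0x6,     # SPDY RST_STREAM status INTERNAL_ERROR
    "h2": 0x2,       # HTTP/2 INTERNAL_ERROR
    "h3": 0x0102,    # H3_INTERNAL_ERROR
}


def internal_error_code(protocol: str) -> Optional[int]:
    """Return the stream-level internal error code for a protocol, if any."""
    try:
        return INTERNAL_ERROR_CODES[protocol.lower()]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
```

A hop that translates between versions would need some table of this shape. The `None` entry for HTTP/1.1 is where information is lost, since there is no code to carry across.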

So what?

  • Persisting mid-stream application errors through various HTTP versions seems like something core HTTP semantics should allow for.

  • Guidance on how to signal mid-stream errors is hard to find; the only guidance I found covers translating those signals from HTTP/2 to HTTP/3. This is exacerbated by reverse proxies usually not supporting upstream connections newer than HTTP/1.1.

  • Existing methods to signal mid-stream errors can easily cause performance problems or unexpected behavior when attempting to convey them all the way to the requester.

  • While it’s theoretically possible to propagate 1 bit of error information (“is this response bad and shouldn’t be reused?”), other HTTP-level error data, such as retry-after, seem valuable to reuse.

@mnot
Member

mnot commented Jul 7, 2021

This is out of scope for the HTTP core effort -- it would be considered a new feature.

However, see this draft and resulting list discussion. After a discussion at our last interim, it seems like there's interest in discussing this general area (not only for caching, but also other purposes, potentially), but it still needs "time to bake."

Probably the best way to move things forward is to participate in discussion on-list. Having more use cases fleshed out beyond caching will help to scope the work.

mnot closed this as completed Jul 7, 2021

@kazuho

kazuho commented Jul 7, 2021

Regarding HTTP/3, the concerns and solutions were discussed in quicwg/base-drafts#3300.

@tigt
Author

tigt commented Jul 17, 2021

Gotcha. I’ll probably post on the mailing list with a more complete proposal once I collect my thoughts, but for the moment I really want to know what I should do in a chunked transfer-encoding (CTE) stream to tell requesters that the response should be considered questionably-cacheable/damaged/non-authoritative/other stuff that 5XX errors get by default.


3 participants