Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making concurrent HTTP/2 requests with fetch may fail with REFUSED_CONNECTION #21789

Open
magurotuna opened this issue Jan 4, 2024 · 15 comments

Comments

@magurotuna
Copy link
Member

Version: Deno 1.39.1

Problem

The following code reliably fails.

await Promise.all(
  Array(125)
    .fill(null)
    .map(() => fetch("https://example.com"))
);

Error message:

error: Uncaught (in promise) TypeError: error sending request for url (https://example.com/): http2 error: stream error received: refused stream before processing any application logic
await Promise.all(
^
    at async mainFetch (ext:deno_fetch/26_fetch.js:164:12)
    at async fetch (ext:deno_fetch/26_fetch.js:385:7)
    at async Promise.all (index 114)
    at async file:///private/tmp/fetch.ts:1:1

An interesting observation is that if we change the URL to another one (say https://fresh.deno.dev) it works with no issue even with higher concurrency.

// Succeed
await Promise.all(
  Array(250)
    .fill(null)
    .map(() => fetch("https://fresh.deno.dev"))
);

Investigation

I found and created a minimal reproducible example which is available at https://github.com/magurotuna/deno-fetch-h2-stream-error-repro.

In short, once the HTTP/2 connection is established, the client makes a lot of concurrent requests to the server before the server's initial SETTINGS frame reaches the client which should tell the client the maximum number of concurrent streams allowed. The requests over the limit are rejected with REFUSED_STREAM error.

Note that the client behavior of opening too many concurrent streams before receiving the initial SETTINGS frame from the server is not a violation according to the RFC. Both client and server are behaving correctly.

Solution

Some of the existing HTTP/2 client implementations do retry requests in case of REFUSED_STREAM error. We should probably do the same though at what layer we should do it is not necessarily clear.
The most obvious place would be in reqwest; I opened a PR in reqwest repo that adds the retry logic. seanmonstar/reqwest#2081

If this PR gets merged and shipped, we can upgrade reqwest version we're using in Deno and the issue should be fixed. I filed this issue to track the current status.

@mmastrac
Copy link
Contributor

mmastrac commented Jan 4, 2024

This may be related to #21749 as well.

@magurotuna
Copy link
Member Author

hyperium/h2#732 should also reduce the chance of us running into REFUSED_CONNECTION errors on startup.

@Xe
Copy link

Xe commented Jan 26, 2024

I just ran into this today while trying to make 35 connections at once with some naiive code:

const reqs = instances.map((x) =>
  fetch(
    `https://cdn.xeiaso.net/file/christine-static/${basePath}/index${i}.ts`,
    {
      headers: {
        "fly-force-instance": x,
      },
    },
  )
);

Is there a way to limit the number of requests this makes in-flight or am I just screwed?

@osddeitf
Copy link
Contributor

osddeitf commented Aug 10, 2024

v1.45.5 in WSL 2.

I tried fetching a URL with bytes stream (await fetch(...)).body.pipe(). It just randomly throws error, even retrying with fallback to (await fetch(...)).bytes() won't help.

The URL is HTTP/2 as well.

Below is the error logs I collected.

[DEBUG]: retry 2 TypeError: error reading a body from connection
    at async Object.pull (ext:deno_web/06_streams.js:937:27)
[DEBUG]: retry 2 TypeError: error reading a body from connection
    at async Object.pull (ext:deno_web/06_streams.js:937:27)
[DEBUG]: retry 2 TypeError: error reading a body from connection
    at async Object.pull (ext:deno_web/06_streams.js:937:27)
[DEBUG]: fallback not to use stream
error: Uncaught (in promise) TypeError: error reading a body from connection
    at async readableStreamCollectIntoUint8Array (ext:deno_web/06_streams.js:1065:19)
    at async consumeBody (ext:deno_fetch/22_body.js:254:9)
    at async DataFetcher.read (file:///home/mytool/zip2/fetcher.ts:177:27)
    at async file:///home/mytool/zip2/mod.ts:145:5

Edited. Also tried with v1.39.1 (it outputs more detailed).

[DEBUG]: retry 2 TypeError: request or response body error: error reading a body from connection: stream error received: unspecific protocol error detected
    at async Object.pull (ext:deno_web/06_streams.js:945:27)
[DEBUG]: retry 2 TypeError: request or response body error: error reading a body from connection: stream error received: unspecific protocol error detected
    at async Object.pull (ext:deno_web/06_streams.js:945:27)
[DEBUG]: retry 2 TypeError: request or response body error: error reading a body from connection: stream error received: unspecific protocol error detected
    at async Object.pull (ext:deno_web/06_streams.js:945:27)
[DEBUG]: fallback not to use stream
error: Uncaught (in promise) TypeError: res.bytes is not a function
    return res.bytes()

@magurotuna
Copy link
Member Author

@osddeitf it looks like that error message unspecific protocol error detected comes from the h2 crate which we internally use to implement HTTP/2 support in fetch.
https://github.com/hyperium/h2/blob/36cf4f2bf2ae77489ec499518c716ea09a977de5/src/frame/reason.rs#L66
So this error should have something to do with some protocol-level stuff. To look into it more, can you share a URL you made a request to? (or is that URL private?)

@osddeitf
Copy link
Contributor

@magurotuna Thanks for willing to help.

The URL I fetch is not something of top-secret anyway, I just want to read the content (filename, filesize) of a zip file:
https://autopatchhk.yuanshen.com/client_app/download/pc_zip/20240705190845_sEVV7ebRBuMrOheE/GenshinImpact_4.8.0.zip.006

I fetched a lot of small segment (<100 bytes). You can try it yourself by looping 1->50 and read the file with header like Range: bytes=100-200.

I tried to run the fetch logics in Nodejs and it never throws. So my workaround now is to run the fetch in a Nodejs child process, then get the read bytes from the main Deno process.

@osddeitf
Copy link
Contributor

osddeitf commented Aug 10, 2024

@magurotuna I tried to make a minimal reproduce code:

const offset = 2 ** 32 + 2**31
for (let i = 0; i < 100; i++) {
  const start= offset + i * 40
  const fet = () => fetch(url, {
    headers: {
      // Range: `bytes=${start}`
      Range: `bytes=${start}-${start+100000}`
    }
  })

  const res = await fet()
  if (!res.body) {
    throw new Error('no body')
  }
  try {
    const reader = res.body.getReader()
    const arr = await reader.read()
    console.log(i, arr.value?.byteLength)
  }
  catch (e) {
    console.log(e)
    console.log('Try again with bytes()')
    const res = await fet()
    const arr = await res.bytes()
    console.log(i, arr.byteLength)
  }
}
  • When the response body needs to be reasonably large (>= 100k), it would error.
  • Sometimes it hang, without error. (My other implementation did almost the same thing but it throws error every time)
  • Call res.bytes() instead of readable stream would work. But use it as try again doesn't work:
      4 16384
      5 16384
      TypeError: error reading a body from connection
          at async Object.pull (ext:deno_web/06_streams.js:937:27)
      Try again with bytes()
      error: Uncaught (in promise) TypeError: error reading a body from connection
          const arr = await res.bytes()
    

Edited. Tried release the resource manually:

await reader.cancel()
reader.releaseLock()

The loop would run more interations until it throws. I suspect that there's some sort of shared state (or internal buffer) between fetch requests. And I think it shouldn't be normal to have some state shared.

At least we should be able to implement retry logics (like I did above), but it's not work. Why 2 different requests failed the same though?

@magurotuna
Copy link
Member Author

@osddeitf Thanks for the code snippet. I ran that on my local machine several times and it always hung at around i = 60, and I never saw any errors being thrown. Is this the same as what you experienced?

I also tried the manual cancellation pattern (i.e. inserting await reader.cancel() after await reader.read()). I executed this several times too, all of which were successful. So just to confirm, what you were seeing is that it still failed (either error thrown or hang?) after inserting await reader.cancel() - is this correct?

@osddeitf
Copy link
Contributor

@magurotuna It usually throws when each read chunk size is 16k, if 8k then it hangs instead.

With body size 100k bytes, it throws around 60 without manual cancelation.
You can try increase the body size to 1_000_000 bytes, it would throw/hang at iteration 5, and if added manual cancelation, it throws around 40-50.
It may also depend on CPU / network condition, so you may need to increase the loop iterations.

@magurotuna
Copy link
Member Author

Thanks. I tried with bigger body size (1MB) and more iterations (500), but wasn't able to reproduce throwing errors. It always hangs for me. I guess it's maybe because of my network speed which is fast (over 1Gpbs)?

Anyways, here are what I've found so far which I think may be helpful for you:

1. Disable HTTP/2

You can disable HTTP/2 and enforce HTTP1 like below if you want. Note that you'd need --unstable-http flag when running this.

const client = Deno.createHttpClient({
  http2: false,
  http1: true,
});

for (let i = 0; i < 100; i++) {
  const res = await fetch("https://example.com", { client });

  // work with res...
}

In my environment, this trick makes the entire execution succeed always with no hang.

(Note that fetch in Node.js uses HTTP1 by default unless a dedicated option is explicitly passed)

2. Reused connection

By default Deno's fetch reuses the connection if the destination server is the same, so when you have a loop where you call fetch(url), Deno keeps the connection throughout the loop and reuses it. This can be disabled by creating client inside the loop, i.e.

for (let i = 0; i < 100; i++) {
  const client = Deno.createHttpClient({});
  const res = await fetch("https://example.com", { client });

  // work with res...
}

This creates a new connection in each iteration, which also resulted in the successful execution in my environment without disabling HTTP/2, in exchange of paying a little bit of an extra cost for establishing a new connection every time.

3. Why did manual cancellation alleviate the issue

HTTP/2 has its own flow control mechanism. After you have read some amount of received data from the reader, the flow control mechanism tells the server that "You can send me another XXX bytes" (this kind of message is called "WINDOW_UPDATE" where window refers to the size that the remote peer can send to me).
Let's say you don't call await reader.cancel(). In this case, the HTTP/2 stack will keep holding the received data because it is unsure whether received data will be used in the future. This results in WINDOW_UPDATE not sent to the server since it needs to buffer the received data. So at some point, the window size is exhausted - at which point the server stops sending more data. This is why it hangs.

By properly calling await reader.cancel(), the HTTP/2 stack can be sure that the data that was received but not yet read out can be discarded. It then drops the received data and sends WINDOW_UPDATE to the server, letting it send more data.

@osddeitf
Copy link
Contributor

Thank you, I'll try the approach you mentioned. I fetch a lot of small segments so I'll use HTTP/1.

By properly calling await reader.cancel(), the HTTP/2 stack can be sure that the data that was received but not yet read out can be discarded. It then drops the received data and sends WINDOW_UPDATE to the server, letting it send more data.

It's not working correctly, so I think there should be some bug there.
I'll also try it in Nodejs with HTTP/2 to see.

@raaymax
Copy link

raaymax commented Aug 18, 2024

I have the same issue. When trying to stream multiple files from GCS, the fetch freezes without an error, and I can't make any new requests as they freeze too. Downgrading to HTTP/1 solved the issue. Either way, I lost a good few hours on this.

deno: v1.45.5

@magurotuna
Copy link
Member Author

@raaymax Thanks for the report. It would be really helpful if you could prepare as minimal an example as possible that can reproduce the issue you ran into and share that with us.

@raaymax
Copy link

raaymax commented Aug 18, 2024

No matter what, I can't reproduce it in tests, and I really tried. But when I run my app, it just fails after a few moments of use. I have no idea why this is happening or what's causing it, but switching to HTTP/1 for server-to-GCS communication works for me. I'll let you know if I find any more useful information.

update: I finally found the root cause. It was just some response body stream that wasn't consumed in a specific scenario. The number of these unclosed streams was accumulating over time, and eventually, every new fetch stopped working. Interestingly, when using HTTP/1, fetch wasn't blocked at all.

@magurotuna
Copy link
Member Author

Thanks for the additional info. How does your code work with the response body stream? Like I mentioned in #21789 (comment), one of the typical scenarios that can lead to hanging is that the response stream is neither consumed nor canceled, which causes HTTP/2 flow control mechanism to stop further data transmission once the window is used up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants