
fix(fetch): support Content-Encoding response header #604

Merged: 14 commits merged into main from Michael/support-fetch-content-encoding on Oct 25, 2024

Conversation

@mikicho (Contributor) commented Jul 12, 2024

Added failing test.

@kettanaito

  1. At first, I thought to adopt Nock's implementation (inspired by undici), but IIRC, Interceptors is cross-platform, and AFAIK zlib does not exist in the browser. How does the browser decompress the response body? Am I missing something?
  2. From what I see, IncomingMessage does not decompress the response body, and Node.js leaves this for the HTTP libraries (Axios, got, etc.). So, I think we should follow this behavior and leave the response body compressed.

Summary

  • We are not reusing Undici's decompression logic directly because that would mean redistributing Undici alongside Interceptors.
  • We are using the Compression Streams API to decompress GZIP and Deflate, both in Node.js and the browser (see the sketch after this list).
  • We are using zlib and a custom transform stream to support Brotli decompression in Node.js.
  • We are ignoring Brotli in the browser due to WASM issues.
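
For illustration, a minimal sketch of that approach (hypothetical code; the decompress helper and its shape are assumptions, not the PR's exact source):

import zlib from 'node:zlib'
import { Duplex } from 'node:stream'

// Hypothetical helper: build a decompression chain from the value
// of a Content-Encoding response header.
function decompress(body, contentEncoding) {
  const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())
    .filter(Boolean)

  // Encodings are listed in the order they were applied,
  // so decompression must run in reverse (right-to-left).
  let stream = body
  for (const coding of codings.reverse()) {
    if (coding === 'gzip' || coding === 'x-gzip') {
      stream = stream.pipeThrough(new DecompressionStream('gzip'))
    } else if (coding === 'deflate') {
      stream = stream.pipeThrough(new DecompressionStream('deflate'))
    } else if (coding === 'br') {
      // Node.js only: wrap zlib's Brotli decompressor as a web stream.
      stream = stream.pipeThrough(Duplex.toWeb(zlib.createBrotliDecompress()))
    }
  }

  return stream
}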

@kettanaito (Member)

Thanks, @mikicho!

I believe the answers to both of your questions lie in setting up an HTTP server and taking a look at what happens to compressed bodies sent from the server. We can request the same endpoint in the browser and in Node.js and see whether they handle that at all.

But overall, yes, Interceptors ships browser interceptors, so we can't run things like zlib in FetchInterceptor. The mental model behind the request listeners (and MSW handlers by extension) is to act as a server route handler. If you need to pull an extra lib to decompress the request body in real life, you need to do the same with Interceptors. Since we are not a convenience library, we don't need to worry about anything outside of that.

const compressed = zlib.brotliCompressSync(zlib.gzipSync(message))

interceptor.once('request', ({ controller }) => {
  controller.respondWith(
    new Response(compressed, {
      headers: { 'content-encoding': 'gzip, br' },
    })
  )
})
@kettanaito (Member)

For mocked responses, you most definitely need a third-party library. Interceptors mustn't meddle with your input based on any headers.

I'm mostly curious about receiving a compressed request in Node.js.

@mikicho (Contributor, Author)

I'm mostly curious about receiving a compressed request in Node.js.

I hope I understand your question correctly, but:

  • Fetch: decompresses the response according to the Content-Encoding header.
  • IncomingMessage: returns the body as-is.
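
To illustrate the difference (a hypothetical snippet, not from this PR; the server URL is assumed):

import http from 'node:http'
import zlib from 'node:zlib'

// http.IncomingMessage hands you the raw bytes; you decompress manually.
http.get('http://localhost:3000/resource', (res) => {
  const stream =
    res.headers['content-encoding'] === 'gzip'
      ? res.pipe(zlib.createGunzip())
      : res

  let body = ''
  stream.on('data', (chunk) => (body += chunk))
  stream.on('end', () => console.log(body))
})

// fetch() decompresses for you based on Content-Encoding.
const response = await fetch('http://localhost:3000/resource')
console.log(await response.text())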

@kettanaito (Member)

Got it! That seems to be the case, although we may want to add integration tests for the ClientRequest interceptor to verify it. I do believe it's going to expose the response body as-is and it's up to the developer to decompress it.

The fetch client has decompression built-in; I can confirm it by looking at the source of Undici.

@kettanaito (Member)

Some related stuff I found:

@kettanaito (Member)

Gave this a try in Node.js:

import fs from 'fs'
import http from 'http'
import zlib from 'zlib'

const server = http.createServer((req, res) => {
  // Compress the response body.
  if (req.method === 'GET' && req.url === '/resource') {
    res.writeHead(200, {
      'content-type': 'text/plain',
      'content-encoding': 'gzip',
    })
    // Create the read stream per request so repeated requests work.
    fs.createReadStream('./payload.txt').pipe(zlib.createGzip()).pipe(res)
    return
  }

  res.statusCode = 404
  res.end()
})

server.listen(3000, async () => {
  console.log('Server is running on http://localhost:3000')

  const response = await fetch('http://localhost:3000/resource', {
    headers: { 'Accept-Encoding': 'gzip' },
  })
  console.log(await response.text())
})

I receive the file's content decoded. I don't do the compression though.

@mikicho (Contributor, Author) commented Jul 18, 2024

@kettanaito

I receive the file's content decoded. I don't do the compression though.

What do you mean? You do compress the response in the server

@mikicho (Contributor, Author) commented Jul 19, 2024

@kettanaito I added an unmocked test which passes.

@kettanaito (Member)

Thank you, @mikicho! Will look at this.

@kettanaito kettanaito force-pushed the Michael/support-fetch-content-encoding branch from ad6cf89 to 62b4f09 on July 21, 2024
@kettanaito (Member)

We do need to add Content-Encoding handling to the mocked responses in the fetch interceptor. Before proceeding, I've opened an issue in Undici to see if they would be willing to abstract their handling into a utility function we can consume directly: nodejs/undici#3412. That would be the best-case scenario.

If not, we can integrate their handling into Interceptors manually, although that'd be a repetition I'd rather not introduce.

@kettanaito kettanaito changed the title fix(fetch): support content-encoding response header fix(fetch): support Content-Encoding response header Jul 21, 2024
@kettanaito (Member)

Update

I realized we still need to duplicate Undici's decompression handling because it's unlikely to be backported to Node.js v18, or even v20, at this point. I certainly don't have the capacity to see those backports through.

The handling itself is rather straightforward, it's just nice to be consistent. We will afford that nicety once Node.js v22 becomes the minimal supported version across MSW's toolchain.

@mikicho, if you have time, can you give me a hand with this? I've already abstracted the decompression logic in the Undici PR, making it work better with ReadableStream:

https://github.com/nodejs/undici/pull/3423/files#diff-e0a4e92ad66667607d700021123bf26260711f87fda60b44f6867a3f22087410R856

We basically need this decompress function and the surrounding utilities. We can also merge it all into a single function; there's no big value in keeping them separate.

@kettanaito (Member)

We can also consider using the standard Compression Streams API instead of relying on Undici. I will explore this option.
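
For reference, a minimal round-trip with that API (illustrative only; CompressionStream and DecompressionStream are global in modern browsers and Node.js 18+):

const source = new Blob(['hello world']).stream()
const compressed = source.pipeThrough(new CompressionStream('gzip'))
const decompressed = compressed.pipeThrough(new DecompressionStream('gzip'))
console.log(await new Response(decompressed).text()) // "hello world"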

@kettanaito (Member) commented Oct 16, 2024

Update

Opened a pull request that utilizes the Compression Streams API to decompress fetch responses: #661. This works for GZIP and Deflate, but has no Brotli support.

@mikicho, I'd be curious to hear your thoughts on this one.

Edit: Supporting Brotli seems like a lot of work. We can either release decoding without it (just GZIP and Deflate) and work on it in the background, or block this feature until we find a way to make Brotli happen in both Node.js and the browser (it is possible via brotli-wasm; it just needs to be distributed properly).

@mikicho (Contributor, Author) commented Oct 16, 2024

@kettanaito I think we should check the environment: if running in Node.js, use zlib; otherwise, use a browser-compatible library.
WDYT?

@@ -29,6 +35,12 @@ const browserConfig: Options = {
   format: ['cjs', 'esm'],
   sourcemap: true,
   dts: true,
+  esbuildOptions(options) {
+    options.alias = {
+      [`internal:brotli-decompress`]:
@kettanaito (Member)

This is the environment-based decompression for the build.

@@ -14,7 +14,10 @@
   "types": ["@types/node"],
   "baseUrl": ".",
   "paths": {
-    "_http_common": ["./_http_common.d.ts"]
+    "_http_common": ["./_http_common.d.ts"],
+    "internal:brotli-decompress": [
@kettanaito (Member)

This is the environment-based decompression for the dev (TS).

alias: {
  // Create a manual alias for Vitest so it could resolve this
  // internal environment-dependent module in tests.
  'internal:brotli-decompress':
@kettanaito (Member)

This is the environment-based decompression for the tests.

@kettanaito (Member) left a review comment:

Looks good to me.

@mikicho, please, would you have a moment to browse through the changes? Would love to hear your thoughts too.

@mikicho (Contributor, Author) commented Oct 22, 2024

@kettanaito LGTM

Just curious about PipelineStream. I'm not sure why we need this.
Thanks for working on this!

@kettanaito (Member)

@mikicho, I added PipelineStream as an alternative to Node.js's pipeline, but for web streams. We use it to compose a dynamic list of TransformStreams (decompressors) into a chain of streams to apply to the response body.
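
Roughly, such a composition does the following (a hypothetical re-implementation for illustration; composeTransformStreams is an assumed name, not the PR's source):

// Compose a dynamic list of TransformStreams into a single
// { writable, readable } pair that works with pipeThrough().
function composeTransformStreams(transforms) {
  const [first, ...rest] = transforms
  let readable = first.readable
  for (const next of rest) {
    readable = readable.pipeThrough(next)
  }
  return { writable: first.writable, readable }
}

// Usage: responseBody.pipeThrough(composeTransformStreams(decompressors))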

Do you see a better approach here?

@mikicho (Contributor, Author) commented Oct 22, 2024

Ohh, OK! I see.
I'm not sure I'd stick with the Web API (DecompressionStream) here, because AFAIU this code is used only by Node.js, but the current implementation is great too.

@kettanaito (Member)

It's good to be consistent across the browser and Node.js, and the DecompressionStream is available in both of those environments. We are also working with ReadableStream, which is the response body. It's easier to convert zlib to a web stream than convert everything else to a Node.js stream.
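
For instance, converting a zlib transform into a web stream pair is a one-liner (illustrative; Duplex.toWeb is experimental in Node.js):

import zlib from 'node:zlib'
import { Duplex } from 'node:stream'

// Expose zlib's Brotli decompressor as { readable, writable },
// so a web ReadableStream can pipeThrough() it.
const brotliDecompress = Duplex.toWeb(zlib.createBrotliDecompress())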

@mikicho (Contributor, Author) commented Oct 23, 2024

@kettanaito I think there is something off with the decompression order... still looking to understand the root cause (if any).

@kettanaito (Member)

Right now, we're applying decompression left-to-right based on the content-encoding header:

https://github.com/mswjs/interceptors/pull/604/files#diff-8828d872f58df5de9fe0ddeacea2d0e5f24dc843916c68a2643769555d838694R47

Maybe that's the problem?

Basically:

content-encoding: gzip, br

The decompression chain would be: input -> DecompressionStream('gzip') -> BrotliDecompression -> output.

@mikicho (Contributor, Author) commented Oct 23, 2024

That behavior looks correct; I'm just not sure the code actually does it.

From what I see:

compressResponse(['gzip', 'br'])('hello world').toString('base64') // H4sIAAAAAAAAA+NmbchIzcnJVyjPL8pJYQYAhavvgA8AAAA=

produces a string that is br -> gzip instead of gzip -> br.

By contrast:

zlib.brotliCompressSync(zlib.gzipSync('hello world')).toString('base64') // Cw+AH4sIAAAAAAAAA8tIzcnJVyjPL8pJAQCFEUoNCwAAAAM=

@kettanaito (Member) commented Oct 23, 2024

Your example is incorrect. compressResponse behaves similarly to how a server handles the Content-Encoding header. It lists all the encodings in the order they were applied, left-to-right. So, for your example, the order would be gzip, then brotli.

Decompression must be reversed, and there's a bug currently that I will fix soon.

DecompressionStream in Node.js v18-20

For some reason, the instances of DecompressionStream in Node.js v18-20 are printed as undefined when logging them out. They are still correct transform streams, just not observable in the console. Extremely confusing when debugging.

@mikicho (Contributor, Author) commented Oct 23, 2024

IIRC, the Content-Encoding header lists encodings "in the order they were applied", so the string is first compressed with gzip and only then with br, and decompression should go backward.

Here's a fun find

I saw that too lol. Seems like 20.18.0 (latest) works as expected

@mikicho (Contributor, Author) commented Oct 23, 2024

Your example is incorrect.

Why not? If I compressed with gzip, br, I'd expect to decompress with br -> gzip. Isn't that right?

@kettanaito (Member) commented Oct 23, 2024

I think we mean different things by ->. I mean order, you seem to mean relation. Composition is easier:

Content-Encoding: gzip, br

br(gzip(data))
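
Concretely (a small zlib illustration, not from the PR itself):

import zlib from 'node:zlib'

// Content-Encoding: gzip, br lists the encodings in the order applied:
const data = zlib.brotliCompressSync(zlib.gzipSync('hello world')) // br(gzip(data))

// Decompression undoes them right-to-left:
const original = zlib.gunzipSync(zlib.brotliDecompressSync(data)).toString()
console.log(original) // 'hello world'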

@mikicho, I've pushed the fix that applies the decompression right-to-left to respect the right order of multiple compressions. Mocked tests pass, but I have two bypass tests failing:

  • gzip, deflate
  • gzip, br

The error seems to be coming from Undici being unable to handle that compression:

TypeError: terminated
 ❯ Fetch.emit ../node:events:518:28

Caused by: Error: incorrect header check
 ❯ Zlib.zlibOnError [as onerror] ../node:zlib:189:17

⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Serialized Error: { errno: -3, code: 'Z_DATA_ERROR' }

Our interceptor is not involved here at all. Am I constructing an invalid compressed response with compressResponse()? 🤔 I've fixed that function, too, to handle the correct order of compression.

I can confirm that compressResponse(['gzip', 'deflate'], 'hello world') returns the same buffer as zlib.deflateSync(zlib.gzipSync('hello world')):

compressResponse: eJyT7+ZgAAPh0x5nT54M1zivf8qTkaFV0IuXGygMAICXB8E=
zlib: eJyT7+ZgAAPh0x5nT54M1zivf8qTkaFV0IuXGygMAICXB8E=

@kettanaito (Member) commented Oct 23, 2024

I fixed the compose function by removing it. The compose order is correct.

The issue is in Undici; I can reproduce it standalone as well. Will report it to their repo.

Edit: Opened an issue at nodejs/undici#3762.

@kettanaito (Member)

Since the behavior is reproducible in isolation, our side of the implementation seems to be done. The intention of bypass tests is to ensure the interceptor doesn't interfere with the regular behavior of the request client, but in this case the request client has a bug, so we cannot allow that to influence our tests.

I will skip those two scenarios and link the issue above them. Will do that after we discuss the issue with the Undici folks.

@kettanaito (Member)

This looks good to me. @mikicho, should we merge this?

@mikicho mikicho merged commit b8f8d51 into main Oct 25, 2024
2 checks passed
@mikicho mikicho deleted the Michael/support-fetch-content-encoding branch October 25, 2024 11:08
@kettanaito (Member)

🚀

@kettanaito (Member)

Released: v0.36.6 🎉

This has been released in v0.36.6!

Make sure to always update to the latest version (npm i @mswjs/interceptors@latest) to get the newest features and bug fixes.


Predictable release automation by @ossjs/release.

Merging this pull request may close: GZIP doesn't work with got