Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestOptions payloads for http/https request generated from url.parse end up having headers removed from the request #188

Closed
dg-harris opened this issue Nov 16, 2021 · 16 comments

Comments

@dg-harris
Copy link

If a RequestOptions payload is created using the node url.parse function, the normalizeClientRequestArgs function ends up stripping the request of additional configuration such as headers.

I've created a test to demonstrate the issue at: https://github.com/dg-harris/interceptors/pull/1/files

Here is an example of a popular library using this pattern: https://github.com/EventSource/eventsource/blob/master/lib/eventsource.js#L84-L139

Here is a repo with a small example of the msw library preventing authentication of the launchdarkly-node-server-sdk due to this issue:
https://github.com/dg-harris/ld-msw-authentication-issue

Please let me know if there is any additional information I can provide or work I can do to help.

@kettanaito
Copy link
Member

Hey, @dg-harris. Thanks for reporting this!

We've finished a full rewrite of ClientRequest interception recently (#164), which included fixes for a few issues. I think what you've described has been covered. I recall there was a bug that we disregarded certain request options during normalizeClientRequestArgs, which led to those options being lost.

Could you try cloning the current state of the repo and adding that reproduction test straight to the respective test suite? If it fails, I'd be thankful for a pull request with that failing test.

@dg-harris
Copy link
Author

Can do! Thanks for the quick response.

@gerbyzation
Copy link

gerbyzation commented Dec 10, 2021

I ran into this issue as well, this still occurs in the rewritten version (@mswjs/interceptors@0.13.0) as well.
Another repro: https://github.com/gerbyzation/msw-launchdarkly-401-repro

The value of legacyUrl:
Screenshot 2021-12-09 at 12 18 19

Is this something MSW could gracefully handle despite the fact that strictly speaking they are passing down values that do not belong in request options?

I also believe launchdarkly uses their own evensource library instead: https://github.com/launchdarkly/js-eventsource

@eli-darkly
Copy link

Hi - I'm one of the maintainers of the LaunchDarkly fork of that library, and this was just brought to my attention. I'm a bit confused by this statement from @gerbyzation:

strictly speaking they are passing down values that do not belong in request options

I'm not sure why it would not be legit to parse a URL with url.parse and then add some additional options, and pass the resulting object as a single parameter to http.request, as we're doing. The parts of a parsed URL are defined as valid properties in that options object, as shown here. The request function is defined as taking either url, options or just options, and the latter option would make no sense if it weren't possible to include the URL properties in the options.

@eli-darkly
Copy link

But I'm also not 100% sure why we've been doing this. My guess would be that it was came from the very old upstream code, and that possibly the (url, options) semantics were not supported in some old version of Node— but the Node API docs don't say "added in such-and-such version" there.

In any case at this point, we could probably switch to using (url, options) instead of parsing the URL, and I don't have any objection to making such a change. I'm just confused by the assertion that the current way is wrong— and if in fact that is still a legitimate way to use http.request, which other people might also be doing, then it does seem like something MSW ought to be able to handle correctly.

@eli-darkly
Copy link

eli-darkly commented Dec 30, 2021

If I'm right that the relevant logic is in https://github.com/mswjs/interceptors/blob/main/src/interceptors/ClientRequest/utils/normalizeClientRequestArgs.ts ... then it does look to me like that logic is making a wrong assumption. It simply isn't the case that if http.request or https.request is called with a single argument, then that argument has to be a URL. That isn't what the Node API says: the two supported signatures are http.request(options[, callback]) and http.request(url[, options][, callback]), so a single argument could either be "options" (with no callback) or "url" (with no options or callback). The url parameter can be either a string or a parsed URL. And the TypeScript definition of RequestOptions agrees with the online API docs that the options can include URL components.

It seems to me that the right logic would be: if there's a single argument and it's a string, then it's a URL. If there's a single argument and it's an object, then you should treat it as options no matter what... because if it's a parsed URL, the properties of a parsed URL correspond exactly to properties in RequestOptions. I'm sure that's by deliberate design. I don't see why there would need to be separate paths for "this object has URL properties in it" and "this object has options that may include URL properties in it". The one is just a subset of the other.

@eli-darkly
Copy link

Again, it's probably not hard for us to switch to passing two parameters, so I'm not ruling out releasing a patch of our library if that will help. It's just that (unless I'm missing something) this does look like it might be a bug in MSW, that it is incorrectly handling an allowable usage, in which case it may affect other people's code too and not just LaunchDarkly.

@gerbyzation
Copy link

gerbyzation commented Dec 30, 2021

@eli-darkly Thanks for taking a look!

I'm a bit confused by this statement from @gerbyzation:

strictly speaking they are passing down values that do not belong in request options

I'm not sure why it would not be legit to parse a URL with url.parse and then add some additional options, and pass the resulting object as a single parameter to http.request, as we're doing.

Could've worded this better, I meant this in terms of the shape of the requests object, as the parsed URL object is richer in that it includes properties not part of the type for request options. If this is a good idea (or can be expected to be a wider pattern) I think is the question here to inform if this should be fixed in MSW somehow or in the LD SDK

The parts of a parsed URL are defined as valid properties in that options object, as shown here.
because if it's a parsed URL, the properties of a parsed URL correspond exactly to properties in RequestOptions

Partly, but not exactly as it also includes additional properties like hash and 'href' which are not part of the definition of request options: https://nodejs.org/api/url.html

This is the function that trips up:

/**
* Normalizes parameters given to a `http.request` call
* so it always has a `URL` and `RequestOptions`.
*/
export function normalizeClientRequestArgs(
defaultProtocol: string,
...args: ClientRequestArgs
): NormalizedClientRequestArgs {
let url: URL
let options: ResolvedRequestOptions
let callback: HttpRequestCallback | undefined
log('arguments', args)
log('using default protocol:', defaultProtocol)
// Convert a url string into a URL instance
// and derive request options from it.
if (typeof args[0] === 'string') {
log('first argument is a location string:', args[0])
url = new URL(args[0])
log('created a url:', url)
const requestOptionsFromUrl = getRequestOptionsByUrl(url)
log('request options from url:', requestOptionsFromUrl)
options = resolveRequestOptions(args, url)
log('resolved request options:', options)
callback = resolveCallback(args)
}
// Handle a given URL instance as-is
// and derive request options from it.
else if (args[0] instanceof URL) {
url = args[0]
log('first argument is a URL:', url)
options = resolveRequestOptions(args, url)
log('derived request options:', options)
callback = resolveCallback(args)
}
// Handle a legacy URL instance and re-normalize from either a RequestOptions object
// or a WHATWG URL.
else if ('hash' in args[0] && !('method' in args[0])) {
const [legacyUrl] = args
log('first argument is a legacy URL:', legacyUrl)
if (legacyUrl.hostname === null) {
/**
* We are dealing with a relative url, so use the path as an "option" and
* merge in any existing options, giving priority to exising options -- i.e. a path in any
* existing options will take precedence over the one contained in the url. This is consistent
* with the behaviour in ClientRequest.
* @see https://github.com/nodejs/node/blob/d84f1312915fe45fe0febe888db692c74894c382/lib/_http_client.js#L122
*/
log('given legacy URL is relative (no hostname)')
return isObject(args[1])
? normalizeClientRequestArgs(
defaultProtocol,
{ path: legacyUrl.path, ...args[1] },
args[2]
)
: normalizeClientRequestArgs(
defaultProtocol,
{ path: legacyUrl.path },
args[1] as HttpRequestCallback
)
}
log('given legacy url is absolute')
// We are dealing with an absolute URL, so convert to WHATWG and try again.
const resolvedUrl = new URL(legacyUrl.href)
return args[1] === undefined
? normalizeClientRequestArgs(defaultProtocol, resolvedUrl)
: typeof args[1] === 'function'
? normalizeClientRequestArgs(defaultProtocol, resolvedUrl, args[1])
: normalizeClientRequestArgs(
defaultProtocol,
resolvedUrl,
args[1],
args[2]
)
}
// Handle a given "RequestOptions" object as-is
// and derive the URL instance from it.
else if (isObject(args[0])) {
options = args[0]
log('first argument is RequestOptions:', options)
// When handling a "RequestOptions" object without an explicit "protocol",
// infer the protocol from the request issuing module (http/https).
options.protocol = options.protocol || defaultProtocol
log('normalized request options:', options)
url = getUrlByRequestOptions(options)
log('created a URL from RequestOptions:', url.href)
callback = resolveCallback(args)
} else {
throw new Error(
`Failed to construct ClientRequest with these parameters: ${args}`
)
}
options.protocol = options.protocol || url.protocol
options.method = options.method || 'GET'
/**
* Infer a fallback agent from the URL protocol.
* The interception is done on the "ClientRequest" level ("NodeClientRequest")
* and it may miss the correct agent. Always align the agent
* with the URL protocol, if not provided.
*
* @note Respect the "agent: false" value.
*/
if (typeof options.agent === 'undefined') {
const agent =
options.protocol === 'https:'
? new HttpsAgent({
rejectUnauthorized: options.rejectUnauthorized,
})
: new HttpAgent()
options.agent = agent
log('resolved fallback agent:', agent)
}
log('successfully resolved url:', url.href)
log('successfully resolved options:', options)
return [url, options, callback]
}

Line 114 specifically is where this issue is taking place, because the hash property is present on the request options object it thinks it's a url instead.

else if ('hash' in args[0] && !('method' in args[0])) {

If I'm right that the relevant logic is in https://github.com/mswjs/interceptors/blob/main/src/interceptors/ClientRequest/utils/normalizeClientRequestArgs.ts ... then it does look to me like that logic is making a wrong assumption. It simply isn't the case that if http.request or https.request is called with a single argument, then that argument has to be a URL

If the hash property would not be defined on the request options it would continue to correctly interpret the request options and extract the URL from it, so it's not making the assumption that if there is only one argument this has to be a URL. See this part of the function:

// Handle a given "RequestOptions" object as-is
// and derive the URL instance from it.
else if (isObject(args[0])) {
options = args[0]
log('first argument is RequestOptions:', options)
// When handling a "RequestOptions" object without an explicit "protocol",
// infer the protocol from the request issuing module (http/https).
options.protocol = options.protocol || defaultProtocol
log('normalized request options:', options)
url = getUrlByRequestOptions(options)
log('created a URL from RequestOptions:', url.href)
callback = resolveCallback(args)
} else {
throw new Error(
`Failed to construct ClientRequest with these parameters: ${args}`
)
}

So to summarise the presence of the additional hash property on the request options causes it to be sniffed as a url instead. Not sure yet if this is something that should not be passed as an argument or if msw needs to change how it sniffs the argument types, hoping we can figure that out together

@eli-darkly
Copy link

@gerbyzation Aha, I missed that, thanks. I didn't look down far enough. Yes, that's definitely something we should fix on our end and it looks to me like MSW's logic is reasonable.

@gerbyzation
Copy link

@eli-darkly thanks for working on this! Gave 6.2.2 a spin today and it's looking good!

@dg-harris
Copy link
Author

@eli-darkly 6.2.2 worked for me as well, thanks for looking into this!

@kettanaito
Copy link
Member

Thank you for the cooperation, @eli-darkly!

I've looked into one other example from eventsource and I can conclude that this is a wrong way to compose RequestOptions. Basically, the intention here is to take some parts of the parsed URL and derive a partial RequestOptions out of it. But that's not what ends up happening. Due to the lack of typization in JavaScript, the entire URL object ends up in the options:

{
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'mswjs.io',
  port: null,
  hostname: 'mswjs.io',
  hash: null,
  search: null,
  query: null,
  pathname: '/resource',
  path: '/resource',
  href: 'https://mswjs.io/resource',
  headers: { 'Content-Type': 'text/plain' }
}

These kind of options fall under no allowed call signature of ClientRequest per Node.js documentation. The final options eventsource constructs is neither URL not RequestOptions:

  • It has extraneous keys like hash, query, href, and other to be a valid RequestOptions object; It also misses required properties like method.
  • It has additional keys like headers to be a valid URL instance.

I understand anybody can construct any object as the input to ClientRequest in practice since the implementation will only take the necessary properties by reference. That does not mean that doing so is okay. This library can't and won't support any possible ways people can scrutinize an already variable call signature of ClientRequest.

Instead, eventsource should do this:

const parsedUrl = parse(url)
const options = {
  protocol: parsedUrl.protocol,
  pathname: parsedUrl.pathname,
  // ...manually map the properties expected by "RequestOptions",

  // Also add headers.
  headers: { 'Cache-Control': 'no-cache', 'Accept': 'text/event-stream' }
}

req = (isSecure ? https : http).request(parsedUrl, options, function (res) {

That would be the correct way to call http.request.

@eli-darkly
Copy link

@kettanaito, thanks but I can't tell if there is any new information in your last comment - it looked to me like @gerbyzation already described the issue and answered my earlier question, clarifying why the presence of properties like hash is a problem, and I already made a fix in the fork that I maintain which they were referring to. I don't have any involvement in the original eventsource project.

@kettanaito
Copy link
Member

@eli-darkly, understood. My point above was that we cannot account for all possible ways people may construct ClientRequest on runtime. You may even provide invalid options and Node.js API may still consume them (to a questionable extent). The way eventsource composes RequestOptions is against the Node.js documentation—that's my reasoning for not covering this issue. That also doesn't appear to be a legitimate way to use http.request. The fact that it works doesn't necessarily mean it works correctly/as intended.

The fix on eventsource side is more than trivial. I urge anybody blocked by this to open a pull request to that library, linking this discussion. When they compose the request options correctly, MSW will have no issues normalizing the ClientRequest arguments.

@eli-darkly
Copy link

@kettanaito Yes, I'm saying that we already agreed on this point. No one was arguing any more that that was the right way to use http.request. And as far as I know, none of the people involved in this conversation were either using or maintaining eventsource; they were using the fork that I maintain, which has been fixed, although someone mistakenly linked to the upstream project at some point. I don't have bandwidth to submit a PR for that project (the fork has diverged so much that it's basically an independent project at this point) but I will file an issue there.

@kettanaito
Copy link
Member

@eli-darkly, my apologies, I've confused the projects. Thank you for explaining the latest state, I'm glad that people can use both projects without issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants