CORB: blocking of nosniff and 206 responses #686
Note: this pull request is related to issue #681 (which tracks the overall discussion about Cross-Origin Read Blocking (CORB)).
I think link rel=preload as=image will have different onerror vs. onload behavior depending on empty body vs. network error. So that'd be observable. Probably not in a breaking way though, just in a way that we should write tests for.
Thanks! I have a couple questions.
Before sending out this PR I talked with @nick-chromium and we think it should be fine to change the implementation to inject a network error when blocking images/audio/video/fonts (but keep filtering the response for other cases). But we thought (and I still do) that this is unnecessary because the difference won't be observable. Thanks @domenic for pointing out the link rel=preload as=image case - I'll try adding WPT tests for this. I do note that Blink raises the link.onerror event for both load and decode errors (see blink::Resource::ErrorOccurred and blink::LinkLoader::NotifyFinished), so there still seems to be no observable difference between net errors and decode (e.g. empty body) errors. FWIW, I am still trying to play with the link/preload/image case here: https://crrev.com/c/984338. This PR deals with images (and audio/video/fonts/
Anyway - if we need to introduce a concept of a CORB-filtered response in the long term anyhow (e.g. when specifying how to handle/block 206 responses for XHR/fetch), then maybe we should use this concept consistently, rather than have the spec and the implementation inject network errors in some cases and filter response headers+body in other cases.
QUESTION: What are your thoughts on introducing the concept of a CORB-filtered response into the Fetch spec? If we went down that route, would we need a separate nosniff-for-CORB section (separate from the existing section here and with a separate reference/step/item in the main fetch algorithm)? It seems kind of unfortunate that the list of headers on a CORS-filtered response is different from the one on a CORB-filtered response (the reason is benign - the quality of some CORS-related error messages suffered when CORB filtered out CORS-related response headers). I am not sure what the best course of action is here. Let me try to play with WPT tests; maybe this will help me get more familiar with potential network-error VS filtered-response differences.
So far we don't know of a way to observe the net-error-VS-empty-body difference for images, so maybe we should proceed with the current PR proposal. WDYT?
I am not sure if I understood the question, but let me try to reply below.
Blocking just based on the
Blocking based on the
Also note that I understand that the changes in this PR don't cover all aspects of CORB (e.g. handling of 206 and/or blocking based on sniffing for a JSON security prefix / parser breaker). I thought I could tackle spec changes in small steps and leave 206+sniffing for later (nosniff is by far the biggest class of scenarios with observable CORB impact in the wild). I am also hesitant to dive into specifying CORB sniffing - I was hoping to avoid having a formal spec for this (sniffing is not needed to define CORB behavior for 206 and nosniff; sniffing can be seen as an implementation detail - if an implementation convinces itself that a JSON security prefix never appears in images or media, then it can block without any observable impact).
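As a rough illustration of the "sniffing as an implementation detail" point above, a JSON security prefix check might look like the sketch below. It is a hypothetical Python helper, not Chromium's implementation and not anything defined by the Fetch spec; the prefix list is an illustrative assumption, not an exhaustive or authoritative one.

```python
# Hypothetical sketch: detect a JSON security prefix ("parser breaker") at the
# start of a response body. The prefixes below are illustrative examples only.
JSON_SECURITY_PREFIXES = (b")]}'", b"{}&&", b"while(1);", b"for(;;);")


def starts_with_json_security_prefix(body: bytes) -> bool:
    """Return True if the body (after leading whitespace) starts with a known prefix."""
    stripped = body.lstrip(b" \t\r\n")
    return any(stripped.startswith(prefix) for prefix in JSON_SECURITY_PREFIXES)


if __name__ == "__main__":
    print(starts_with_json_security_prefix(b")]}'\n{\"secret\": 1}"))  # True
    print(starts_with_json_security_prefix(b"\x89PNG\r\n\x1a\n"))      # False
```

If such prefixes never occur at the start of real images or media, blocking on a match would not be observable for those destinations, which is the argument made above.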
@annevk, can you please clarify which specs require track fetches to be strict wrt the Content-Type response header? I tried looking at https://html.spec.whatwg.org/multipage/media.html#sourcing-out-of-band-text-tracks and I notice that it says:
Is this the part of the spec that you meant? (asking mainly so that I can refer to the correct spec in CORB explainer) FWIW, I tried looking at wpt/html/semantics/embedded-content/media-elements/track/track-element/, but I couldn't identify tests that check the Content-Type here. Therefore I am not sure how to easily verify Chromium's behavior :-/. Also - not sure if it is worth opening a bug (somewhere?) to add WPT test coverage here.
Thanks for reminding me about this issue. I think we should make sure that Chromium's CORB implementation and the spec definitions of HTML / XML / JSON MIME types agree. I probably should put together a Chromium CL that 1) changes the Chromium implementation and 2) adds WPT tests for CORB-ing or not-CORB-ing specific MIME types. The Chromium CL would have to be reviewed by other security folks (e.g. xtofian@) who were advising us on what MIME types to cover (I think they would be okay with the changes, but I'd have to double-check about
QUESTION: What are your thoughts on WPT coverage of MIME type definitions? Do you think I can/should cover more MIME types with tests similar to wpt/fetch/corb/img-png-mislabeled-as-html-nosniff.tentative.sub.html? I could switch from using hardcoded response headers for wpt/fetch/corb/resources/png-mislabeled-as-html-nosniff.png to having something parametrized via URL / php. OTOH, this particular test seems a bit icky to me - CORB's effects are only indirectly visible and force test verification to be done via reference page/image comparison...
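One way to replace the hardcoded response headers with URL parameters is to derive them from the query string. The sketch below is hypothetical: the `type` and `nosniff` parameter names and the `image/png` default are assumptions, and a real WPT handler (e.g. something along the lines of nosniff/image.py) would wire this into its own request/response objects.

```python
from typing import List, Tuple
from urllib.parse import parse_qs


def headers_from_query(query: str) -> List[Tuple[str, str]]:
    """Build response headers for a parametrized test resource (hypothetical).

    Example query: "type=text/html&nosniff=1"
    """
    params = parse_qs(query)
    # Content-Type comes from ?type=..., defaulting to the real type of the image.
    headers = [("Content-Type", params.get("type", ["image/png"])[0])]
    # Only send X-Content-Type-Options when ?nosniff=1 is present.
    if params.get("nosniff", ["0"])[0] == "1":
        headers.append(("X-Content-Type-Options", "nosniff"))
    return headers
```

For example, `headers_from_query("type=text/html&nosniff=1")` describes a PNG mislabeled as HTML with nosniff. Note that `parse_qs` decodes `+` as a space, so a type like `image/svg+xml` has to be sent as `image/svg%2Bxml`.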
If we need CORB-filtered responses and other implementers are on board with them we should add them. It seems to me those would not expose any headers, but perhaps I'm missing something? Once we add them we'd need to do nosniff differently, indeed, but I think the more we can handle as a network error the better, so what we say is a network error in this PR ideally remains so over time. (And doing this all incrementally seems like a very good approach to me, especially as a change like this can be isolated from a much more invasive change such as CORB-filtered responses or sniffing.)
I don't agree with that line of thinking. Applications might come to rely on this happening and then other browsers have a vulnerability. I didn't realize that the
and not the big box below. I guess we'd have to look at implementations. Filing a bug against WPT or HTML seems like the way to go. 😟 As for test coverage, the more the better. If you cannot obtain something through script, http://web-platform-tests.org/writing-tests/reftests.html would be the way to go. That should still be relatively fast.
Ack. I'll put that on the back burner for now and focus on the current, incremental change that focuses on nosniff-related behavior. Regarding the question - CORB-filtered responses do include some headers - see the "How does CORB “block” a response" section in the explainer.
Yes - I very much agree with this.
I am trying to 1) restrict which MIME types are CORB-protected and 2) add WPT tests for this in https://crrev.com/c/985211. Please chime in if you have any feedback (e.g. I am slightly tweaking and reusing nosniff/image.py). One thing I've realized after working on this CL is that my initial PR was too broad - CORB is limited to cross-origin, non-CORS-allowed responses (unlike other nosniff directives in the section I am changing). I've pushed another revision of the PR which more closely reflects reality, but is unfortunately slightly more complex and inconsistent with the other nosniff directives. This seems unfortunate, but may still be the right way to proceed. WDYT?
I still think that the difference between net-error-VS-empty-body is not observable for images, media, etc. Nevertheless, we've talked about this earlier today and we'll try to see what breaks if we try to change the CORB implementation to inject a net-error (either for a subset or for all of responses). For now, I've opened https://crbug.com/827633 to track this work.
If an implementation would not put it there, but have all CORB-related checks together, I'd be more inclined to match that, probably. Even if we create that algorithm incrementally. That'd also more directly tie it to "no-cors".
Reading https://html.spec.whatwg.org/multipage/semantics.html#obtaining-a-resource-from-a-link-element which @domenic pointed out strongly suggests that it distinguishes between 200/network error for
I think it's better if all "opaque" responses are CORB-checked, and CORB shouldn't depend on any information in the request. If request A can bring content from response B into evil.com's content process, then there's no point blocking any other kind of request from doing the same. As a bonus, if we're blocking based on response only, then we don't end up with problems around opaque response objects or the cache API. The sensitive information will be removed before it's written to the cache, or before the response object is created.
I've been busy with other stuff, but I just wanted to drop a note saying that I agree with the feedback to 1) have all CORB-related checks together and 2) base the decision only on the response (e.g. on CORS-related headers present in the response, rather than on the cors-mode of the request). I hope to get back to this work next week - my plan is to update the PR so that CORB stuff 1) is put into a new, separate section and 2) results in filtering the response body and headers (rather than in a net error).
I've made some assumptions about the direction of CORB in #560 (comment). Let me know if this makes sense.
Thanks @anforowicz. I'm catching up on this myself, and it seems like the main remaining area of discussion is around net errors vs empty responses. In terms of whether it's observable for link preload, the difference seems to be between the spec language that @annevk pointed to and the Blink implementation that @anforowicz pointed to. The spec seems to say link rel=preload as=image should fire a load event and not an error event if an empty (i.e., non-decodable) image response is received. Blink seems to fire an error event in that case, based on local testing. Is this a bug in Blink, and do other browsers behave differently? I think Blink's behavior here may have been part of the reason @nick-chromium closed https://crbug.com/827633 as WontFix. That left our CORB implementation as an empty response since the effort to change to net error didn't seem worth it if it wasn't observable. Nick did have a CL to change to net errors, but it posed a risk that preloaded responses would not be fully loaded into the cache (using Chrome's DetachableResourceHandler) as they should be. It also seemed to break our console messages for blocked responses, but that might be fixable. Just to clarify, if we switched to net errors, we would still expect preload to load the entire response in the browser's network cache, correct? I assume that would matter in cases where one page preloads URLs that the next (possibly cross-site) page will need to use-- those URLs should be cached for the next page even if they end up being opaque to the current page.
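To make the question above concrete, here is a toy model of the two readings being compared: the spec reading (only a network error fires error on link rel=preload as=image) and Blink's reported behavior (an undecodable body, such as CORB's empty body, also fires error). These functions are hypothetical and only encode the behavior as described in this thread; they are not taken from the HTML spec or from Blink.

```python
from typing import Callable, Optional

# Toy model: body=None stands for a network error and b"" for the empty body
# CORB injects. Only an empty body is treated as undecodable here.


def preload_event_spec_reading(body: Optional[bytes]) -> str:
    # Spec reading discussed above: only a network error fires "error".
    return "error" if body is None else "load"


def preload_event_blink_reported(body: Optional[bytes]) -> str:
    # Blink as reported above: network and decode errors both fire "error".
    return "error" if body is None or len(body) == 0 else "load"


def corb_is_observable(event_for: Callable[[Optional[bytes]], str]) -> bool:
    # Observable if CORB's empty body fires a different event than a network error.
    return event_for(b"") != event_for(None)
```

Under the spec reading `corb_is_observable` is true, and under the reported Blink behavior it is false, which is exactly the discrepancy being asked about.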
Yes, we started with empty responses in our initial prototypes (years ago) because they seemed less likely to generate observable behavior and because net errors caused more layout tests to fail, but those may have been mistaken assumptions and implementation issues, based on the discussions here. Let's (1) determine if Blink needs to change its link preload behavior, and (2) decide if there are reasons to switch to net errors.
Either way, I'll make #560 (comment) match whatever's decided.
Okay - I made some progress here and pushed out a new iteration of the PR. It is still a little bit rough around the edges, but I hope that we all agree that the changes are going in the right direction. Notable open issues:
Since this only needs to apply to "no-cors", it seems changing the bit of step 5 of https://fetch.spec.whatwg.org/#concept-main-fetch that deals with "no-cors" would be more straightforward than duplicating same-origin and CORS checks. Although maybe you care about the ordering relative to CSP? In that event it seems that using request's response tainting would be easier? What is the observable difference between an empty response and a response with some headers filtered and its body omitted? Is that mostly about Content-Type and X-Content-Type-Options or some such? And without those it would trigger an error event here and there? It would be nice if it could be more limited than allowing quite a lot of headers.
(I think the other thing we need is some description about what this means for implementation architectures to properly defend themselves. Maybe that can be done separately somehow though.)
I think we should avoid referencing the request here, else it suggests a different kind of request could bring the data into the same process. I believe checking the response is opaque is enough (@annevk can confirm).
We detect redirects using status code, so that should be good enough unless there are problems in the wild.
We would react with a network error, except a network error leaks too much about the other origin. Therefore I think a filtered response is risky vs an empty response.
@jakearchibald the only reliance on request is to filter out navigation and downloads, which seems accurate. (Though really need to research "embed" and "object" at some point as there are some issues lurking there.) Also, I wasn't asking about a network error. I was asking about what's in the PR versus a completely empty response.
Ah, then I agree. I went for a new response in https://github.com/whatwg/fetch/pull/560/files#discussion_r180417935 and I think the same should be done here.
Good idea. I've removed CORS and same-origin checks from the CORB section. I think only the last clause (the "otherwise" clause) in step 5 needs to be adjusted for CORB. I think there is no need to change the "no-cors" clause, because an "opaque filtered response" already has a null body, so CORB cannot really add much here, right?
Filtering out CORS headers leads to bad error messages. Without CORB:
CONSOLE ERROR: line 1: Failed to load http://127.0.0.1:8000/xmlhttprequest/resources/access-control-allow-lists.php?origin=http%3A%2F%2Fwww2.localhost%3A8000: The 'Access-Control-Allow-Origin' header has a value 'http://www2.localhost:8000' that is not equal to the supplied origin. Origin 'http://localhost:8000' is therefore not allowed access.
With CORB:
CONSOLE ERROR: line 1: Failed to load http://127.0.0.1:8000/xmlhttprequest/resources/access-control-allow-lists.php?origin=http%3A%2F%2Fwww2.localhost%3A8000: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8000' is therefore not allowed access.
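The change in the console message follows from the diagnostic having nothing to compare once the header has been stripped. A toy model, assuming a simplified CORS check (no credentials handling, exact match or `*` only) and hypothetical helper names; this is not Chromium's code:

```python
from typing import Dict

# Hypothetical, simplified model of how a CORS diagnostic is chosen.
# Header names are assumed to be lowercased dict keys.


def cors_diagnostic(headers: Dict[str, str], origin: str) -> str:
    acao = headers.get("access-control-allow-origin")
    if acao is None:
        return "No 'Access-Control-Allow-Origin' header is present on the requested resource."
    if acao != "*" and acao != origin:
        return (f"The 'Access-Control-Allow-Origin' header has a value '{acao}' "
                "that is not equal to the supplied origin.")
    return "ok"


def strip_cors_headers(headers: Dict[str, str]) -> Dict[str, str]:
    # Models CORB dropping CORS-related response headers before the CORS check runs.
    return {name: value for name, value in headers.items()
            if not name.startswith("access-control-")}
```

With the mismatching header present the first message is produced; after `strip_cors_headers` the check only sees a missing header, which yields the less helpful second message.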
I think that choosing to implement CORB in a renderer process (VS in a browser/supervisor
Okay - I've switched to using the 206 status code.
Those error messages though only reach the console, they are not exposed to web content, so I don't think we need to care for them here. As for "opaque-filtered response", part of the point there is that some features, e.g.,
@anforowicz can you explain why the "Otherwise" clause applies? That's CORS. I thought CORS was excluded? Unless I'm mistaken this should only apply to cross-origin "no-cors" loads, which is covered by "opaque" response tainting. And yeah, I think when CORB applies you basically want to return a fresh empty response that doesn't even have an internal response to hold onto or inspect.
* MIME types are without quotes
* No need to talk about current url's origin anymore.
I pushed some further modifications. I'm not 100% sure this makes sense as an "HTTP extensions" subsection, but since it uses the X-Content-Type-Options header I guess it's okay. As for tests, I'll have a look. Also, would you like to add your name to the Acknowledgments section?
As per https://github.com/whatwg/meta/blob/master/MAINTAINERS.md#handling-pull-requests browser bugs will also need to be filed.
Done (I think):
Tentative commit message:
Ideally @jakearchibald would do a final review. If he can't due to I/O, maybe @yutakahirano can help.
Just minor stuff. This is looking good.
fetch.bs
<p class="note">Cross-origin read blocking, better known as CORB, is an algorithm by which dubious
cross-origin resource fetches are identified and blocked before they reach a web page. CORB reduces
the risk of leaking sensitive data by keeping it further from cross-origin web pages. |
blocked before
remove double space.
I'm not keen on "dubious" here, but I don't have a better idea.
Is it fair to say we're "blocking fetches that would fail anyway, but blocking them earlier to reduce the risk of leaking sensitive data…"
blocked before
remove double space.
Done.
Is it fair to say we're "blocking fetches that would fail anyway, but blocking them earlier to reduce the risk of leaking sensitive data…"
I've tried to incorporate the suggestion into the current wording.
fetch.bs
{{XMLHttpRequest}}), not observable (e.g., in case of pings or CSP reports which ignore the
response), or would result in an error (e.g., when failing to decode an HTML document embedded in an | ||
<code>img</code> tag as an image). This means that CORB can block <a>CORB-protected MIME types</a> | ||
resources without being disruptive to web pages. |
It wasn't clear to me that this note was talking about what happens aside from CORB.
Maybe start "Even without CORB…"?
Accessing cross-origin resources
Maybe "Accessing the content of cross-origin resources"? Since we allow cross-origin resources for imgs, scripts, and CSS.
fetch() can fetch no-cors, so it might not be a good example here.
It isn't clear that this is presenting a list of things, so maybe: "…is managed either by the CORS protocol…".
It wasn't clear to me that this note was talking about what happens aside from CORB.
Maybe start "Even without CORB…"?
Done
Accessing cross-origin resources
Maybe "Accessing the content of cross-origin resources"? Since we allow cross-origin resources for imgs, scripts, and CSS.
Done.
fetch() can fetch no-cors so it might not be a good example here.
Good point - I've removed the fetch() example and only left XHR (which AFAIK doesn't have an equivalent of no-cors mode).
It isn't clear that this is presenting a list of things, so maybe: "…is managed either by the CORS protocol…".
Done.
Correct, XHR can't fetch no-cors.
fetch.bs
<ol>
<li><p>If <var>request</var>'s <a for=request>initiator</a> is "<code>download</code>", then return | ||
<b>allowed</b>. | ||
<!-- XXX If we recast downloading as navigation this step can be removed. --> |
@annevk what's your feeling on making this page-visible? I've been frustrated in the past digging into a spec detail only to find there was useful information in an html comment.
That seems fine.
fetch.bs
<a for=response>header list</a>.
<li><p>If <var>nosniff</var> is not failure and <var>mimeType</var> (ignoring parameters) is a | ||
<a>CORB-protected MIME type</a> or <code>text/plain</code>, then return <b>blocked</b>. |
We should have a note or something explaining why we block text/plain here but not for 206.
Done. @csreis might have an opinion on what is the right thing to do here - on one hand we want to protect as many sensitive resources as possible; OTOH, dropping text/plain protections would avoid extra special cases in the spec and in the code.
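A rough Python sketch of the CORB check as discussed in this thread. Assumptions: header names are lowercased dict keys; MIME parsing and nosniff detection are simplified; "CORB-protected" is taken to mean an HTML, JSON, or XML MIME type other than image/svg+xml; and the 206 branch (protected types only, no text/plain) is inferred from the review comment above rather than quoted from the PR. All names are hypothetical.

```python
from typing import Dict, Optional

ALLOWED = "allowed"
BLOCKED = "blocked"


def mime_essence(content_type: Optional[str]) -> Optional[str]:
    """Lowercased type/subtype without parameters (simplified parsing)."""
    if not content_type:
        return None
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence if "/" in essence else None


def is_corb_protected(essence: str) -> bool:
    """HTML, JSON, or XML MIME type, excluding image/svg+xml (assumed definition)."""
    if essence == "image/svg+xml":
        return False
    subtype = essence.split("/", 1)[1]
    return (essence in ("text/html", "application/json", "text/json",
                        "application/xml", "text/xml")
            or subtype.endswith("+json") or subtype.endswith("+xml"))


def has_nosniff(headers: Dict[str, str]) -> bool:
    """Simplified: first comma-separated value of X-Content-Type-Options is nosniff."""
    value = headers.get("x-content-type-options", "")
    return value.split(",", 1)[0].strip().lower() == "nosniff"


def corb_check(initiator: str, status: int, headers: Dict[str, str]) -> str:
    # Downloads are not subject to CORB.
    if initiator == "download":
        return ALLOWED
    essence = mime_essence(headers.get("content-type"))
    if essence is None:
        return ALLOWED
    # 206 responses: block protected types, but not text/plain (inferred from review).
    if status == 206 and is_corb_protected(essence):
        return BLOCKED
    # nosniff responses: block protected types and text/plain.
    if has_nosniff(headers) and (is_corb_protected(essence) or essence == "text/plain"):
        return BLOCKED
    return ALLOWED
```

For example, `corb_check("", 200, {"content-type": "text/html", "x-content-type-options": "nosniff"})` returns "blocked", while the same response with `initiator="download"` returns "allowed".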
fetch.bs
<li>
<p>If <var>noCorsResponse</var> is not a <a>filtered response</a> and the <a>CORB check</a> | ||
with <var>request</var> and <var>noCorsResponse</var> returns <b>blocked</b>, then: |
I'd flip this around, and immediately return noCorsResponse if it isn't filtered or passes the CORB check.
Maybe.
@annevk - can you please help me understand why you've added "If noCorsResponse is a filtered response" in c4a5a28? Some filtered responses do not filter out the response body (e.g. a basic filtered response), so it seems to me that they should still be subject to CORB. What am I missing? :-)
FWIW, I've removed the filtered-response wording and followed @jakearchibald's suggestion.
Please note that in the current Chromium implementation CORB is applied across the board (possibly on top, or rather before any filtered-response processing, which AFAIK is happening inside Chromium's renderer processes). Please shout if you think that removing the filtered-response wording from the PR was a mistake.
You're missing service workers. This can get a response that isn't filtered at all at which point poking at it would result in a null pointer exception or some such (if specifications had those).
Okay - I think you're saying that populating corbSanitizedResponse might fail if noCorsResponse is a filtered response, because in this case noCorsResponse might not contain status / HTTPS state / CSP list. Is that the concern?
I've tried to fix this by adding an extra step that I think should address your concern:
If <var>noCorsResponse</var> is a <a>filtered response</a>, then return
<var>noCorsResponse</var>.
fetch.bs
<li><p>Let <var>oldNoCorsResponse</var> be <var>noCorsResponse</var>.
<li>
<p>Set <var>noCorsResponse</var> to a new <a for=/>response</a> whose |
If you do the immediate return as suggested above, you don't need to do the variable juggling here.
Instead, you can call this something more meaningful, like sanitizedResponse.
Okay.
@jakearchibald - can you please take another look? @annevk - when replying to @jakearchibald's feedback I've removed one of your changes (skipping CORB for filtered responses) - please take a look at #686 (comment) and double-check that this is okay.
See #686 (comment)
Looks great aside from the issue @annevk pointed out in #686 (comment)
What I meant to say is that if fetching returns a filtered response, the response will have come from a service worker (and will therefore already have gone through a CORB check). We could perform one again but that seems pointless? And if we did one again presumably we should do it on the internal response, not the wrapper, but I really don't see why. If the response comes from the network or HTTP cache it won't be a filtered response at this point. (I guess it's okay that these responses can still enter the HTTP cache.)
You're right - ignoring service worker responses is actually required for some scenarios and I agree that ignoring responses from service workers in all scenarios is still okay (e.g. CORB only cares about blocking the responses from the network).
Thanks for highlighting/confirming that. This makes me much more comfortable that skipping filtered responses is okay.
It's not only okay, but it is required to retain some of the performance benefits of link prefetching (although I am not sure how often in practice one encounters a scenario where 1) an HTML/JSON/XML resource from bar.com is first prefetched in a foo.com frame and 2) later a bar.com frame accesses the same resource in a performance-sensitive path). Note that so far we've failed to create a WPT test that verifies this scenario.
@jakearchibald - is there any other action or change required here (i.e. after 5884f85 above)?
I have some further editorial nits that I'll address tomorrow most likely.
fetch.bs
<b>allowed</b>, then return <var>noCorsResponse</var>.
<li><p>If <var>noCorsResponse</var> is a <a>filtered response</a>, then return | ||
<var>noCorsResponse</var>. |
This needs to move one step up I think (and could therefore be combined again as I did it).
@anforowicz agree with Anne in terms of combining those lines, otherwise LGTM.
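Putting the review feedback together (combine the filtered-response and CORB-check steps, return early, and build a fresh sanitizedResponse with no internal response), the tail of the "no-cors" branch might look roughly like the sketch below. The `Response` model and its field names are hypothetical, the CORB check is passed in as a callable, and copying only status, HTTPS state, and CSP list is an assumption based on the earlier comment about those fields.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Response:
    """Hypothetical, simplified model of a Fetch response."""
    status: int = 200
    header_list: List[Tuple[str, str]] = field(default_factory=list)
    body: Optional[bytes] = None
    https_state: str = "none"
    csp_list: List[str] = field(default_factory=list)
    # Only filtered responses (e.g. ones handed back by a service worker) have one.
    internal_response: Optional["Response"] = None

    def is_filtered(self) -> bool:
        return self.internal_response is not None


def apply_corb(no_cors_response: Response,
               corb_allows: Callable[[Response], bool]) -> Response:
    # Filtered responses came from a service worker, which already did a CORB check.
    if no_cors_response.is_filtered() or corb_allows(no_cors_response):
        return no_cors_response
    # Blocked: a fresh response with no headers, no body, and no internal response.
    return Response(status=no_cors_response.status,
                    https_state=no_cors_response.https_state,
                    csp_list=list(no_cors_response.csp_list))
```

Because the sanitized response is built from scratch rather than wrapping the original, nothing downstream can reach the blocked headers or body through an internal response.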
…ple things and editorialize
I pushed a new commit that addresses my own comment, older feedback from @jakearchibald about making the download issue visible, rewords a couple things, and fixes source formatting issues.
Thanks @anforowicz and @jakearchibald! Looking forward to working out the various enhancements.
Thanks for the reviews and your patience!
Motivation for the changes
Allowing blocking of HTML/JSON/XML nosniff responses going to image (or audio/video/font/track) destinations covers most of the scenarios where CORB is actually observable in the wild today.
Explaining why blocking via net error is okay
Note that 1) the CORB explainer asks to block a protected response by injecting an empty body (see the "How does CORB “block” a response?" section in the explainer) and 2) the Fetch spec asks to block nosniff responses by injecting a network error (see the "Main fetch" section, item 10).
This difference might matter for some destinations (e.g. XHR and fetch()), but shouldn't matter for the destinations covered here. For example, HTMLImageElement's onerror event will fire both in case of a network error and in case of a decode error (e.g. one triggered by the empty body injected by CORB). One potential difference is the timing of when the network error VS decode error is observed. In practice, this timing difference shouldn't matter though - CORB can be seen as a very slow network proxy that postpones the network error until the image decoding actually happens.
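The argument that the two blocking strategies look the same to an `<img>` can be sketched with a toy loader. The decoder below is a hypothetical stand-in that only recognizes the PNG signature; the point is just that a network error and an undecodable (e.g. empty) body both end in an error event.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def toy_decode(body: bytes) -> bool:
    # Stand-in for a real image decoder: accepts only bytes starting with the PNG signature.
    return body.startswith(PNG_SIGNATURE)


def img_event(body: Optional[bytes]) -> str:
    """Event an <img> would fire. None models a network error; b"" models CORB's empty body."""
    if body is None:
        return "error"  # network error
    if not toy_decode(body):
        return "error"  # decode error, e.g. CORB's empty body
    return "load"
```

Only the path to "error" differs (fetch failure vs. decode failure), not the event the page observes.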