Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot sandbox at start of navigation. #5098

Merged
merged 3 commits into from
Nov 25, 2019

Conversation

clelland
Copy link
Contributor

@clelland clelland commented Nov 18, 2019

This change fixes a race condition where an iframe's sandboxing flag set could
be changed in between the start of a navigation and when the response is
returned, and the new document created. In that case, it was unclear how
the new document could reliably synchronously get the updated flags, or
just exactly how late those flags could be changed and still impact the
new document. Now, the sandboxing flag set is routed from the beginning of the
navigation to the eventual document creation.

Ref: #4783, and also see
w3c/webappsec-permissions-policy#256

(See WHATWG Working Mode: Changes for more details.)


/browsers.html ( diff )
/browsing-the-web.html ( diff )
/origin.html ( diff )

This change fixes a race condition where an iframe's sandboxing flag set could
be changed in between the start of a navigation and when the response is
returned, and the new document created. In that case, it was unclear how
the new document could reliably synchronously get the updated flags, or
just exactly how late those flags could be changed and still impact the
new document. Now, the sandboxing flag set is routed from the beginning of the
navigation to the eventual document creation.

Ref: whatwg#4783, and also see
w3c/webappsec-permissions-policy#256
@clelland
Copy link
Contributor Author

@annevk, @bzbarsky, @domenic FYI -- does this make sense? It's come up a few times, and we're attempting to fix this race in Chrome this way. If this works, I'd want to extend this to other kinds of policies as well.

@bzbarsky
Copy link
Contributor

This makes sense to me in general; I don't see a good way to sanely define this otherwise.

I do have one compat concern here: browsers are currently not entirely aligned with each other, or with the spec, in terms of exactly when "start of navigation" is. That means that they can end up picking up different sandboxing flags if people do things that change the sandbox flags right around when navigation might be starting.

@natechapin has been doing some awesome work aimed at reconciling those implementation/spec differences in navigation start behavior, but I'm not sure what the status of that is.... It might be worth getting implementation and spec bugs filed on any remaining issues there and maybe addressing them in parallel with this.

I don't think we need to block this change on that work, just keep it in mind.

@clelland
Copy link
Contributor Author

@natechapin, is that work tracked anywhere?

Also related -- https://wpt.fyi/results/html/browsers/sandboxing/sandbox-new-execution-context.html is broken by this change, but it looks like it was relying on pretty subtle timing behaviour, and doesn't work anywhere but in Blink. (In FF, it breaks without devtools: the iframe's contentDocument is null after appendChild and before the script completes. Safari is doing something I don't understand yet.)

@natechapin
Copy link

@natechapin has been doing some awesome work aimed at reconciling those implementation/spec differences in navigation start behavior, but I'm not sure what the status of that is.... It might be worth getting implementation and spec bugs filed on any remaining issues there and maybe addressing them in parallel with this.

I don't think we need to block this change on that work, just keep it in mind.

Chrome matches Firefox and the spec in most cases, but there are still some open questions around form submission and following hyperlinks.

  • Form submission has an queue a task step in the plan to navigate algorithm. I removed that, and it's caused some compat issues around double submission of forms. We're looking at reintroducing that async step to plan to navigate.

  • Chrome has never queued a task for following hyperlinks. As far as I can tell, there's no compat issue with this except that there may be some ordering assumptions between link clicks and form submissions that might make it difficult to queue a task in the form plan to navigate algorithm while also starting hyperlink navigation synchronously. @josepharhar might be able to better described what he's run in to there.

@josepharhar
Copy link
Contributor

I'm working on changing form submission in chrome to be async again by implementing plan-to-navigate, but so far I seem to be breaking a lot of tests. I'm not sure when I'll figure it out.

As for what I've run into, in the case of an tag with an onclick handler that submits a form manually, the click navigation is supposed to take precedence over the form submission, which is challenging when the click starts a navigation synchronously and then gets later superseded by the async form submission navigation.

@bzbarsky
Copy link
Contributor

@clelland It looks like Firefox already makes the "should this get an opaque origin?" decision based on the sandbox flags in effect when the navigation started, not the ones in effect at document creation time. That's why Firefox is failing that new test at https://wpt.fyi/results/html/browsers/sandboxing/sandbox-new-execution-context.html. Other sandbox flags, inconsistently, are in fact picked up at document creation time.

On the bright side, the fact that Firefox already has this behavior for this flag and it hasn't caused obvious compat issues is a really good sign!

@natechapin It's worth filing a separate issue on the form submission bits so we can sort out why the double-submission problems are not a problem for Firefox, with its sync form submission...

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, and nice job threading everything through. I think there is a potential cleanup, which is that "determining active sandboxing flags" no longer needs its second argument to exist, right?

@clelland
Copy link
Contributor Author

@domenic, good catch -- yes, since the only caller that used that parameter has been rewritten, I can take that out now.

@clelland
Copy link
Contributor Author

Does "determine active sandbox flags" still make sense as a name for that algorithm, or should we look for a better name, now that it's mostly used to get the flags for a browsing context before it is actually populated with a document?

@domenic
Copy link
Member

domenic commented Nov 20, 2019

Yeah, I'm not really sure what "active" was supposed to connote, but it probably is less applicable now. So maybe just "determine sandboxing flags"?

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. It sounds like at least Blink and Gecko are on-board, so the only thing we're waiting for is tests.

I assume this will be tricky to test since it's about a race condition, but it sounds like you do have a plan in that regard?

@domenic domenic added needs tests Moving the issue forward requires someone to write tests security/privacy There are security or privacy implications topic: frames/navigables/browsing contexts topic: navigation labels Nov 21, 2019
@clelland
Copy link
Contributor Author

In Chromium we can test this directly; on WPT it will be a bit more difficult. There are a couple of things that we can do by manipulating network timing: we can test that a sandbox attribute change after navigation starts doesn't do anything, and we can test that if a second navigation is triggered before the first one returns, that any attribute changes in the interim are effective. If we do that with cross-origin network requests, same-origin network requests, and data/srcdoc documents, we should hopefully trigger any in-process vs out-of-process code paths.

After that, if a race is still present, then it's likely to show up as a flaky test. Again, Chrome has infrastructure in place to detect those flakes, but I don't know what other implementers do. I don't think wpt.fyi would be sufficient in that case.

@bzbarsky
Copy link
Contributor

we can test that a sandbox attribute change after navigation starts doesn't do anything

Yes, that should be pretty simple to test deterministically.

we can test that if a second navigation is triggered before the first one returns, that any attribute changes in the interim are effective

Why is "before the first one returns" relevant? That is, it seems like we can do two tests:

  1. Start navigation, then set sandbox attr. Wait for navigation to complete, check that sandbox attr was not picked up.

  2. Start navigation, set sandbox attr, start second navigation. Wait for navigation to complete (should only be the second one, because the second start cancels the first navigation per spec), check that sandbox attr was picked up.

that's all assuming the navigation is started from the same script as sets the sandbox attr. Obviously if we're in a multiple-process case with different origins and the parent sets the sandbox attr "while" the child is navigating itself, that's all racy. It can be somewhat serialized via postMessage, but the best we can do there is "set attr, then postMessage to start the navigation", I think. If we postMessage first, that could run before we set the attr, right?

As far as the "Apparently unnecessary, but will commit more WPT to ensure" bit up-thread, that really depends on what sandbox flags get tested. As I said above, Firefox effectively grabs the "allow-same-origin" flag when navigation starts an the rest when the document is created right now, at least if I read the code right.

@clelland
Copy link
Contributor Author

Why is "before the first one returns" relevant? That is, it seems like we can do two tests:

  1. Start navigation, then set sandbox attr. Wait for navigation to complete, check that sandbox attr was not picked up.
  2. Start navigation, set sandbox attr, start second navigation. Wait for navigation to complete (should only be the second one, because the second start cancels the first navigation per spec), check that sandbox attr was picked up.

That's basically what I was thinking: to check the case where the first navigation was cancelled, to ensure that the initial flags are not re-used. (I want to test the cancellation case specifically, not just a sequence of two navigations.)

There may be another edge case where the first response triggers a redirect to another resource, where we need to ensure that the initial flags are used, even if they were changed in the meantime.

that's all assuming the navigation is started from the same script as sets the sandbox attr. Obviously if we're in a multiple-process case with different origins and the parent sets the sandbox attr "while" the child is navigating itself, that's all racy. It can be somewhat serialized via postMessage, but the best we can do there is "set attr, then postMessage to start the navigation", I think. If we postMessage first, that could run before we set the attr, right?

That might be fundamentally racy :( I suppose in theory that the postMessage could be delivered to the child frame, and handled by the onmessage handler there, before the next line of code runs in the parent frame. (It looks like step 8 of the post message steps runs synchronously in the sender, and the queued task could be picked up immediately in the receiver process)

(I imagine we could resolve that race by forcing postmessage to wait until the current script finishes before actually sending the message -- make step 8 say "queue a task to queue a task on the posted message task source... -- but that would probably have other unintended effects.)

As far as the "Apparently unnecessary, but will commit more WPT to ensure" bit up-thread, that really depends on what sandbox flags get tested. As I said above, Firefox effectively grabs the "allow-same-origin" flag when navigation starts an the rest when the document is created right now, at least if I read the code right.

Do you think there are other flags which are handled differently? Should there be a bug filed on bugzilla to ensure that?

@bzbarsky
Copy link
Contributor

Do you think there are other flags which are handled differently?

I think the "allow-same-origin" flag is the only "weird" one at the moment in Firefox.

@clelland
Copy link
Contributor Author

clelland commented Nov 22, 2019

Added tests in a WPT PR; updated top comment with the link.

@domenic domenic removed the needs tests Moving the issue forward requires someone to write tests label Nov 25, 2019
@domenic
Copy link
Member

domenic commented Nov 25, 2019

I'll go ahead and merge this, as everyone seems to be on board and there are tests in web-platform-tests/wpt#20411. I'll trust @clelland to get those reviewed and, if the results show failures on Gecko and WebKit, file bugs and link to them here.

@domenic domenic merged commit eb13fec into whatwg:master Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants