Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback base URL computation for a document with about:srcdoc but not an iframe srcdoc document #3989

Closed
tkent-google opened this issue Sep 4, 2018 · 21 comments · Fixed by #9464

Comments

@tkent-google
Copy link
Contributor

tkent-google commented Sep 4, 2018

Demo: http://jsfiddle.net/int32_t/oztck57e/5/
Bug report from a web author: https://crbug.com/875642

Specification link 1: https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#document-open-steps Step 11.3
Specification link 2: https://html.spec.whatwg.org/multipage/urls-and-fetching.html#fallback-base-url

The demo page has three nested documents,

  • Top
    • iframe srcdoc doc1
      • iframe about:blank doc2

According to the specification, doc2's URL will be about:srcdoc by doc2.open(). See specification link 1. After that, fallback base URL computation for doc2 returns about:srcdoc because the document is neither an iframe srcdoc document nor about:blank. See specification link 2.

Chrome currently follows the specification. So relative URLs in doc2 can't be resolved.
Firefox throws an exception on doc2.open(). So doc2's URL is not changed.
Safari's behavior is not understandable. But anyway the base URL of doc2 is the URL of the top document.

Proposal:
I'd like to change the specification so that doc2 has the top document URL as the fallback base URL in this case. For example, changing

  1. If document's URL is about:blank,

to

  1. if document's URL is about:blank or about:srcdoc,

in https://html.spec.whatwg.org/multipage/urls-and-fetching.html#fallback-base-url

@tkent-google tkent-google changed the title Fallback base URL computation for a document without about:srcdoc but not an iframe srcdoc document Fallback base URL computation for a document with about:srcdoc but not an iframe srcdoc document Sep 4, 2018
@annevk
Copy link
Member

annevk commented Sep 5, 2018

cc @TimothyGu @bzbarsky

@bzbarsky
Copy link
Contributor

bzbarsky commented Sep 5, 2018

It really feels like we should not have documents whose URL is about:srcdoc that are not srcdoc documents... It causes problems of all sorts, not just this.

@tkent-google
Copy link
Contributor Author

yeah, tweaking document.open() is also an option.

@domenic
Copy link
Member

domenic commented Sep 6, 2018

Are there other ways to cause documents to have about:srcdoc URLs, besides document.open()? If that's the only one, then probably we should fix it. And now's the time to do so, as we're making other changes/alignments in that area.

@tkent-google
Copy link
Contributor Author

I have no idea about other ways to set about:srcdoc to an existing document.

"URL and history update steps" is used by pushState(), but I think about:src is rejected by security checks in Step 7 of pushState() and replaceState()

@annevk
Copy link
Member

annevk commented Sep 7, 2018

location.reload() is also special cased. I suspect document.open() being able to do this was an oversight.

@bzbarsky
Copy link
Contributor

bzbarsky commented Sep 7, 2018

Are there other ways to cause documents to have about:srcdoc URLs, besides document.open()?

A javascript:'hello' iframe in a srcdoc document? Gecko doesn't implement the spec behavior for using the url of the page the javascript: url comes from for the url of the resulting document so far, so that hasn't bitten us yet.

@domenic
Copy link
Member

domenic commented Sep 7, 2018

I wonder if we should port the

Compare newURL to document's URL. If any component of these two URL records differ other than the path, query, and fragment components, then throw a "SecurityError" DOMException.

check into the URL and history update steps generally. I guess that would make document.open() throw in this case; perhaps we want it to no-op instead.

This would also impact blob: and data: URL-derived documents. (I believe those can be same origin... haven't checked.)

the spec behavior for using the url of the page the javascript: url comes from for the url of the resulting document

I wasn't able to find this in the spec.

@domenic
Copy link
Member

domenic commented Sep 7, 2018

I wonder if we should port the [...] check into the URL and history update steps generally.

On further reflection this seems unlikely to work. There is probably lot of code that does things like

const win = window.open("about:blank");
win.document.open();
win.document.write("<a href='/foo'>");
win.document.close();

and expects the relative link to work. With my "I wonder..." proposal the relative link would instead of fail to resolve relative to about:blank.

So, we are back to either a guard against about:srcdoc in the document open steps, or fixing the fallback URL computation.

@bzbarsky
Copy link
Contributor

bzbarsky commented Sep 7, 2018

I wasn't able to find this in the spec.

Hmm. The spec used to define something here (I can't find old versions well enough on my current network to look up what), but now https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigating-across-documents step 13 javascript: case substep 13 just creates a Response with no obvious URL. And then https://html.spec.whatwg.org/multipage/browsing-the-web.html#process-a-navigate-response does:

Any Document created by these steps must have its URL set to the URL that was originally to be fetched, ignoring any other data that was used to obtain the resource.

which is definitely not what used to be going on there; here used to be an "override URL" and whatnot.

It's possible that that URL was the URL of the Document that the evaluation happens in, not of the one where the URL came from. But the current spec doesn't seem to do either one, afaict...

@semoal
Copy link

semoal commented May 28, 2019

Any solution related to this?

@annevk
Copy link
Member

annevk commented May 28, 2019

@bzbarsky it was removed as part of #3946. I recall pointing that out as a potential problem, but I cannot find me doing so.

To take a step back here, we have various things that affect state of documents and it would be really good to have a consistent story. All the state that is expected to be copied onto about:blank documents should probably also be copied once document.open() is invoked. (Currently the latter is lacking referrer and referrer policy for instance, and maybe CSP.)

And then we have certain special URLs that we generally do not want to leak (e.g., via referrer) that we perhaps should avoid copying over as a somewhat general policy. So that if you have a data: URL document with an <iframe> the base URL of that <iframe> does not become the full data: URL.

At a high-level this would make sense to me, but I don't think I have the bandwidth right now to drive all the necessary changes and tests.

@bzbarsky
Copy link
Contributor

it was removed as part of #3946

Er... why would that commit have changed the behavior of javascript: loads??? If it did, that's a bug in the commit and we should fix it, no?

All the state that is expected to be copied onto about:blank documents should probably also be copied once document.open() is invoked.

Possibly, yes. It's a little hard to say what semantics people expect out of document.open....

@annevk
Copy link
Member

annevk commented Jun 7, 2019

@domenic @TimothyGu do you remember why you removed the "override URL" concept in #3946 despite it also affecting javascript: URLs? I cannot find our prior conversation about it.

@domenic
Copy link
Member

domenic commented Jun 10, 2019

@TimothyGu may recall better, but my best attempt at recollection was that we did a bit of testing and found it was Gecko-only. Don't hold me to that though.

Edit: the linked PR says

This PR also removes the "overridden reload" algorithm and related concepts, which were previously only implemented in Firefox and Edge, causing developer confusion evident in https://bugzilla.mozilla.org/show_bug.cgi?id=556002.

@bzbarsky
Copy link
Contributor

That's talking about the document.open() case. But in the javascript: case, what's the Chrome/Safari behavior? The Gecko behavior is to give the new document a javascript: URL. The spec behavior with override URL was to give it the URL of the document that started the navigation. The current spec behavior is, I think, to give it the URL of the document the javascript: evaluation happened in. What does Chrome implement and why? And what happens with base URLs? In Gecko, the base URL definitely comes from the document that started the navigation, because that was needed for web compat. In Chrome, I haven't quite figured out whether it's using the base URI of the document the evaluation happened in or something else...

@bzbarsky
Copy link
Contributor

(And to be clear, this was a use of override URL that had nothing to do with "overridden reload" and which also got removed!)

@bzbarsky
Copy link
Contributor

bzbarsky commented Jun 10, 2019

The current spec behavior is, I think, to give it the URL of the document the javascript: evaluation happened in.

Er, no, the current spec behavior is unclear to me. https://html.spec.whatwg.org/multipage/browsing-the-web.html#initialise-the-document-object says to use the URL of the request if there is a request, otherwise to use the URL of the response. But in this case https://html.spec.whatwg.org/multipage/browsing-the-web.html#read-html doesn't pass a request or a response to those steps, though we might assume it's at least passing the response handed to <https://html.spec.whatwg.org/multipage/browsing-the-web.html#process-a-navigate- response>. That response has no URL, though, in the javascript: case, and the only request around is the one with the javascript: URL, so if you do the reading between the lines the spec says to use a javascript: URL for the document? That behavior is Gecko-only and the override thing was precisely to do something that matched other browsers more closely...

@TimothyGu
Copy link
Member

I think I understand the concern here. I reviewed my notes and chat logs but I can't find anything that specifically talks about javascript: URLs, so I must've either conflated "overridden reload" with "override URL", or not understood the historical context of

That behavior is Gecko-only and the override thing was precisely to do something that matched other browsers more closely...

A PR to restore the concept of "override URL" solely for the purpose of javascript: URLs would seem correct. If anyone would want to see the original version, it is available at https://whatpr.org/html/3946/c5eda3f/browsing-the-web.html#override-url

@domenic
Copy link
Member

domenic commented Jun 11, 2019

Thanks @TimothyGu. I guess what would be most correct would be to test browsers and figure out what they do, and write extensive tests, but failing that, restoring us to a non-broken spec would be a good idea, yeah.

/cc @natechapin in case he has also looked at this aspect of javascript: URLs.

@bzbarsky
Copy link
Contributor

A PR to restore the concept of "override URL" solely for the purpose of javascript: URLs would seem correct.

I agree. Thank you for the link to the old spec text: that pretty clearly says that the result of a javascript: evaluation gets the URL of the document it was evaluated against, not of the document that started the navigation (as I said above). I think that matches the Chrome behavior, in my limited testing.

TimothyGu added a commit to TimothyGu/html that referenced this issue Jun 14, 2019
This concept was deleted in 6440cca along with "overridden overload",
but it was discovered in whatwg#3989 that correct handling for javascript:
URLs still depend on this concept. The spec around this area has changed
quite a bit since then due to intermediate cleanups such as e13762f,
so this is not a simple revert.

Also, fix ungrammatical "append URL URL list" that was accidentally
introduced in e13762f.

Addresses some concern raised in whatwg#3989.
TimothyGu added a commit to TimothyGu/html that referenced this issue Jul 1, 2019
This concept was deleted in 6440cca along with "overridden overload",
but it was discovered in whatwg#3989 that correct handling for javascript:
URLs still depend on this concept. The spec around this area has changed
quite a bit since then due to intermediate cleanups such as e13762f,
so this is not a simple revert.

Also, fix ungrammatical "append URL URL list" that was accidentally
introduced in e13762f.

Addresses some concern raised in whatwg#3989.
TimothyGu added a commit to TimothyGu/html that referenced this issue Jul 1, 2019
A series of changes starting with 6440cca which removed the override
URL concept as well as intermediate cleanups such as e13762f eroded
the correctness of javascript: document URLs. Now it seems to be the
case that fixing this typo restores the correct handling for such URLs.

Addresses some concern raised in whatwg#3989.
domenic pushed a commit that referenced this issue Jul 5, 2019
A series of changes starting with 6440cca which removed the override
URL concept as well as intermediate cleanups such as e13762f eroded
the correctness of javascript: document URLs. Now it seems to be the
case that fixing this typo restores the correct handling for such URLs.

Addresses some concerns raised in #3989.
domenic added a commit that referenced this issue Oct 31, 2022
This monster completely rewrites everything to do with navigation and traversal.

It introduces the "navigable" and "traversable navigable" concepts, which take on many of the roles that browsing contexts previously did, but better. A navigable can present a sequence of browsing contexts, which to the user seem to all be the same, but due to browsing context group switches, have different WindowProxys and are allocated in different agent clusters. A traversable navigable manages the session history for itself and all its descendant navigables, providing a synchronization point and source of truth.

The general flow of navigation and traversal is now geared toward creating a session history entry, populated with the appropriate document, before finally applying the history "step". The step concept for session history, managed by the traversable, replaces the previous idea of joint session history, which was a sort of deduplicated union of individual session histories for each browsing context within a top-level browsing context.

Notable things we won't tackle this round, but are much easier to tackle in the future:

- Iframe restoration on (non-bfcache) history traversal is not yet specified.
- Overlapping navigations and traversals (see #6927) are not perfect yet, although this makes them better.
- Browsing context names (see #313) are not perfect yet, although this makes them better.
- Base URL inheritance and storage in session history (see #421, #2883, and #3989) is not yet specified.
- Sandbox flag storage in session history (see #6809) is not yet specified.
- Task queuing when creating agents/realms/windows/documents (see #8443) remains sketchy.
- Window object reuse is not yet rationalized (see #3267).

Closes #854 by clarifying the javascript: URL origin and origin-checking setup.

Closes #1073 by properly resetting active-ness of documents when they are removed.

Closes #1130 by removing the source browsing context concept, using a sourceDocument argument instead, and taking source snapshot params at the appropriate early time.

Closes #1191 by properly sharing document state across documents, as well as overlapping same-document navigations plus cross-document traversals.

Closes #1336 by properly handling child browsing contexts.

Closes #1382 by only unloading after we are sure we have a new document (i.e., not a 204 or download).

Closes #1454 by rewriting session history closer to what implementations do, with the nested history concept in particular taking care of the issues discussed there.

Closes #1524 by introducing the POST data concept and storing it in the document state.

Closes #2436 by rewriting the spec for history.go() to be clear about the results. Tests: web-platform-tests/wpt#36366.

Closes #2566 by introducing an explicit "history object" definition. Tests: web-platform-tests/wpt#36367.

Closes #2649 through clear creation of srcdoc documents, including during history traversal.

Closes #3215 by preserving POST data and reusing it on reloads.

Closes #3447 by specifying a precise mechanism (the ongoing navigation) for canceling navigations, and the points at which that mechanism is consulted. It also stops queuing a task for hyperlink navigations.

Closes #3497 by posting appropriate tasks for cross-event-loop navigations.

Closes #3615 by rewriting traverse a history by a delta, which eventually calls into apply the history step, to navigate all relevant navigables.

Closes #3625 by storing information in the document state (not just the URL), so that future traversals can reconstruct the request appropriately.

Closes #3730 by doing proper task queuing for navigation, including one for javascript: URLs but not including one for normal same-frame navigations. Tests: web-platform-tests/wpt#36358.

Closes #3734 by rewriting the definition of script-closable to use well-defined concepts.

Closes #3812 by removing all uses of "active document" as a predicate instead of a property.

Closes #4054 by introducing the session history traversal queue and renaming the previous "history traversal task source" to "navigation and traversal task source".

Closes #4121 by doing the "allowed to navigate" check at the top of apply the history step.

Closes #4428 by keeping a strong reference from documents (including bfcached documents) to their containing browsing context.

Closes #4782 by introducing the top-level traversable and navigable concepts.

Closes #4838 by doing sandbox checking in a much more precise manner, in particular snapshotting the relevant flags early in any traversals.

Closes #4852 by using document state (in particular history policy container, request referrer, and request referrer policy) in reloads.

Closes #5103 by properly restoring scroll positions for everything that is traversed, as part of properly traversing more than one navigable.

Closes #5350 by properly restoring window names across browsing context group switches, and going back to the same browsing context as was previously there when traversing back across a BCG switch boundary. (Implementations could create new browsing contexts, as long as they restore the WindowProxy scripting relationships and other browsing context features; the result is observably equivalent.)

Closes #5597 by rewriting "allowed to download" to just take booleans, derived from the appropriate snapshotted or computed sandboxing flags.

Closes #5767, modulo bugs and oversights we made, by rewriting everything :).

Closes #5877 by re-specifying "fully active" in terms of navigables, instead of browsing contexts.

Closes #6446 by properly firing beforeunload to all descendant navigables, although whether or not they actually prompt still allows implementation leeway.

Closes #6483 by introducing the distinction between current session history entry and active session history entry.

Closes #6514 by settling on using a single origin for these checks.

Closes #6628 by storing window.name values in the document state, so even in strange splitting situations like described there, they remain.

Closes #6652 by no longer changing history.state when reactivating a document from bfcache ("restore the history object state" is called only when documentsEntryChanged is true). Tests: web-platform-tests/wpt#36368.

Closes #6773 by having careful handling of synchronous navigations during traversals. Test updates: web-platform-tests/wpt#36364.

Closes #6798 by treating javascript: URL navigations as replacements.

Works towards #6809 by storing srcdoc resources in the document state.

Closes #6813 by storing referrer in the document state. Tests for the repopulation case: web-platform-tests/wpt#36352. (No tests yet for the reload case.)

Closes #6947 by rolling its contents into this change: PDF documents are put in the same category as other inaccessible, no-DOM documents.

Closes #7107 by clearing history state on redirects and when origin changes by other means, such as CSP.

Closes #7441 by making window.blur() a no-op because that was simpler than updating it to operate on navigables.

Closes #7722 by incorporating its contents into the rewritten version.

Closes #8295 by refactoring the iframe/frame load event specs to avoid the bug.

Helps with #8395 by at least ensuring the javascript: case does not fire beforeunload. Tests: web-platform-tests/wpt#36488. (The other cases remain open for investigation and testing.)

Closes #8449 by exporting "create a fresh top-level traversable" which is designed for the use case in question.

Co-authored-by: Domenic Denicola <d@domenic.me>
Co-authored-by: Dominic Farolino <domfarolino@gmail.com>
domenic pushed a commit that referenced this issue Jul 13, 2023
This gives an "about base URL" member to Document, document state, and navigation params. The intention is to capture a Document's creator's base URL when creating a new browsing context, and preserve it to (1) the newly created Document itself, and (2) the newly-created document state. Notably, preserving it in document state means that the same base URL is used when we recreate the Document while traversing the session history.

For the navigation case, we capture the initiator's base URL in the navigate algorithm as initiatorBaseURLSnapshot (alongside initiatorOriginSnapshot). This eventually threads through, via the document state and navigation params, to the point where we initialize a new Document object.

Finally, we remove the concept of a browsing context's creator base URL algorithm, and update the fallback base URL algorithm accordingly to refer to the relevant Document's new "about base URL" member.

This is all rather different from how the previous specification works. Previously, behavior differed between about:srcdoc and about:blank; base URL changes were supposed to be inherited in a live, not snapshotted, fashion; sometimes the navigation initiator was used and sometimes the browsing context creator/embedder; and the spec "crashed" for disconnected srcdoc iframes.

Closes #421. Closes #2883. Closes #3989.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

6 participants