Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Putting guards at primitives instead of sinks #176

Closed
annevk opened this issue May 28, 2019 · 11 comments
Closed

Putting guards at primitives instead of sinks #176

annevk opened this issue May 28, 2019 · 11 comments
Labels
Milestone

Comments

@annevk
Copy link
Member

annevk commented May 28, 2019

I discussed this briefly with @koto and thought I'd file it to not lose track of the idea. What if instead of sinks we add the guards at the various primitives. E.g., not <a>, <form>, etc. but "navigate". Not fetch(), <img>, etc. but "fetch". Not appendChild() et al but "prepare a script" (or some such). Not innerHTML and friends but "HTML fragment parsing algorithm".

This would put the actual protections right at the dangerous points. We'd still have to change sinks to allow for typed objects to reach the dangerous points, but there's no longer the issue of overlooking a sink or overlooking trusted types when adding a new sink. Or the issue of it not being clear how to update all the various sinks as with Location as we could opt not to add trusted types for all of them.

@koto
Copy link
Member

koto commented May 28, 2019

That's a really good idea! It's mostly just an exercise in finding the right primitives matching the current sinks. For example, for now fetch() is not guarded. I assume the default policy (or a callback function) would run not at a sink setter level, but at the lower-lever primitive (like a spec algorithm).

Two potential downsides I see is that a) it might make the integration with other spec more verbose, so more challenging and b) maybe not everything is polyfillable.

I'll take a look at where we could hook the current TT sink setters, and whether that causes some change of behavior (it will for individual Location properties, so for <a href> as well, but that's actually what we want - #64).

@koto
Copy link
Member

koto commented Jun 1, 2019

One problem with the Fetch integration is that this would happen too late. To avoid having to taint track nodes, TT are guarded at node attribute setters, even if the node is not connected to the document.

For example, for scripts fetch would only be triggered when the script is prepared, but we'd rather require the type earlier, in this case when setting the src.

We could definitely benefit from Fetch integration e.g. by referring to (script-like Request destination)[https://fetch.spec.whatwg.org/#request-destination-script-like] and requiring the URL attributes being TrustedURL or TrustedScriptURL based on whether the Fetch destination later on would be script-like or not. However this seems hard to specify normatively ("require this type if it's possible for a different algorithm to set this value to Request" is quite vague). Perhaps it's better off adding a non-normative note instead? This already seems like what is happening elsewhere, e.g. "audioworklet" Request destination is not specified in Web Audio, Worklets nor HTML, and its value is only linked to audioWorklet.addModule() in a table in non-normative node in Fetch.

@annevk
Copy link
Member Author

annevk commented Jun 4, 2019

You can certainly put requirements on entry points. The main thing about having the actual guard at the primitive is to avoid missing entry points.

Not sure about the note thing, "audioworklet" not being defined in a normative matter is a bug.

@koto
Copy link
Member

koto commented Jun 4, 2019

That's fine, I was just trying to figure out where the Request destinations are defined and audioworklet was the first thing I tried. The same bug exists for paintworklet that I can't find a reference of outside of Fetch, but I assume other values are better in that regard.

I still see some difficulty for speccing this, but it's likely I just misunderstood what you mean. Do you mean to add additional guards in e.g. fetch (the most likely algorithm that would relate to URL TTs), such that it rejects non-compliant values?

E.g.

If request initiator is "non-parser-inserted" , and request window's documents' TrustedTypeConfiguration's string at sink disposition is "reject" and a request URL is not a getTypeForDestination(request destination), reject with some error.

That would require preserving the type of the URL in Fetch as well. So far we're not doing this yet in the JS fetch() call, because we looked into non-script URLs mostly to address a problem with javascript: or other powerful schemes in URLs, and not to prevent requests from happening.

If we'd want to keep JS fetch(string) to work regardless of TT enforcement, we need a way to track the requests initiated by DOM / other sinks from ones using XHR / Fetch - should we use initiator for that?

@annevk
Copy link
Member Author

annevk commented Jun 12, 2019

Yeah, worklets are not defined well and have quite a few open issues against them... Perhaps you can ping some colleagues?

Initiator might be useful if there's nothing else that gives it away. Note also that at a specification level URL is not a string, e.g., to make blob URLs work. So there's infrastructure already to attach additional data to URLs.

@koto
Copy link
Member

koto commented Aug 7, 2019

I was able to tackle the javascript: URLs in a more developer-friendly matter.

Previously, all navigational sinks required a TrustedURL. The only issue with those sinks was that they could potentially navigate to javascript:, but most usages of them were benign. We found that it's unneccessarily difficult for the authors to cover these sinks with TrustedURLs (this is by far the most common violation source), so instead I propose a simple hook that will make sure that a navigation to a javascript: URL will go through a default policy:

https://github.com/WICG/trusted-types/pull/204/files

This places the burden on the authors who still use javascript: URLs, and makes sure that, in "TT mode" the javascript: URLs don't work by default.

koto added a commit to koto/trusted-types that referenced this issue Sep 3, 2019
@koto
Copy link
Member

koto commented Sep 3, 2019

@annevk I've found a couple of algorithms that form the set of primitives that are a good target for guards. The issue is that as these primitives accept strings, or byte streams and are called a few levels deeper than the TT logic, and I don't know how to express in the spec that these should take TT into account, without clumsily passing an additional argumentall the way through the stack.

I see something similar when HTML fragment parsing algorithms is called out as a special case e.g. here

Would something like the following work for TrustedHTML for example? If this is acceptable, I can extend that approach to other types and primitive algos.

[Pull request] [Spec preview]

koto added a commit that referenced this issue Sep 4, 2019
…ascript: URLs. (#204)

This removes the burden from all authors to create types when interacting with common sinks that usually don't cause DOM XSS (unless for javascript: URLs).

This PR prevents javascript: URLs from working by default, and allows programmatic opt-in to enable them one-by-one for the few applications that need them.

Related to #176.
Partially addresses #169.
Fixes #64.
@koto
Copy link
Member

koto commented Sep 30, 2019

There's three generic approaches we could take to implement checks at the primitive algorithms:

  1. Preserve the typed object in a given sink (e.g. attribute node, text node), and perform checks (+ stringify) in a primitive algorithm. We would still prefer that the default policy and the rejection / notification happened early, as this is the developer experience that we want (e.g. throw on script.setAttribute('src', ...), not on appendChild(script)). The primitive algorithms would perform the checks defined in TT spec via relevant global object.

    This requires modifying some of the core IDL interfaces, e.g. Attr or Text.

  2. Add a flag (e.g. was_trusted_html) to interfaces used in a given sink and check that it's present in the lower-level algorithms.

    This also requires modifying some core IDL interfaces, to store a flag.

  3. Pass the flag through the algorithms, without modifying the IDL interfaces. This can likely only be done formally iff a given primitive algorithm is called by the sink setter algo, and not if the former is caller at a later point in time (e.g. for prepare script). That said, maybe this can be explained in the algorithm prose.

I've identified the following algorithms that would be affected. It seems that, after fixing #47, the TT behavior for the currently exposed APIs would be the same with and without the hardening fixes outlined below.

HTML parsing

  • Create an element for the token is called after parsing; since the input data was trusted implicitly, this algorithm needs to add trust info to created elements.

  • HTML fragment parsing. It would have to callout to a relevant global object to make a check.

  • HTML parser (used by document.write). Seems like this needs to be guarded at document.write level only, as it accepts a stream of code points, and not a string.

Attributes

Scripts

@koto
Copy link
Member

koto commented Oct 11, 2019

Moved some of this upstream to whatwg/html#3052 and whatwg/dom#789.

@domenic
Copy link

domenic commented Oct 15, 2019

I just want to write up a concrete analysis of what I think the spec text would look like for the approaches @koto mentions.

There's three generic approaches we could take to implement checks at the primitive algorithms:

  1. This requires updating all sinks' IDL, since they need to accept new types anyway, and then updating all the intermediate algorithms that process the accepted values, all the way down to the primitives, where eventually the primitive does conversion to a string, which causes enforcement to trigger.

  2. If I understand it, this seems like (3) but with extra public API, which is unnecessary. Let's skip it.

  3. This requires updating all sinks' IDL, as well as updating the sink algorithms to turn a string in to a (string, trusted = false) tuple, while turning a trusted type into a (string, trusted = true) tuple. Then we thread those tuples through all intermediate algorithms, all the way down to primitives, where the primitive does enforcement. So, it is basically the same as (1), but feels a bit cleaner since we are not passing IDL types as far down into the stack (e.g. potentially across threads).

Other shared subtleties:

  • Where to throw the enforcement error, once the value reaches the primitives and fails to pass, is not obvious. You'd probably need to store the global object as part of the tuple. What about in async cases---is it too late to throw? I'm not sure if there are currently any situations where the sink is separated from the primitive by an async boundary, but it's easy to imagine some, especially with URLs.

  • If intermediate algorithms need to treat the value as a string (or URL, or...) then they will need to pierce the trusted type or tuple. This can be a bit tricky, and will likely mean many of them need to be updated too.

koto added a commit to koto/webidl that referenced this issue Feb 10, 2020
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
@koto
Copy link
Member

koto commented Mar 2, 2020

The 'putting guards at primitives' approach has surfaced issues with the default policy (#248). It looks like it's preferable to address the TT checks, with the default policy application at IDL conversion time (in one of the few variants outlined in whatwg/webidl#841) - so even "before" sinks. The advantages we get for avoiding the default policy problems seem to outweigh the complexity of guarding at all affected primitives.

There are exceptions to that, more in line with the "at primitives" approach:

Please let us know if that sounds like a bad call, @annevk.

@koto koto closed this as completed Mar 6, 2020
koto added a commit to koto/webidl that referenced this issue Jan 26, 2024
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
lukewarlow pushed a commit to lukewarlow/webidl that referenced this issue Mar 11, 2024
(DOM|USV)String.

This is to hook up the Trusted Types validation during the ES->IDL type
conversion to avoid funky issues with its default policy.

See w3c/trusted-types#248,
w3c/trusted-types#176
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants