baseline should be default and default should be baseline #183
When I created the original sanitizer in Mozilla, it was for the purpose of Thunderbird sanitizing HTML emails. Thunderbird still has that feature and uses it in more and more places, to counter security bugs. The goals are:
Part 3 is surely highly opinionated and should be optional. Parts 1 and 2 are essential to the purpose of the sanitizer in Thunderbird. I would think that most use cases for a sanitizer would be:
I would think that almost all of these use cases want very strong assurances for Parts 1 and 2, and some might also want Part 3. I therefore think that the default config for the sanitizer should satisfy the requirements in Parts 1 and 2 above, and that Part 3 should be optional and very easy to add. This basic config should then be adaptable in a fine-grained way, on a per-tag/attribute or similar level, to allow or deny specific aspects, as discussed in #181. E.g., if a use case needs to allow MP4 videos in user comments, it should be able to specifically allow that, but still otherwise use the default config.
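To make the shape of that concrete, here is a minimal sketch. It assumes a draft-style config with `allowElements`/`allowAttributes`, and assumes (as proposed here and in #181) that such overrides compose with the built-in default rather than replacing it; the exact option and method names have varied between spec drafts, and `commentContainer`/`untrustedCommentHtml` are placeholders:

```js
// Hypothetical: re-allow MP4 video markup in user comments, while keeping
// the default, secure filtering for everything else. Option names
// (allowElements, allowAttributes) and the compositional behaviour are
// assumptions based on the discussion in #181, not the final spec.
const videoFriendlySanitizer = new Sanitizer({
  allowElements: ["video", "source"],
  allowAttributes: {
    "src": ["video", "source"],
    "type": ["source"],        // e.g. "video/mp4"
    "controls": ["video"],
  },
});

// Untrusted comment HTML goes through the sanitizer before insertion.
commentContainer.setHTML(untrustedCommentHtml, { sanitizer: videoFriendlySanitizer });
```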
Mostly agree. The idea behind … The main issue I have with making baseline the … nitpick: …
Looking at the API through a security engineer's lens, I don't see a good way to offer a default. The difficulty with providing an implicit, opinionated setting that would be a good enough fit for all is that it's likely a quickly moving target, given how HTML evolves over time. The ways of executing JS from markup don't change as often as the ways of styling content, fetching resources, or even parsing inline file formats.

That said, I think there is a valid use case for configuring the sanitizer to do something else than preventing XSS, in a convenient way. Application developers should not need to know all 30+ knobs that can disable remote fetches - but I think it's better if this is an explicit decision left to the developer, and not the default, because security always exists in the context of a threat model - so developers should specify what flavor of risks they want to mitigate.

Even for the cases listed here in the thread, I already see that the Thunderbird sanitizer has significant differences from the sanitizer that Gmail uses. Webmail is very likely a complex type of application for which the sanitizer rules follow from the product features (e.g. Gmail rewrites image links to transcode the images, hide the viewers' IP addresses, etc. Both external images and forms in email are a feature many UAs support, sometimes guarded by user preferences. Some UAs only support inline styles, some rewrite the CSS to make sure that it can't affect other parts of the UI. Some like static images, but not animations. And so on...). I simply don't think a Web API can ever propose a webmail-ready sanitizer setting. It's probably much easier to offer a setting that removes forms, anything that would cause a fetch, or only simple markup.

To me it feels like the actual use cases and popular settings are best fleshed out in user code and only then moved to the platform, but if we converge on popular settings that we are confident would be useful in existing applications, that's great too - but it feels like a v2 thing?
... and developers cannot be expected to constantly keep up with these developments and adapt their code. That's why it's important that these preset profiles are constantly adapted as new features are added to the browser or new security or privacy risks are discovered. That's the job that the Sanitizer API has to do for the developer.

This should be the default. The default is secure, even in the future, in the face of newly discovered risks. The developer just says: "I don't trust this HTML. So, sanitizer, please remove anything potentially dangerous from it. I don't care whether it still looks the same as the original. I strongly prefer security over faithfulness to the original." So, yes, the output of the sanitizer will and has to change over time, as new risks are discovered.

If a dev wants a very specific set of filtering, and does not want it to change over time, then they should opt for a very low bar (much less secure) as baseline and then use the specific allow/deny feature discussed in #181 to select a specific set of features to filter. That would then stay the same over time. It would expose risks over time, and the dev would be responsible for keeping up. But given that very few devs can do that, this should not be the default.

In other words: by default, provide a secure sanitization, which can also evolve over time to stay secure as new risks appear. Optionally, those expert devs who want fine control and accept the security risk can customize it as they want.
That's what the customization feature should allow. It should start with a good, secure sanitizer setting which most closely matches the needs, then add overrides to allow or deny specific features.
I don't think we can define what a 'secure sanitizer' is, without enumerating the specific risks we mean to protect against. There is a wide understanding that JS execution from markup should be stopped, but there is no similar consensus about other content, for example:
to name just a few. You argue for defaulting to strict sanitization of 'anything potentially dangerous', but this requires agreeing on what dangerous means exactly. Case in point: every HTML sanitizer I've seen so far has different defaults even for the above settings. If the whole web community hasn't come up with a convention for what a 'secure HTML subset' is after all these years, why do you think that the solution devised by a few people on this thread would be the best one?
A sanitizer that just returns
In my experience working with app developers, the reverse is true. The business priority is that the output looks like the input; we only need to make sure that malicious input is filtered out. Users are really not happy when copy-paste doesn't work reliably across web applications with WYSIWYG editors, for example.
That's a great discussion so far. Thanks to all involved. I think @koto's latest comment summarizes my main gripe here. We all know there is a shared understanding of the threat model "no XSS". Everything else seems to be quite app-specific with a lot of "grey area". I'll stick with my recommendation to remove the distinction between baseline and default and focus on XSS first and foremost. At a later stage (and not here!), we may add static properties to the constructor. Any objections?
I've discussed this with a few people on our end, so we can make some progress: I think we're re-discussing the (security) requirements here. If we focus the Sanitizer strictly on "XSS first and foremost", and arguably on being a building block (vs. an end-user focused, more opinionated API), then I think this proposal makes sense. I'm happy to go along with it, provided we're consistent about it and adopt this as the official threat model & requirement for all of the Sanitizer API.

If we narrowly focus on XSS (plus additional, user-configurable options), then indeed the distinction between baseline and default is moot and drops out. But the presently spec-ed Sanitizer configuration also has a number of additional options, none of which is narrowly about XSS: allowComments, allowUnknownMarkup, allowCustomElements. All of these presently default to not allowing that content. Similarly, allowShadowRoots is being discussed elsewhere. The same logic would apply there, I guess, and it'd be allowed by default?

I'll also note that this seems to be in direct contradiction to #185, which aims to make the underlying security model opt-out-able.

So, as said: I'm happy to agree on strictly "no XSS" as the Sanitizer threat model. But it'd have consequences beyond just dropping the baseline-vs-default distinction.
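For orientation, a sketch of how those options surface in a draft-style config; the values shown assume the strict "XSS first and foremost" reading discussed above and are not normative defaults:

```js
// Hypothetical config exercising the non-XSS-specific knobs mentioned above.
// Under a pure "no XSS" threat model, none of these need to be dropped,
// since comments, unknown markup, and (undefined) custom elements do not
// execute script by themselves.
const sanitizer = new Sanitizer({
  allowComments: true,
  allowCustomElements: true,
  allowUnknownMarkup: true,
});
```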
My two cents as a web developer - I'm using the Sanitizer API currently to ensure that HTML strings returned from the server contain exactly the HTML elements I expect and no more. Because I know the full list of attributes and elements that I should ever see, I can use a whitelist instead of a blacklist - meaning I don't really run into any of the problems described above. It's hardly perfect - somebody determined to screw with my markup still can; it's just that all they can do is add extra rows to a table and such, rather than load scripts and images. While I hardly expect it to be built into the API itself, I imagine that a "copy paste friendly" whitelist for, e.g., expected markup from markdown would cover a lot of use cases.
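A rough sketch of that allowlist pattern, again assuming the draft `allowElements`/`allowAttributes` config shape; the element list is just an illustrative guess at typical markdown output, and `container`/`htmlFromServer` are placeholders:

```js
// Hypothetical strict allowlist for server-rendered, markdown-like markup.
// Everything outside the list is removed, so the worst an attacker can do
// is add extra benign markup (e.g. more table rows), not scripts or images.
const markdownSanitizer = new Sanitizer({
  allowElements: [
    "p", "br", "em", "strong", "code", "pre", "blockquote",
    "ul", "ol", "li", "h1", "h2", "h3",
    "table", "thead", "tbody", "tr", "th", "td", "a",
  ],
  allowAttributes: { "href": ["a"] },
});

container.setHTML(htmlFromServer, { sanitizer: markdownSanitizer });
```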
We just had a synchronous chat with Sanitizer contributors and decided that baseline is default and default is baseline. If people want and need more or stricter protections, there will be custom Sanitizer allow lists on top of this default. Specifically, we're hoping to look into having some provided by the API, as mentioned in #82, but this will have to wait until the foundation is solid enough to build upon. Hopefully soon!

Next step: rewrite the spec such that baseline === default.
What I actually said at the meeting is that we should agree on whether we stick to a strict threat model -- "XSS first and foremost" -- or not; that the baseline decision falls out of that; and that I'm quite fine with a strict "XSS first and foremost", which would indeed mean the baseline/default distinction drops out. I'm happy to follow group consensus on whether we sharpen the threat model to "XSS first and foremost" or not, but doing so would seem to require some additional cleanup. I'll open a separate issue to make sure this gets answered.
Most other sanitizers have a default list that follows some standard (an opinionated default, a default that cares about XSS and nothing else, ...). Most other sanitizers also allow configurations in which developers may re-allow dangerous and XSS-y markup.

We aim for the Sanitizer API to not allow configurations that might reintroduce XSS. For that reason we've had a couple of "list categories", namely baseline and default, with one of each category for attributes and elements.

Configs may also set `allowUnknownElements: true`. The intent is to allow configurations that include a `foobar` element, which is obviously non-existent but also harmless. We check every unknown element to be actually unknown, as `allowUnknownElements: true` should not be used to re-introduce XSSy elements like `script`.

Lately, we've been discussing what the `default` is actually defending against, and we could not really come up with a hard line. My suggestion is therefore to remove the distinction between baseline and default.
We should, however, introduce threat-model-specific configurations that are less "opinionated" and oriented along hard facts. E.g., a configuration that disallows form hijacking & DOM clobbering on top of the baseline.
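For illustration, a minimal sketch of what such a layer could look like, assuming draft-style `dropElements`/`dropAttributes` options on top of the baseline; the names and the exact element/attribute lists are assumptions, not a worked-out threat model:

```js
// Hypothetical "no form hijacking, no DOM clobbering" configuration layered
// on top of the baseline XSS protections.
const hardenedSanitizer = new Sanitizer({
  // Form hijacking: drop form-related elements entirely.
  dropElements: ["form", "input", "button", "select", "textarea", "label"],
  // DOM clobbering: drop attributes that can shadow document/window properties.
  dropAttributes: { "id": ["*"], "name": ["*"] },
});
```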
In a background conversation, @otherdaniel was in favor of this but wanted to think about it some more before moving on.
We're curious to hear other opinions, but I hope not to dwell on this for very long.