Make <script charset> non-conforming #3006
I support this generally but would like more input from people like @zcorpan and @sideshowbarker. Especially if @sideshowbarker were able to get us stats on how often this appears: if they were low, that would make me feel better. It's also worth considering that by disallowing this you are disallowing people using
OK, I’ll add a use counter for it to the HTML checker later today.
@sideshowbarker you want to measure
<link charset> was defined as doing something (but not made conforming) for stylesheets in 29e0e33; <script charset> was defined as doing something and made conforming in e74a98d.
In the httparchive data:
49497 matches, of which 45180 the element is
I notice now, from looking at view-source:http://www.z-wave.me/, that the above query matches this:
So I need to write a more complicated query that tries to tokenize attributes. (I've done that before, so I can copy from an earlier query.)
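A rough Python analogue of attribute-aware matching, assuming the page HTML is available as a string (the thread's actual query is not reproduced here). The standard-library `html.parser` splits each start tag into `(name, value)` pairs, so text like `charset=gbk` inside a `src` URL is not counted as a `charset` attribute. `ScriptCharsetFinder` and `script_charsets` are hypothetical names used only for this sketch.

```python
from html.parser import HTMLParser


class ScriptCharsetFinder(HTMLParser):
    """Collects charset attribute values from <script> start tags (sketch).

    HTMLParser tokenizes each start tag into (name, value) pairs, so a
    "charset=" that only appears inside another attribute's value (for
    example a query string in src) is not mistaken for a charset attribute.
    """

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "script":
            return
        for name, value in attrs:
            if name == "charset":
                # value is None for a bare attribute such as <script charset>.
                self.charsets.append(value if value is not None else "")


def script_charsets(html: str) -> list[str]:
    finder = ScriptCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charsets
```

For example, `script_charsets('<script src="/a.js?charset=gbk"></script>')` finds nothing, while `<script src="b.js" charset="GBK">` yields `["GBK"]`.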
New query that skips over whole attributes and only counts matches where the domain in
30578 matches, of which 29676 are
In an attempt to find the encoding of the page itself:
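As an illustration only (the query used in the thread is not shown here), a Python sketch of reading a page's declared encoding from `<meta charset>` or `<meta http-equiv="Content-Type" content="...; charset=...">`. It assumes the declaration is in the markup; the HTTP `Content-Type` header and any BOM, which take precedence in browsers, are deliberately ignored. `declared_page_encoding` is a hypothetical helper, and the `content` parsing is a simplification of the HTML spec's algorithm.

```python
import re
from html.parser import HTMLParser

# Simplified extraction of "charset=<value>" from a meta content attribute.
_CONTENT_CHARSET = re.compile(r"charset\s*=\s*[\"']?([^\"';\s]+)", re.IGNORECASE)


class MetaCharsetFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.encoding = None  # first declaration found wins

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.encoding is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("charset"):
            # <meta charset="...">
            self.encoding = attr_map["charset"].strip().lower()
        elif (attr_map.get("http-equiv") or "").lower() == "content-type":
            # <meta http-equiv="Content-Type" content="text/html; charset=...">
            match = _CONTENT_CHARSET.search(attr_map.get("content") or "")
            if match:
                self.encoding = match.group(1).lower()


def declared_page_encoding(html: str):
    """Return the lowercased declared label, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.encoding
```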
There are 455,868 pages in this dataset, so:
3108 pages, or ~0.7%, have
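The percentage is just the matching page count over the dataset size, using the two counts stated in the thread:

```python
pages_total = 455_868    # pages in the httparchive dataset
pages_matching = 3_108   # pages matched by the query
share = pages_matching / pages_total
print(f"{share:.2%}")    # 0.68%, i.e. ~0.7%
```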
The HTML-checker use counter for this shows ~5.5% of pages have a
So now the question is whether it's worth making it an error. I think it is, as overriding the encoding from the document side means you have to do that everywhere, and viewing the resource directly will fail, all of which seems suboptimal. But I guess the argument against would be that 5.5% is high, so making it an error would add noise to the checker results?
That shouldn’t stop us if we otherwise agree the right thing to do is to have the spec make it non-conforming. For the sake of comparison, note that the checker stats show that at least 16% of the documents it’s being used to check contain a
So I think there are several aspects against making it non-conforming:
What is the benefit of making charset for script non-conforming, other than consistency with link?
Well, for cross-origin scripts I'm not even sure we want to make this work (see the other issue), since it's a minor security risk: by letting the script decode differently, you might be able to extract data you wouldn't otherwise have access to. For same-origin scripts, my rationale is what I gave above.
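A small Python illustration of why the including page's choice matters: the same bytes served for a script produce different source text depending on which encoding is used to decode them. The codec names are Python's (`"cp1252"` stands in for windows-1252); this only shows the mismatch and is not a demonstration of an actual data-extraction attack.

```python
# Bytes as the server sends them: the script was saved as UTF-8.
body = "var s = 'café';".encode("utf-8")

# Decoded as the resource intends (e.g. when viewed directly).
as_utf8 = body.decode("utf-8")

# Decoded as a page might force via charset="windows-1252".
as_cp1252 = body.decode("cp1252")

print(as_utf8)    # var s = 'café';
print(as_cp1252)  # var s = 'cafÃ©';
```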
@zcorpan also, if you only use utf-8 throughout, as you should, the other reasons you gave go away (since the script would still inherit the encoding from the page).
I've now INNER JOINed the tables, added a column with The first 3, as a preview:
What I don't have here is the encoding of the script/stylesheet, if specified with HTTP or BOM or
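To make the interplay of those sources concrete, a simplified Python sketch of which encoding ends up decoding a classic script, following my reading of the HTML and Encoding standards: a BOM wins, then a `charset` parameter on the HTTP `Content-Type`, then the `charset` attribute, and otherwise the document's encoding. It assumes all labels are already valid and omits label resolution and the rest of the fetch steps; `classic_script_encoding` is a hypothetical name.

```python
from typing import Optional

# Byte order marks checked by the Encoding Standard's BOM sniffing.
_BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]


def classic_script_encoding(
    body: bytes,
    content_type_charset: Optional[str],
    charset_attribute: Optional[str],
    document_encoding: str,
) -> str:
    """Return the encoding that wins for a classic script (simplified)."""
    # 1. A BOM in the response body overrides everything else.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. Otherwise, a charset parameter on the HTTP Content-Type header.
    if content_type_charset:
        return content_type_charset
    # 3. Otherwise, the <script charset> attribute (assumed a valid label).
    if charset_attribute:
        return charset_attribute
    # 4. Otherwise, the script inherits the including document's encoding.
    return document_encoding
```

The last case is why a UTF-8-everywhere setup does not need the attribute: with no BOM, no HTTP charset, and no attribute, a UTF-8 page still yields UTF-8 for its scripts.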
26950 pages. Of those, pages where So there are 26536 pages where the page's encoding is not utf-8 but they include a cross-origin script with
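Deciding whether a script is cross-origin means comparing origins (scheme, host, port), not just domains. A minimal Python sketch, assuming `http`/`https` URLs and resolving relative `src` values against the page URL first; it ignores opaque origins and other edge cases, and `origin`/`is_cross_origin` are hypothetical helpers rather than what the httparchive query did.

```python
from urllib.parse import urljoin, urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Approximate origin as (scheme, host, port), with default ports filled in."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    # .hostname is already lowercased by urllib.
    return (parts.scheme, parts.hostname, port)


def is_cross_origin(page_url: str, script_src: str) -> bool:
    # Resolve relative and scheme-relative src values against the page URL.
    return origin(page_url) != origin(urljoin(page_url, script_src))
```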
As far as document conformance goes, how about making the only allowed value be "utf-8"? The effect of that is:
New conformance error:
No change:
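A sketch of how a checker might apply the proposed rule, assuming "only allowed value" means an ASCII case-insensitive match for `utf-8` with no surrounding whitespace. `check_script_charset` and its message text are hypothetical, not the HTML checker's actual code.

```python
from typing import Optional


def check_script_charset(value: Optional[str]) -> Optional[str]:
    """Return an error message, or None if <script charset> is conforming."""
    if value is None:
        return None  # attribute absent: nothing to report
    # ASCII case-insensitive comparison; isascii() guards against non-ASCII
    # characters whose lowercase form happens to be ASCII.
    if value.isascii() and value.lower() == "utf-8":
        return None
    return f'charset="{value}" on <script>: the only allowed value is "utf-8"'
```

Under this rule `UTF-8` passes, while `utf8` or `windows-1252` would be reported.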
Maybe we could do the following, in order:
That seems reasonable, but then we should probably close this PR, as that requires something quite different.
I’ve raised #3091 for those. The Encoding spec already has language that clearly states the requirement:
…so #3091 just brings us into conformance with what the Encoding spec already requires.
If we don't go with this PR, I should make sure I move that comment and add the
Interesting, I didn't realize Encoding disallowed e.g.
FWIW, including the hyphen improves compatibility with Microsoft browsers. (Old ones at least; I'm too lazy to check the latest right now.)
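For context on "allowed" versus "works": the Encoding Standard's "get an encoding" step trims ASCII whitespace, lowercases the label, and looks it up in a label table, so a hyphenless label can still resolve to UTF-8 in conforming browsers even though authors are required to use `utf-8`. A Python sketch with only a small excerpt of the label table (not the full list); `get_an_encoding` mirrors the spec's step name but is a hypothetical helper here.

```python
from typing import Optional

# Small excerpt of the Encoding Standard's label table, not the full list.
_LABELS = {
    "unicode-1-1-utf-8": "UTF-8",
    "utf-8": "UTF-8",
    "utf8": "UTF-8",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
}

# ASCII whitespace: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE = "\t\n\f\r "


def get_an_encoding(label: str) -> Optional[str]:
    """Resolve a label to an encoding name, or None on failure (sketch)."""
    key = label.strip(_ASCII_WHITESPACE)
    if not key.isascii():
        return None  # every label is ASCII, so anything else cannot match
    return _LABELS.get(key.lower())
```

Note that the `latin1`/`iso-8859-1` entries map to windows-1252 rather than to a separate Latin-1 decoder.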
Fixes #3004.