This repository has been archived by the owner on Jul 30, 2019. It is now read-only.

Accept-charset attribute of form element may have a bug [was: Issue #1421] #1523

Closed
siusin opened this issue Jul 20, 2018 · 0 comments


@siusin
Contributor

siusin commented Jul 20, 2018

Re-file Issue #1421 from programmer3000.

from programmer3000:

The accept-charset attribute of the <form> element may have a bug. The specification says "the value must be an ordered set of unique space-separated tokens". But how is that possible if the server can only choose one character encoding? If I write accept-charset="utf-8 utf-16 windows-1251", which value will the server choose?

from edent:

This is in reference to https://www.w3.org/TR/html53/single-page.html#element-attrdef-form-accept-charset

It is my understanding that by default the browser will choose the charset which is the same as the page.
If a list is presented, the browser can choose whichever one it wants. So if Firefox from 2018 is still in use in the year 3000 and sees accept-charset="martian UTF-3000 UTF-8", it will choose the one it recognises.
I wonder if this should be removed, given that we now require UTF-8 for everything (#1039).

from programmer3000:

@edent, where is the logic in that? HTML 5.3 says "The accept-charset content attribute gives the character encodings that are to be used for the submission". How would a site builder know which encoding was used for the form submission? The site builder sets accept-charset without knowing anything about the encoding of the site's user. I think it should be removed from the HTML specification, as it was added by mistake by incompetent specialists. Even if the browser could interpret the submission encoding, what is the point of an ordered set of accept-charset values for the server?

from Alohci:

@programmer3000 The logic is section "4.10.21.5. Selecting a form submission encoding". My reading of this is that accept-charset is a preference list. The browser must use the first encoding in the list that it supports, or UTF-8 if it supports none in the provided list, or the page's charset if there is no accept-charset attribute.
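Read as a preference list, the selection Alohci describes fits in a few lines. This is a minimal Python sketch, not the specification's own algorithm text: the function name `select_submission_encoding` is hypothetical, and "the browser supports this encoding" is approximated by whether Python's `codecs.lookup` recognises the label.

```python
import codecs
from typing import Optional


def select_submission_encoding(accept_charset: Optional[str], document_encoding: str) -> str:
    """Pick the encoding a form submission would use, reading accept-charset
    as an ordered preference list (hypothetical helper, not a browser API)."""
    # No accept-charset attribute: use the page's own character encoding.
    if accept_charset is None:
        return document_encoding
    # The attribute value is a space-separated list of labels, most preferred first.
    for label in accept_charset.split():
        try:
            # Assumption: "supported" means Python's codec registry knows the label.
            return codecs.lookup(label).name
        except LookupError:
            # Unknown label such as "martian": fall through to the next one.
            continue
    # No listed encoding is supported: fall back to UTF-8.
    return "utf-8"


print(select_submission_encoding("martian UTF-3000 UTF-8", "windows-1251"))  # utf-8
print(select_submission_encoding("utf-8 utf-16 windows-1251", "windows-1251"))  # utf-8
print(select_submission_encoding(None, "windows-1251"))  # windows-1251
```

Under this reading the list is a set of instructions for the browser, not the server: in both calls that have an attribute, exactly one encoding is selected, so each submission arrives in a single encoding.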

from edent:

@programmer3000 You have been repeatedly reminded that this site has a code of conduct:
https://github.com/w3c/html/blob/master/docs/conduct.md

Treat each other with respect, professionalism, fairness, and sensitivity to our many differences and strengths, including in situations of high pressure and urgency.

from LJWatson:

@programmer3000 we have removed your access to the W3C on Github, because despite previous warnings that comments like the one mentioned above, and others, are not appropriate, you have continued to make them.
If you would like to discuss this, you are welcome to contact us by email to team-webplatform@w3.org

from chaals:

I think we can close this issue as invalid, based on the explanation @Alohci provided above.

from testerioing:

In fact, can the HTML editors explain how this works? If I, as a webmaster, specify the encoding myself so that the data comes back to me on the server, what is the idea?

@Alohci, how can the browser support or not support an encoding? It takes a sequence of bytes and renders it in whatever encoding is indicated. "Must use the first encoding in the list that it supports": how can it support the first one or not? It will support any of them. It accepts a stream of bytes and renders it the way it is told to, i.e. not with the first encoding it supports but with the only one given. So there is no sense in a list of encodings if you simply choose the first one. You could just as well write <meta charset="list_of_encoding">. In addition, the input originates from the client, not the server, so it is pointless to include a list of encodings if only one is used.

HTML editors, please produce a working standard. As webmasters we constantly refer to the standard, and it is difficult for us to build websites when it contains inaccuracies and errors.

from edent:

Put simply, suppose your web server is set up to understand only Latin-1.
I send you £ encoded in UTF-8. This is sent in binary as 11000010 10100011.
Because your server doesn't understand UTF-8, it interprets it as Â£.
In this case, it makes sense for you to say in your HTML <form accept-charset="ISO-8859-1" ...>
This tells the browser to change the encoding of the data it sends to your server. In this case, £ will be sent as 10100011.
There's a full explanation at https://en.wikipedia.org/wiki/Mojibake#Russian_and_other_Cyrillic_alphabets
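The byte values in this example can be checked with a short Python sketch. It uses ISO-8859-1 as the concrete name for "Latin-1" and only the standard `str.encode` and `bytes.decode` methods.

```python
# The pound sign (U+00A3) from the example above.
pound = "£"

# Bytes the browser sends when the form is submitted as UTF-8 vs ISO-8859-1.
utf8_bytes = pound.encode("utf-8")
latin1_bytes = pound.encode("iso-8859-1")

# A server that assumes Latin-1 reads the two UTF-8 bytes as two characters.
misread = utf8_bytes.decode("iso-8859-1")
correct = latin1_bytes.decode("iso-8859-1")

print(" ".join(f"{b:08b}" for b in utf8_bytes))    # 11000010 10100011
print(" ".join(f"{b:08b}" for b in latin1_bytes))  # 10100011
print(misread, correct)                              # Â£ £
```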

from testerioing:

@edent, so this means the browser converts the data into the accept-charset encoding before sending it? Please clarify this in the standard, because there may be confusion between "encode in" and "convert to".
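Under edent's explanation the browser does convert: the same text becomes different bytes depending on the selected encoding. A minimal Python sketch of what an `application/x-www-form-urlencoded` body might look like, assuming a hypothetical field named `comment`. Characters the target encoding lacks (here the Cyrillic "Ж" in Latin-1) are handled with Python's `xmlcharrefreplace` error handler, as a stand-in for the numeric character reference (`&#1046;`) browsers are generally understood to substitute; that detail goes beyond what this thread states.

```python
from urllib.parse import urlencode

# Hypothetical form field value typed by a user.
fields = {"comment": "£Ж"}

# The same text converted to two different submission encodings, then percent-encoded.
as_utf8 = urlencode(fields, encoding="utf-8")
# Latin-1 has no "Ж"; xmlcharrefreplace substitutes "&#1046;" before percent-encoding,
# approximating browser behaviour for characters the chosen encoding cannot represent.
as_latin1 = urlencode(fields, encoding="iso-8859-1", errors="xmlcharrefreplace")

print(as_utf8)    # comment=%C2%A3%D0%96
print(as_latin1)  # comment=%A3%26%231046%3B
```

The server therefore has to know which encoding the form asked for in order to decode the body, which is the information accept-charset gives it control over.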


2 participants