WG21 P1854: Source to Execution encoding conversion should not lead to loss of information #50
Comments
The diagnostic for that example is awful. Instead of stating that the character lacks representation in the presumed execution encoding, it states that it is "invalid" (whatever that means), or is an "incomplete multibyte or wide character". I have no idea what an "incomplete wide character" might be. Substituting a
The linked godbolt example only demonstrates behavior for a single compiler. The proposed change is not existing practice for some other compilers. In particular, the Microsoft compiler will silently substitute a replacement character. The claim that the proposed change reflects the behavior of "most compilers" is unsubstantiated. That being said, I think I can get behind this proposed change. Implementations can always offer an extension to substitute replacement characters in the (very few) cases where that is desirable. |
TBH I wasn't able to make Clang accept anything but UTF-8 as the input encoding |
LLVM Clang (and common derivatives like Apple Clang and Android Clang) only support UTF-8. There are derivatives that do support other encodings though (e.g., the z/OS Clang ports). |
P1854 was submitted with a proposed fix for this issue and was discussed by SG16 in Belfast. This is now waiting on an updated paper. |
This issue is now tracked by cplusplus/papers#608. |
@cor3ntin I don't think we ever polled Proposal 7 from P2178, and there doesn't seem to be a current paper that plugs this silent data loss hole. Do we need a new paper, or a new revision of P1854? |
P1854 will be revised
…On Thu, Sep 16, 2021, 16:41 Peter TB Brett ***@***.***> wrote:
@cor3ntin <https://github.com/cor3ntin> I don't think we ever polled
Proposal 7 from P2178, and there doesn't seem to be a current paper that
plugs this silent data loss hole. Do we need a new paper, or a new revision
of P1854?
|
When converting a string literal or wide string literal (or a character literal) from the source encoding to the execution encoding, how non-representable characters are handled is implementation-defined, which can lead to loss of data.
In practice, most compilers make that ill-formed: https://godbolt.org/z/SlhCdr
The standard should match existing practice and should not encourage implementations to modify the meaning of string literals.
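To make the two behaviors under discussion concrete, here is a minimal sketch that models them as ordinary runtime functions. It is not compiler code: the names `representable_in_latin1`, `encode_strict`, and `encode_lossy` are hypothetical, and Latin-1 is assumed as the execution encoding purely for illustration. The strict policy mirrors the proposed "ill-formed" outcome by rejecting the whole literal, while the lossy policy mirrors silent substitution of a replacement character.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Assumption for this sketch: the execution encoding is Latin-1,
// which can represent exactly the code points U+0000..U+00FF.
constexpr bool representable_in_latin1(char32_t c) { return c <= 0xFF; }

// Strict policy (models "the program is ill-formed"):
// reject the whole literal if any character has no representation.
std::optional<std::string> encode_strict(std::u32string_view src) {
    std::string out;
    for (char32_t c : src) {
        if (!representable_in_latin1(c)) return std::nullopt;
        out.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    return out;
}

// Lossy policy (models silent substitution):
// every non-representable character becomes '?'.
std::string encode_lossy(std::u32string_view src) {
    std::string out;
    for (char32_t c : src) {
        out.push_back(representable_in_latin1(c)
                          ? static_cast<char>(static_cast<unsigned char>(c))
                          : '?');
    }
    return out;
}

void demo() {
    // U+00E9 is representable in Latin-1, so both policies agree.
    assert(encode_strict(U"caf\u00e9") == std::string("caf\xE9"));
    assert(encode_lossy(U"caf\u00e9") == "caf\xE9");

    // U+20AC (euro sign) is not representable: strict rejects it.
    assert(!encode_strict(U"\u20ac").has_value());

    // Lossy maps two different literals to the same bytes.
    assert(encode_lossy(U"\u20ac") == encode_lossy(U"\u2603"));
}
```

Under the lossy policy, two distinct source literals become indistinguishable after conversion, which is the loss of information the issue title refers to. The strict policy surfaces the problem instead of changing the meaning of the literal.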
http://eel.is/c++draft/lex#phases-1.5
Each basic source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set ([lex.ccon], [lex.string]); if there is no corresponding member,
[removed: it is converted to an implementation-defined member other than the null (wide) character] [added: the program is ill-formed].
Note: the above paragraph needs further modifications as per #46