-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(#456): fix default charset problem #457
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: GodMeowIceSun <icesun@icesun.cn>
fcc3270
to
28e4885
Compare
@spassarop - Have you looked at this issue/proposed fix yet? If there is a real problem here it would be nice to fix this and include it in the release we are close to getting done related to the neko-html fix. |
ISO-8859-1 is the default encoding for Windows, but just about every other
modern OS uses UTF-8. It can especially bite you in the butt when using
String.getBytes(). I recommend changing it to use UTF-8.
See
https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1#7048774
for details of how they are different.
…-kevin
On Wed, Jun 5, 2024, 11:52 PM GodMeowIceSun ***@***.***> wrote:
@GodMeowIceSun <https://github.com/GodMeowIceSun> requested your review
on: #457 <#457> fix(#456
<#456>): fix default charset
problem.
—
Reply to this email directly, view it on GitHub
<#457 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAO6PG4LLTCSVCS2OOT3FNTZF7MG3AVCNFSM6AAAAABISIQENCVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJTGA3DANJQGU4TAMI>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
I agree with Kevin. Probably the default Windows encoding was used on
policy XML because the content of that file was unlikely to have characters
beyond that encoding.
I am not sure why there were error messages in Chinese but were not tested
too see if the text would show correctly. I mean, if the display was tested
it would look obviously broken.
Probably if an environment does not support UTF-8, AntiSamy usage should
not be the main issue.
Then, I share Kevin suggestion to update the encoding to UTF-8 on this code
change and test it.
…On Thu, 6 Jun 2024 at 02:23 Kevin W. Wall ***@***.***> wrote:
ISO-8859-1 is the default encoding for Windows, but just about every other
modern OS uses UTF-8. It can especially bite you in the butt when using
String.getBytes(). I recommend changing it to use UTF-8.
See
https://stackoverflow.com/questions/7048745/what-is-the-difference-between-utf-8-and-iso-8859-1#7048774
for details of how they are different.
-kevin
On Wed, Jun 5, 2024, 11:52 PM GodMeowIceSun ***@***.***>
wrote:
> @GodMeowIceSun <https://github.com/GodMeowIceSun> requested your review
> on: #457 <#457> fix(#456
> <#456>): fix default charset
> problem.
>
> —
> Reply to this email directly, view it on GitHub
> <#457 (comment)>, or
> unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AAO6PG4LLTCSVCS2OOT3FNTZF7MG3AVCNFSM6AAAAABISIQENCVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJTGA3DANJQGU4TAMI>
> .
> You are receiving this because your review was requested.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub
<#457 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHL3BMPHIOCU4ZAWHBAEX73ZF7W35AVCNFSM6AAAAABISIQENCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJRGQZTKMRSHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Perhaps you can take a look at the document I referenced, which proves in reverse that the default encoding of ResourceBundle in the original implementation of Jdk8 was ISO-8859-1, while our project is compatible with Java 8+, and the internationalization method used is ResourceBundle. Perhaps this is the core reason for the problem. I don't think directly modifying it to UTF-8 can solve the problem. Perhaps different versions of JDK need to be treated differently or the Java version should use ISO-8859-1 as the default encoding, which is also a temporary solution |
@GodMeowIceSun wrote:
I did read through this. I think the worst case scenario here is that someone may have to convert the ResourceBundle from ISO-8859-1 to UTF-8. But every Java instance that I've ever worked with, dating back to JDK 1.1, has supported UTF-8, so I don't think this will be a problem, and it's only a 1 line change to check it. If it's a problem, we can convert the encoding for the ResourceBundle. But if it's all in ASCII, we shouldn't even need to do that. I certainly don't think a different minimal JDK version needs to be updated to handle this though. |
I'm not recommending it, but I suppose if you really wanted to, you could tweak the code to use a character set based on a system property and just have it default to UTF-8 if not specified. The one thing I do know if UTF-8 will always be available, regardless of whatever language preferences. I suspect that ISO-8859-1 usually would be, but I'm not sure that is 100% the case if you were using Chinese or Kanji, etc. But UTF-8 is used for other internal encodings. E.g., it's often used with cryptography related code, otherwise going between Windows and *nix often causes problems. For example if shows up in these classes:
I think it is used when stuff needs to be serialized / deserialized so it will work across operating systems, but I can't swear to that as it's been 15-20 years since I looked at that code. That said, ISO-8859-1 would almost certainly work, but it's just the natural choice that UTF-8 is. Just my $.02 though. In ESAPI, we use UTF-8 for everything, but this is your choice. I just think UTF-8 is more neutral whereas ISO-8859-1 has a Western European slant. |
@kwwall - Do you suggest I close this as won't fix? |
I suppose the correct thing to do is for someone to correct all the AntiSamy properties files and XML files as per the Oracle JDK 9 document that @GodMeowIceSun referenced, and use UTF-8 instead of ISO-8859-1. (And if that doesn't work, use a system property like I suggested earlier.) Ideally, that someone who would change this would be the one who has submitted the issue, but that doesn't seem to be happening here. However, it seems likely that others are impacted. If I were you, I'd give the PR submitter another chance to make these changes. Set a deadline and if it's not met close this PR. But I don't think I would close issue #456 as the issue seems legitimate. If you end up closing this PR because some date is past, then I probably would slap a 'Help Wanted' label and/or 'Good First Issue' label on the issue and set the priority to Low and just leave it open. |
@GodMeowIceSun - If you can implement this PR per @kwwall description, we can potentially merge those changes in. But as is, we cannot. Do you think you can make the changes as he describes? |
the fix of this issue
#456