-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #2061 unicode mangle #2062
Conversation
Two places where full unicode support was failing.
@@ -128,7 +128,7 @@ public static ConfigNode ToConfigNode(this HarddiskFile file, string nodeName) | |||
{ | |||
if (SafeHouse.Config.UseCompressedPersistence) | |||
{ | |||
node.AddValue("binary", EncodeBase64(content.String)); | |||
node.AddValue("binary", PersistenceUtilities.EncodeBase64(content.Bytes)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class's local EncodeBase64() method tried to encode the string you give it, by mapping it to a byte array using a vanilla ASCII interpretation (see deleted code below in this PR diff output), which will mangle anything that needs to use the higher 8 bits of the 16 bit Unicode char.
This was unnecessary as well, since content
, being of type FileContent
, already has its own logic to do its own UTF-8 encoding, built-in. content.String
is the unicode string in chars, while content.Bytes
is the UTF-8 encoded byte array version of that string, and FileContent
does the work to ensure the bytes are UTF-8 rather than just a vanilla dump of the unicode 2-byte patterns.
Once I decided to use content.Bytes
instead of content.String
, that also made the entire reason for EncodeBase64()
go away (it was just to encode the string for you, which FileContent
already does, better.)
private static string EncodeBase64(string input) | ||
{ | ||
return PersistenceUtilities.EncodeBase64(Encoding.ASCII.GetBytes(input)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This no longer needed to exist since the one place it was being called got deleted, and I don't want anyone ever using this, since it encodes with vanilla ASCII instead of with UTF-8.
c = (char)(e.character & 0x007f); // When doing this to solve issue #206 | ||
else | ||
c = e.character; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This silly workaround we had to put in a few years ago to work around a Unity on Linux bug, caused multi-byte Unicode to break now.
Reason for the original workaround code:
When mapping a vanilla ASCII character into Unicode, you're supposed to pad the upper bits with zero. So for example, 0x41 ("A") should become 0x0041. But on their Linux port, for some reason, while Unity did this correctly for printable ASCII chars, they did not do so for control ASCII chars (i.e. (0x01 (control-A), 0x08 (control-H, or 'tab'), etc). For them, Unity padded them with 1's instead of with 0's. So 0x0d (control-M, or "return key"), instead of giving a unicode value of 0x000d like it should, was giving a value of 0xff0d. This made the return key fail to work in kOS on Linux. So at the time I added this ugly hack to 'fix' it by simply masking off all the high bits.
But this broke reading keys that send non-ascii Unicode chars, because the high byte was being always clobbered no matter what, whether it was the problem case in the Linux port (0x000d becoming 0xff0d) or not.
At the time I knew it would have that effect, but didn't think much of it, since I knew we couldn't support all of unicode anyway, since we only had our hand-painted 7-bit ascii font support. By the time I removed that limitation by supporting system fonts, I had forgotten this trap was here.
The new version
Now it only clobbers the high byte when it's exactly all 1's, which should never occur for any normal printable character, as the Unicode from 0xff00 through 0xffff is considered "private use" that the Unicode consortium will never assign to any standard meaning.
@@ -112,6 +112,7 @@ public void Awake() | |||
GameEvents.onGUIApplicationLauncherUnreadifying.Add(RemoveButton); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file are only here because PR #2058 hasn't been merged into develop yet, and I forgot it was still in my local copy. When that PR is merged, the diffs in this file should go away.
Fixes #2061
Two places where unicode support for characters with codes > 127 didn't quite work right. See in-line comments in the PR for details.