Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #2061 unicode mangle #2062

Merged
merged 2 commits into from
Aug 2, 2017

Conversation

Dunbaratu
Copy link
Member

Fixes #2061

Two places where unicode support for characters with codes > 127 didn't quite work right. See in-line comments in the PR for details.

@@ -128,7 +128,7 @@ public static ConfigNode ToConfigNode(this HarddiskFile file, string nodeName)
{
if (SafeHouse.Config.UseCompressedPersistence)
{
node.AddValue("binary", EncodeBase64(content.String));
node.AddValue("binary", PersistenceUtilities.EncodeBase64(content.Bytes));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class's local EncodeBase64() method tried to encode the string you give it, by mapping it to a byte array using a vanilla ASCII interpretation (see deleted code below in this PR diff output), which will mangle anything that needs to use the higher 8 bits of the 16 bit Unicode char.

This was unnecessary as well, since content, being of type FileContent, already has its own logic to do its own UTF-8 encoding, built-in. content.String is the unicode string in chars, while content.Bytes is the UTF-8 encoded byte array version of that string, and FileContent does the work to ensure the bytes are UTF-8 rather than just a vanilla dump of the unicode 2-byte patterns.

Once I decided to use content.Bytes instead of content.String, that also made the entire reason for EncodeBase64() go away (it was just to encode the string for you, which FileContent already does, better.)

private static string EncodeBase64(string input)
{
return PersistenceUtilities.EncodeBase64(Encoding.ASCII.GetBytes(input));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This no longer needed to exist since the one place it was being called got deleted, and I don't want anyone ever using this, since it encodes with vanilla ASCII instead of with UTF-8.

c = (char)(e.character & 0x007f); // When doing this to solve issue #206
else
c = e.character;

Copy link
Member Author

@Dunbaratu Dunbaratu Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silly workaround we had to put in a few years ago to work around a Unity on Linux bug, caused multi-byte Unicode to break now.

Reason for the original workaround code:

When mapping a vanilla ASCII character into Unicode, you're supposed to pad the upper bits with zero. So for example, 0x41 ("A") should become 0x0041. But on their Linux port, for some reason, while Unity did this correctly for printable ASCII chars, they did not do so for control ASCII chars (i.e. (0x01 (control-A), 0x08 (control-H, or 'tab'), etc). For them, Unity padded them with 1's instead of with 0's. So 0x0d (control-M, or "return key"), instead of giving a unicode value of 0x000d like it should, was giving a value of 0xff0d. This made the return key fail to work in kOS on Linux. So at the time I added this ugly hack to 'fix' it by simply masking off all the high bits.

But this broke reading keys that send non-ascii Unicode chars, because the high byte was being always clobbered no matter what, whether it was the problem case in the Linux port (0x000d becoming 0xff0d) or not.

At the time I knew it would have that effect, but didn't think much of it, since I knew we couldn't support all of unicode anyway, since we only had our hand-painted 7-bit ascii font support. By the time I removed that limitation by supporting system fonts, I had forgotten this trap was here.

The new version

Now it only clobbers the high byte when it's exactly all 1's, which should never occur for any normal printable character, as the Unicode from 0xff00 through 0xffff is considered "private use" that the Unicode consortium will never assign to any standard meaning.

@@ -112,6 +112,7 @@ public void Awake()
GameEvents.onGUIApplicationLauncherUnreadifying.Add(RemoveButton);
Copy link
Member Author

@Dunbaratu Dunbaratu Jul 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are only here because PR #2058 hasn't been merged into develop yet, and I forgot it was still in my local copy. When that PR is merged, the diffs in this file should go away.

@hvacengi hvacengi self-assigned this Jul 31, 2017
@hvacengi hvacengi merged commit 1f1b803 into KSP-KOS:develop Aug 2, 2017
@Dunbaratu Dunbaratu added this to the v1.1.2.0 milestone Sep 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants