Fixes #2061 unicode mangle #2062

Dunbaratu · 2017-07-06T21:12:43Z

Fixes #2061

Two places where unicode support for characters with codes > 127 didn't quite work right. See in-line comments in the PR for details.

Two places where full unicode support was failing.

Dunbaratu · 2017-07-06T21:25:30Z

src/kOS/Persistence/PersistenceExtensions.cs

@@ -128,7 +128,7 @@ public static ConfigNode ToConfigNode(this HarddiskFile file, string nodeName)
            {
                if (SafeHouse.Config.UseCompressedPersistence)
                {
-                    node.AddValue("binary", EncodeBase64(content.String));
+                    node.AddValue("binary", PersistenceUtilities.EncodeBase64(content.Bytes));


This class's local EncodeBase64() method tried to encode the string you give it, by mapping it to a byte array using a vanilla ASCII interpretation (see deleted code below in this PR diff output), which will mangle anything that needs to use the higher 8 bits of the 16 bit Unicode char.

This was unnecessary as well, since content, being of type FileContent, already has its own logic to do its own UTF-8 encoding, built-in. content.String is the unicode string in chars, while content.Bytes is the UTF-8 encoded byte array version of that string, and FileContent does the work to ensure the bytes are UTF-8 rather than just a vanilla dump of the unicode 2-byte patterns.

Once I decided to use content.Bytes instead of content.String, that also made the entire reason for EncodeBase64() go away (it was just to encode the string for you, which FileContent already does, better.)

Dunbaratu · 2017-07-06T21:26:52Z

src/kOS/Persistence/PersistenceExtensions.cs

-        private static string EncodeBase64(string input)
-        {
-            return PersistenceUtilities.EncodeBase64(Encoding.ASCII.GetBytes(input));
-        }


This no longer needed to exist since the one place it was being called got deleted, and I don't want anyone ever using this, since it encodes with vanilla ASCII instead of with UTF-8.

Dunbaratu · 2017-07-06T21:41:45Z

src/kOS/Screen/TermWindow.cs

+                    c = (char)(e.character & 0x007f); // When doing this to solve issue #206
+                else
+                    c = e.character;
+


This silly workaround we had to put in a few years ago to work around a Unity on Linux bug, caused multi-byte Unicode to break now.

Reason for the original workaround code:

When mapping a vanilla ASCII character into Unicode, you're supposed to pad the upper bits with zero. So for example, 0x41 ("A") should become 0x0041. But on their Linux port, for some reason, while Unity did this correctly for printable ASCII chars, they did not do so for control ASCII chars (i.e. (0x01 (control-A), 0x08 (control-H, or 'tab'), etc). For them, Unity padded them with 1's instead of with 0's. So 0x0d (control-M, or "return key"), instead of giving a unicode value of 0x000d like it should, was giving a value of 0xff0d. This made the return key fail to work in kOS on Linux. So at the time I added this ugly hack to 'fix' it by simply masking off all the high bits.

But this broke reading keys that send non-ascii Unicode chars, because the high byte was being always clobbered no matter what, whether it was the problem case in the Linux port (0x000d becoming 0xff0d) or not.

At the time I knew it would have that effect, but didn't think much of it, since I knew we couldn't support all of unicode anyway, since we only had our hand-painted 7-bit ascii font support. By the time I removed that limitation by supporting system fonts, I had forgotten this trap was here.

The new version

Now it only clobbers the high byte when it's exactly all 1's, which should never occur for any normal printable character, as the Unicode from 0xff00 through 0xffff is considered "private use" that the Unicode consortium will never assign to any standard meaning.

Dunbaratu · 2017-07-06T21:45:25Z

src/kOS/Screen/KOSToolbarWindow.cs

@@ -112,6 +112,7 @@ public void Awake()
            GameEvents.onGUIApplicationLauncherUnreadifying.Add(RemoveButton);


The changes in this file are only here because PR #2058 hasn't been merged into develop yet, and I forgot it was still in my local copy. When that PR is merged, the diffs in this file should go away.

Dunbaratu added 2 commits June 25, 2017 03:09

Fixes KSP-KOS#2056 - adds config save to toolbar menu

ceabc75

Fixes KSP-KOS#2061 unicode bugs

1f1b803

Two places where full unicode support was failing.

Dunbaratu commented Jul 6, 2017

View reviewed changes

hvacengi self-assigned this Jul 31, 2017

hvacengi merged commit 1f1b803 into KSP-KOS:develop Aug 2, 2017

Dunbaratu added this to the v1.1.2.0 milestone Sep 13, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #2061 unicode mangle #2062

Fixes #2061 unicode mangle #2062

Dunbaratu commented Jul 6, 2017

Dunbaratu Jul 6, 2017

Dunbaratu Jul 6, 2017

Dunbaratu Jul 6, 2017 •

edited

Loading

Dunbaratu Jul 6, 2017 •

edited

Loading

		@@ -112,6 +112,7 @@ public void Awake()
		GameEvents.onGUIApplicationLauncherUnreadifying.Add(RemoveButton);

Fixes #2061 unicode mangle #2062

Fixes #2061 unicode mangle #2062

Conversation

Dunbaratu commented Jul 6, 2017

Dunbaratu Jul 6, 2017

Choose a reason for hiding this comment

Dunbaratu Jul 6, 2017

Choose a reason for hiding this comment

Dunbaratu Jul 6, 2017 • edited Loading

Choose a reason for hiding this comment

Reason for the original workaround code:

The new version

Dunbaratu Jul 6, 2017 • edited Loading

Choose a reason for hiding this comment

Dunbaratu Jul 6, 2017 •

edited

Loading

Dunbaratu Jul 6, 2017 •

edited

Loading