Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding problems when using JNA Windows terminal implementation #133

Closed
stephan-gh opened this issue Jun 13, 2017 · 7 comments · Fixed by #168
Closed

Encoding problems when using JNA Windows terminal implementation #133

stephan-gh opened this issue Jun 13, 2017 · 7 comments · Fixed by #168
Assignees
Labels
Milestone

Comments

@stephan-gh
Copy link
Contributor

stephan-gh commented Jun 13, 2017

I've a really simple command line app that reads from the console using a LineReader. (Basically just the example code from https://github.com/jline/jline3/wiki/Using-line-readers).

public class ReaderTest {

    public static void main(String[] args) throws Exception {
        Terminal terminal = TerminalBuilder.terminal();
        LineReader reader = LineReaderBuilder.builder()
                .terminal(terminal)
                .build();

        while (true) {
            String line;
            try {
                line = reader.readLine("> ");
                terminal.writer().println(line);
            } catch (UserInterruptException e) {
                // Ignore
            } catch (EndOfFileException e) {
                return;
            }
        }
    }

}

I've now installed a standard English Windows 10 system and run the JAR with the JNA implementation. As long as I use ASCII characters everything is fine. However, when I switch the keyboard layout to e.g. Russian, all the Cyrillic characters get displayed as ?.

As far as I can see, the JNA implementation reads characters from the console using Kernel32.ReadConsoleInput. Everything is still correct at this point, JLine gets the unicode char from the Windows API and when I debug the code, the string builder in readConsoleInput has the correct Cyrillic character.

However, readConsoleInput encodes all the characters again, with the standard system encoding. On my English system, this is windows-1252 which doesn't support the Cyrillic characters. Consequently, it already gets encoded incorrectly at this point (to character 63, ?).

It seems like the actual InputStreamReader will later read them using the current code page of the console. However, if all characters are actually read as unicode it might be better to keep them that way (or encode to UTF-8) and instruct the reader to decode using the same encoding. I can imagine that there might be still problems with the output of these characters then (since that depends on the code page), but currently even the input fails no matter which code page I select in the console.

@stephan-gh
Copy link
Contributor Author

It seems like changing the system encoding using -Dfile.encoding=UTF-8 allows the characters to be read correctly, but currently output fails - even if I set the code page to 65001 (UTF-8).

@stephan-gh
Copy link
Contributor Author

stephan-gh commented Jun 13, 2017

The UTF-8 output seems to break due to the AnsiOutputStream. With the code page set to 65001 (UTF-8), the following code works correctly:

try (OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(FileDescriptor.out))) {
    out.write("ыыыыыы");
}

When I add the AnsiOutputStream it fails and prints ������������:

try (OutputStreamWriter out = new OutputStreamWriter(new AnsiOutputStream(new FileOutputStream(FileDescriptor.out)))) {
    out.write("ыыыыыы");
}

These characters are written with 2 bytes each so I think this is because the FilterOutputStream separates them and FileOutputStream.write(int) is called for each of them instead of FileOutputStream.writeBytes(...).

It works again if I add a BufferedOutputStream after the AnsiOutputStream (probably because it buffers them again and writes them combined):

try (OutputStreamWriter out = new OutputStreamWriter(new AnsiOutputStream(new BufferedOutputStream(new FileOutputStream(FileDescriptor.out))))) {
    out.write("ыыыыыы");
}

@stephan-gh
Copy link
Contributor Author

stephan-gh commented Jun 13, 2017

As a summary, my suggested fix for this issue would be to either make it the default, or add some kind of "UTF-8 option" that does the following:

  • Don't encode the input characters using the system charset but rather using UTF-8 (or UTF-16) and use the same for the input reader
  • Automatically set the output code page to 65001 (UTF-8) using SetConsoleOutputCP and configure the output stream to use UTF-8 (I'm not sure if this part could cause problems in some environments)
  • Add a BufferedOutputStream behind the WindowsAnsiOutputStream so the UTF-8 characters are displayed properly in the console (or find another way to write them together?)

With the output code page manually set to 65001, -Dfile.encoding=UTF-8 and the BufferedOutputStream behind the WindowsAnsiOutputStream, input of the Cyrillic characters worked fine with the current version of JLine.

I could be entirely wrong about this issue, so let me know what you think.

@gnodet
Copy link
Member

gnodet commented Jun 15, 2017

Could you check the above patch ? If it works for you, I'll merge it to master.

@stephan-gh
Copy link
Contributor Author

stephan-gh commented Jun 15, 2017

@gnodet Thanks! Your fix solves the problem with the input encoding. The characters are now read correctly, but are not displayed correctly in the console due to console output encoding problems:

  • JLine's check to get the charset from the current code page doesn't work for code page 65001 (UTF-8) because it's not registered as cp65001 or ms65001 in Java as far as I can see (it's just UTF-8). JLine falls back to the system's standard encoding, which doesn't work for the Cyrillic characters.
  • The other two points I mentioned in my summary above still apply, it would be nice if JLine could set the console output code page automatically, otherwise users will have to type chcp 65001 before starting the application.

@stephan-gh
Copy link
Contributor Author

@gnodet Thanks for the additional changes. I managed to get it working and have added two small comments to your commit.

gnodet added a commit to gnodet/jline3 that referenced this issue Jun 16, 2017
gnodet added a commit to gnodet/jline3 that referenced this issue Jun 16, 2017
@stephan-gh
Copy link
Contributor Author

@gnodet Latest commit is working correctly now, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants