UTF-8 everywhere (or, more precisely, also on Windows) #98

Closed
11 tasks done
mosra opened this issue May 29, 2015 · 3 comments

Comments

@mosra
Owner

mosra commented May 29, 2015

Along with other recent attempts to make these libraries actually usable on Windows (MSVC-related fixes, workarounds etc.), I'd like to have first-class UTF-8 support on this platform as well (because it basically "just works" everywhere else). As you may already know, there is no such thing as using UTF-8 in WINAPI directly; you either use ANSI or UTF-16. Both are ugly and nasty, but UTF-16 is clearly the way to go (although there is very little support for it in standard C++).

Recommended reading (the rest of the issue relies on decisions from this manifesto): http://utf8everywhere.org/

So, the goal is to have all text in const char*/std::string and in UTF-8, not to force UTF-16 onto the users like in Qt, Java and elsewhere. The actual problems:

  • File/directory operations (Corrade::Utility::Directory) -- accepting UTF-8 filenames, converting them to UTF-16 internally and explicitly using *W Windows APIs (instead of these goddamn macros), for directory listing converting the returned UTF-16 strings back to UTF-8. mosra/corrade@a1061d0
  • Loading of plugins from UTF-8 directories (Corrade::PluginManager) mosra/corrade@2328ba3
  • UTF-8 environment variables in Corrade::Utility::Arguments mosra/corrade@49be6d0
  • UTF-8 filenames in Corrade::Utility::Configuration mosra/corrade@f2a9f1c
  • UTF-8 filenames passable to corrade_add_resource() mosra/corrade@b90f5b4
  • std::[io]fstream -- MSVC has a non-standard constructor and open() overload that take a UTF-16 filename; sadly, there is nothing like that on MinGW (but there are some solutions). None of this is clean, so the final solution is probably to move away from STL streams in importers etc. altogether and handle that some other way instead (which would also be far more portable to platforms w/o filesystem access). Done for Corrade in mosra/corrade@a1061d0 (ensuring nobody else uses fstream directly). Obsoleted by "Compilation time, CI time and executable size improvements" (#293): we moved to C I/O in mosra/corrade@c1a5eed (and it's UTF-8 aware as well).
    • Ensure no other library uses fstream or C I/O directly (plugins and 3rd party libs, especially) -- impossible to check, we'll be using our own file API if possible
  • UTF-8 argc and argv parameters. As with everything else, these "just work" everywhere except on Windows. I could probably do something similar to SDLmain or QtMain: have int wmain() internally, convert the UTF-16 arguments to UTF-8 and call the user-provided main() afterwards. I need to look into that more closely, because standard C++ allows some variations on main() such as an implicit return statement and omitting argc/argv. Any tips on this would be greatly appreciated. MinGW doesn't support wmain(), but there is a workaround. -- New CorradeMain library corrade#37
  • Proper WinMain()/main() wrapper on Windows. Currently it's only possible to create console applications using the MAGNUM_APPLICATION_MAIN() macro, because it uses main(). Calling add_executable(something WIN32 ...) then makes the MSVC linker complain about missing WinMain(). -- New CorradeMain library corrade#37
  • Unicode standard output (std::wcout/std::wprintf etc.) that does not depend on some random codepage setting in the terminal (whose "clever" idea was that, anyway). Most of (all?) the output is currently handled with Corrade::Utility::Debug, so that means just reworking the internals to produce UTF-16 on Windows. Standard input is probably not an issue here (these libraries have different scope). AFAIK, this would mean redirecting to a file would make it UTF-16 encoded, which is definitely not wanted. Instead, set the output encoding to UTF-8 like in dart-lang/sdk@92b746c#diff-5c4ad2f03f9aac0f124bf4e6dba66156. Tools like CTest already enable that on their own.
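
The last item (setting the console output encoding to UTF-8 instead of producing UTF-16) could look roughly like the sketch below. The function names enableUtf8ConsoleOutput() and printUtf8() are made up for illustration and are not part of Corrade; the only Windows call used is SetConsoleOutputCP(). Note that the code page belongs to the console rather than to the process, so a more careful version would probably remember the previous value via GetConsoleOutputCP() and restore it on exit.

```cpp
#include <cstdio>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper, not Corrade API. Returns true if the console
// should interpret the bytes written afterwards as UTF-8.
bool enableUtf8ConsoleOutput() {
#ifdef _WIN32
    // Switch the output code page of the attached console to UTF-8.
    // Fails (returns false) if the call is rejected, e.g. no console.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    // Assumption: terminals elsewhere take UTF-8 bytes as-is
    return true;
#endif
}

// Hypothetical helper: writes UTF-8 bytes through plain C I/O. No wide
// output is involved, so a redirected file ends up UTF-8 encoded too.
int printUtf8(const char* text) {
    return std::fputs(text, stdout);
}
```

Calling enableUtf8ConsoleOutput() once at startup and then printing as usual is the intended use; nothing changes for output redirected to a file.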

As UTF-8/UTF-16 conversion would be needed only on Windows, I'll use WINAPI functions to do the conversion and won't make any public API for this, because it really shouldn't be needed anywhere else. I can't employ the (horrifically convoluted) std::codecvt etc., because (if I'm not mistaken) it is not supported in GCC < 4.9 (and probably also elsewhere), making it currently useless. (It's also awfully slow and the headers are bloated.)
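
A minimal sketch of what such internal conversion helpers might look like, using only MultiByteToWideChar() and WideCharToMultiByte(). The names widen() and narrow() are assumptions made for this example, not the names used in Corrade, and the whole block compiles to nothing outside of Windows because the conversion isn't needed there.

```cpp
// Sketch only: widen()/narrow() are hypothetical names, not Corrade API.
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

#include <string>

// UTF-8 -> UTF-16. Returns an empty string for empty or invalid input.
std::wstring widen(const std::string& utf8) {
    if(utf8.empty()) return {};
    // First call: ask how many UTF-16 code units the result needs
    const int size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
        utf8.data(), int(utf8.size()), nullptr, 0);
    if(size <= 0) return {};
    std::wstring utf16(size, L'\0');
    // Second call: convert into the buffer of exactly that size
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
        utf8.data(), int(utf8.size()), &utf16[0], size);
    return utf16;
}

// UTF-16 -> UTF-8. Returns an empty string for empty input or on failure.
std::string narrow(const std::wstring& utf16) {
    if(utf16.empty()) return {};
    // The last two arguments have to be null when targeting CP_UTF8
    const int size = WideCharToMultiByte(CP_UTF8, 0,
        utf16.data(), int(utf16.size()), nullptr, 0, nullptr, nullptr);
    if(size <= 0) return {};
    std::string utf8(size, '\0');
    WideCharToMultiByte(CP_UTF8, 0,
        utf16.data(), int(utf16.size()), &utf8[0], size, nullptr, nullptr);
    return utf8;
}
#endif
```

Passing the explicit length instead of -1 keeps embedded zero bytes intact and means the reported size doesn't include a terminator.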

Looking at the bigger picture, because it's apparently impossible to portably use std::fstream, std::cout etc. with proper UTF-8 filename/output on all platforms, I might as well get rid of the stream library altogether and have far lighter executables as a result.
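
To illustrate the stream-free direction, here is a rough sketch of UTF-8 aware file access on top of C I/O. The helpers openUtf8(), readFile() and writeFile() are hypothetical names for this example and not the actual Corrade::Utility::Directory implementation; on Windows the sketch assumes _wfopen() is available (MSVC and MinGW both ship it, though MinGW may hide it in strict ANSI mode).

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <wchar.h>
#endif

// Hypothetical helper: opens a file whose name is UTF-8 on every platform.
std::FILE* openUtf8(const std::string& filename, const char* mode) {
#ifdef _WIN32
    // Convert both arguments to UTF-16 and call the wide variant explicitly
    auto widen = [](const std::string& s) -> std::wstring {
        std::wstring out;
        const int size = MultiByteToWideChar(CP_UTF8, 0,
            s.data(), int(s.size()), nullptr, 0);
        if(size > 0) {
            out.resize(size);
            MultiByteToWideChar(CP_UTF8, 0,
                s.data(), int(s.size()), &out[0], size);
        }
        return out;
    };
    return _wfopen(widen(filename).c_str(), widen(mode).c_str());
#else
    // Everywhere else the narrow API takes the UTF-8 bytes unchanged
    return std::fopen(filename.c_str(), mode);
#endif
}

// Reads the whole file into out; returns false if it can't be opened.
bool readFile(const std::string& filename, std::string& out) {
    std::FILE* f = openUtf8(filename, "rb");
    if(!f) return false;
    out.clear();
    char buffer[4096];
    std::size_t count;
    while((count = std::fread(buffer, 1, sizeof(buffer), f)) != 0)
        out.append(buffer, count);
    std::fclose(f);
    return true;
}

// Writes data to the file, replacing any previous contents.
bool writeFile(const std::string& filename, const std::string& data) {
    std::FILE* f = openUtf8(filename, "wb");
    if(!f) return false;
    const bool ok = std::fwrite(data.data(), 1, data.size(), f) == data.size();
    // fclose() is evaluated first so the handle is always released
    return std::fclose(f) == 0 && ok;
}
```

Keeping the platform switch inside a single open function means the rest of the code only ever sees UTF-8 std::string filenames and a plain FILE*, with no <fstream> include anywhere.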

@mosra
Owner Author

mosra commented Jul 21, 2015

Just found out about the need for proper WinMain() entry point on MSVC when building non-console apps, updated the issue above.

Current workaround is to build applications with the ugly console window popping up in the background (i.e. without WIN32 in CMake's add_executable() call), but that's not something I would want in the long run, so this issue is now rather high priority.

@mosra
Owner Author

mosra commented Feb 6, 2017

This was more painful than I thought (especially MinGW).

@mosra mosra removed the windows label Sep 26, 2018
@mosra mosra added this to the 2019.0b milestone Mar 15, 2019
@mosra
Owner Author

mosra commented Jul 5, 2019

Finally closing as resolved, the remaining things are done in mosra/corrade#37 (soon to be merged) and mosra/corrade#49 (will get merged in ~2100, probably, if not later).
