XML and JSON archives do not support serialization of wide strings but should #95

Open
schrpe opened this issue Apr 22, 2014 · 7 comments

Comments

@schrpe
schrpe commented Apr 22, 2014

The binary archives do support wide character string (wstring, wchar_t), but XML and JSON archives do not.

@AzothAmmo
Contributor

We could probably patch this into cereal by creating a templated JSON/XML archive and using a typedef to keep the current interface the same (giving the templated version a similar but distinct name).

I'm tempted to just roll this into the xml/json overhaul as part of supporting streaming though, since we may be starting with essentially a blank slate there.
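
To make the templated-archive idea concrete, here is a minimal sketch of the pattern, modeled on how `std::basic_string` relates to `std::string`. The names `BasicTextOutputArchive`, `TextOutputArchive`, `WTextOutputArchive`, and `saveValue` are hypothetical and are not cereal's actual classes or API; the sketch only shows how a typedef could keep the narrow-character interface unchanged while a wide variant sits next to it.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical archive templated on the character type.
// This is NOT cereal's real JSON/XML archive, only an illustration.
template <class CharT>
class BasicTextOutputArchive {
public:
    using string_type = std::basic_string<CharT>;

    explicit BasicTextOutputArchive(std::basic_ostream<CharT>& os) : os_(os) {}

    // Writes the string to the underlying stream unchanged.
    void saveValue(const string_type& s) { os_ << s; }

private:
    std::basic_ostream<CharT>& os_;
};

// The existing name stays a narrow-character archive, so current users
// would not need to change anything.
using TextOutputArchive = BasicTextOutputArchive<char>;

// A similar but distinct name for the wide-character variant.
using WTextOutputArchive = BasicTextOutputArchive<wchar_t>;
```

This only covers the interface question. It says nothing about which encoding the wide variant would write, which is the harder problem raised later in the thread.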

@tarqd
tarqd commented May 20, 2014

This is really tough because wide strings don't define what encoding they are in, and the JSON spec requires strings to be Unicode (Chromium and folly only support UTF-8 for parsing under the hood!). Really, C++ needs a sane Unicode string implementation haha.

Edit: On Linux it's safe to assume they are already Unicode; on Windows, however, it depends on the current code page.

@mattyclarkson
Contributor

C++11 does provide sane Unicode string literals:

const wchar_t * const wide = L"unicode string";
const char * const utf8 = u8"unicode string";
const char16_t * const utf16 = u"unicode string";
const char32_t * const utf32 = U"unicode string";

Ideally, you shouldn't be using wchar_t at all in C++11; UTF-8 should provide everything needed.

@patlecat
I also ran into problems using wchar_t/wstring for both console AND file output under Windows. A file written with a wstring would be 0 bytes, while writing Unicode characters with std::string works perfectly. Unicode output to the console, on the other hand, works best with wstring and wcout once you choose the right font for the shell you're using. Ai ai caramba 🌵

@stevehickman
Some common libraries, like boost::filesystem, use wstring/wchar_t internally. In these cases, it's unavoidable.

@tareqsiraj
And Windows Unicode builds use wchar_t. So essentially we will have to convert every wstring to a string before serializing (I say this without knowing much about Unicode encodings).
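
As a rough illustration of that conversion step, here is a self-contained sketch that turns a `std::wstring` into a UTF-8 `std::string`. It rests on an assumption the thread itself flags as uncertain: that the wide string holds UTF-16 code units (typical when `wchar_t` is 16 bits, as on Windows) or UTF-32 code points (typical when it is 32 bits, as on Linux). The helper names `append_utf8` and `wide_to_utf8` are made up for this example and are not part of cereal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: assumes `in` is UTF-16 or UTF-32, see the note above.
inline std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Lone surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With something like this, a wide string could be passed through `wide_to_utf8` and handed to the existing narrow-string serialization path. If the wide data is in some other encoding, the result would be wrong, so the encoding assumption has to hold at the call site.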

@Trass3r
Trass3r commented Sep 10, 2015

Yeah, and it should stay there.
We only use wchar_t in encapsulated code that deals with the Windows API directly, and we assume UTF-8 everywhere else, including std::string.

stevehickman pushed a commit to stevehickman/cereal that referenced this issue May 3, 2016
… (input / output). When new archive sets are added, we need only update the archive_type_list in common.hpp

Note that filesystem cannot yet use this approach because it requires wchar, which XML and JSON do not yet support (Issue USCiLab#95)