Require UTF-8 For Windows? #2220
-
What is the context? My understanding is that netCDF has assumed UTF-8 for file object names (variables, attributes, etc.) since version 3.6.3, at least 14 years ago.
-
At the spec level, yes. But in practice, it was not doing that on Windows.
-
Okay. For file and path names, definitely support UTF-8 by default on Windows. However, if easy, maintain some kind of legacy option to support UTF-16 or CP-1252. This is only a suggestion, I do not personally need this.
For data in character and string data types, please support transparent storage with no encoding restrictions. I think this has been the status quo since the start of netCDF. The actual encoding is decided by application context or external labeling. The modern assumption is UTF-8, but it should not be required.
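To illustrate what "decided by application context or external labeling" means in practice, here is a minimal Python sketch (not netCDF code; the variable names are made up for illustration). It assumes the stored bytes are opaque and shows that only the reader's choice of encoding determines the text that comes back.

```python
# Hypothetical illustration: the same stored bytes, read under two encodings.
text = "\u00e9"                  # LATIN SMALL LETTER E WITH ACUTE
stored = text.encode("utf-8")    # the bytes a writer might put in a char array

right = stored.decode("utf-8")   # reader agrees with the writer
wrong = stored.decode("cp1252")  # reader assumes the legacy Windows code page

print(stored, repr(right), repr(wrong))
```

The bytes themselves never change; without a label or a shared convention, the second reader has no way to know its result is wrong.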
-
The issue of what the "char" and "string" types mean has always been ambiguous.
-
I agree, although this has never been formally decided. Some years ago there was an
-
By requiring UTF-8, aren't you essentially deprecating support for any older version of Windows? If that's the case, it would be really helpful to know how many users this would impact.
-
Currently, turning on full UTF-8 support in Windows 10 is optional and off by default.
-
This is a followup to Issue #2190.
Windows used to support, by default, its own CP-1252
character set for 8-bit characters. The CP-1252 character set is similar
to, but not identical to, the ISO-Latin-1 character set.
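As a concrete illustration of that difference, the following Python sketch (an illustration only, not part of NetCDF-C) decodes the same two bytes under both encodings. The two encodings disagree in the 0x80-0x9F range, where CP-1252 assigns printable characters and ISO-Latin-1 assigns C1 control codes.

```python
# Byte 0x80 differs between the two encodings; byte 0xE9 does not.
raw = bytes([0x80, 0xE9])

as_cp1252 = raw.decode("cp1252")   # 0x80 -> U+20AC EURO SIGN
as_latin1 = raw.decode("latin-1")  # 0x80 -> U+0080 (a C1 control code)

print([hex(ord(c)) for c in as_cp1252])
print([hex(ord(c)) for c in as_latin1])
```

Both decodings agree on 0xE9 (U+00E9), which is why the two character sets are easy to confuse in text that only uses accented Western European letters.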
At the same time, Windows also supported a wide-character
(UTF-16LE) capability, somewhat like Java.
As of a couple of years ago, Windows began to support (almost)
the use of the UTF-8 character set. It now advises new
applications to use UTF-8 instead of either UTF-16 or CP-1252.
This is discussed here:
Technically, this is still Beta, but it seems pretty solid at
this point. The way to enable it is as follows:
"Beta: Use Unicode UTF-8 for worldwide language support"
This still causes some problems because UTF-16 cannot support
the whole Unicode character set, while UTF-8 can support it. So there
are some Asian and other characters that cause failures.
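One plausible mechanism behind such failures (an assumption; the post does not spell it out) is that characters outside the Basic Multilingual Plane need two 16-bit units, a surrogate pair, in UTF-16, so code that treats one 16-bit unit as one character mishandles them. The Python sketch below just counts the encoded units for one such character; the variable names are illustrative.

```python
# U+20BB7 is a CJK Extension B ideograph, outside the Basic Multilingual Plane.
ch = "\U00020BB7"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")

utf8_len = len(utf8)          # number of bytes in UTF-8
utf16_units = len(utf16) // 2 # number of 16-bit code units in UTF-16

print(utf8_len, utf16_units)
```

A character such as U+00E9, by contrast, fits in a single UTF-16 unit, so software that has only been tested with Western European text may never hit the two-unit case.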
In any case, the question I pose is this:
The important consequence is that the CP-1252 character set is deprecated:
users can still use it, but we (NetCDF-C) will make no attempt to support it.