Unicode characters in filenames are not preserved #1443
Comments
Filed as internal issue #USD-6560 |
Hello! When I originally filed this issue it sounded like Pixar was already working on utf8 support. I just wanted to check in to see if there is a plan for that. Would we expect to see it in 21.04, for instance? Thanks! |
Hi @rstelzleni ! Yes, we are working on it, and had hoped to have it ready for 21.05; however, some pretty intense production priorities have intervened, which pushes this out to 21.08. |
Sounds good, thanks Spiff. If there are changes that we could cherry pick and test out sooner we'd be happy to get some testing done. I didn't see anything in the current public repo. We could also help with implementation under Pixar's direction if that helps, just let us know! |
Hi, we've also had problems with non-ASCII characters in paths when using this library on Windows. Enabling the UTF-8 code page by default, as described here, seems to fix the problems, although it affects the entire application. For our case this was acceptable. Hope this helps. |
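To check whether the code page workaround above is actually in effect for a given process, a small runtime probe can help. This is a minimal sketch, not anything from USD: `NarrowPathsAreUtf8` is a hypothetical name, and the non-Windows branch simply assumes that POSIX file APIs pass path bytes through unchanged.

```cpp
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper (not part of USD): reports whether narrow-character
// paths handed to functions such as fopen() are interpreted as UTF-8.
bool NarrowPathsAreUtf8() {
#ifdef _WIN32
    // GetACP() returns the process's active ANSI code page; it is CP_UTF8
    // (65001) when the UTF-8 code page is enabled globally or via manifest.
    return GetACP() == CP_UTF8;
#else
    // Assumption: on Linux and macOS the file APIs treat paths as raw bytes,
    // so UTF-8 names are passed to the filesystem unchanged.
    return true;
#endif
}
```

An application could call this once at startup and log a warning when it returns false on Windows.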
I'm excited to see the new support rolling out soon! I did a few builds off of dev and tested out opening files on different platforms. On Mac and Linux I can open and save files with utf8 names and files containing utf8 characters, and so far everything seems to work. On Windows, files containing utf8 characters open correctly and the contents save out correctly, but I've found that I can't open files with unicode characters in the filename. I tried opening directly, and also as a reference. I have mostly been trying with python scripts and in usdview. I suspect it might work if I did what DDoS suggested in the previous comment. Have you been able to open files on Windows with unicode filenames? I didn't see any in the tests, but I might have just missed them. |
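Thanks to @gitamohr 's changes asset paths can now indeed be utf-8, and that's great! Unfortunately the changes are not quite sufficient to actually open USD files that contain unicode characters in their file names on Windows (it seems fine on linux!).
On a recent windows box (and as described here: https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page) one can patch python.exe (using the described mt -manifest utf8manifest.xml -outputresource:python.exe;#1 command) and get usdcat to also work on referenced unicode paths, but this manifest-based "code page" change is not very portable.
The proper fix is probably to define UNICODE and fix all the win32 calls to pass wchar_t when using the win32 functions defined with the W suffix instead of the A suffix (or _wfopen instead of fopen).
@spiffmon could you tell us if this is something that you are already planning on doing? I've just started looking at the issue, and it seems doable, but there might be performance concerns when converting all filenames using this snippet:
#include <codecvt>
#include <locale>
#include <string>
// Convert a UTF-8 encoded narrow string to a UTF-16 wide string.
std::wstring GetWideStr(const std::string& s){
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(s);
} |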
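On the performance concern: `std::wstring_convert` and `std::codecvt_utf8_utf16` are deprecated as of C++17, and the conversion itself is a single linear pass over the bytes. The following is a minimal sketch of that pass, written by hand so it has no platform dependency. `Utf8ToUtf16` is a hypothetical name and not part of USD, and the sketch deliberately skips some validation (see the comments).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of USD): decode UTF-8 into UTF-16 code units
// in one linear pass. Simplification: overlong encodings, encoded surrogates,
// and code points above U+10FFFF are not rejected.
std::u16string Utf8ToUtf16(const std::string& s) {
    std::u16string out;
    out.reserve(s.size());  // UTF-16 never needs more units than UTF-8 bytes
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the initial bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points beyond the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where `wchar_t` is a 16-bit UTF-16 code unit, the result could be copied into a `std::wstring` before calling a W-suffixed function. For typical path lengths the cost is likely to be small compared with the file open itself, though that would need measuring.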
Modern Windows has the ability to take UTF-8 in commands like fopen().
There is absolutely no need any more to call the _w versions or ever
convert UTF-8 to something else.
…On Mon, Jul 26, 2021 at 1:13 AM Aloys Baillet ***@***.***> wrote:
Thanks to @gitamohr <https://github.com/gitamohr> 's changes asset paths
can now indeed be utf-8, and that's great!
Unfortunately the changes are not quite sufficient to actually open USD
files that contain unicode characters in their file names on Windows (it
seems fine on linux!).
On a recent windows box (and as described here:
https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page)
one can patch python.exe (using the described mt -manifest
utf8manifest.xml -outputresource:python.exe;#1 command) and get usdcat to
also work on referenced unicode paths, but this manifest-based "code page"
change is not very portable.
The proper fix is probably to define UNICODE and fix all the win32 calls to
pass wchar_t when using the win32 functions defined with the W suffix
instead of the A suffix (or _wfopen instead of fopen).
@spiffmon <https://github.com/spiffmon> could you tell us if this is
something that you are already planning on doing? I've just started looking
at the issue, and it seems doable, but there might be performance concerns
when converting all filenames using this snippet:
#include <locale>
#include <codecvt>
std::wstring GetWideStr(const std::string& s){
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
return converter.from_bytes(s);
}
|
@gitamohr will be following up on this shortly, @aloysbaillet , and thanks for the info, @spitzak ! |
Thanks @spitzak , indeed the current code can be used successfully if the windows ANSI code page is changed either globally or using the manifest on specific executables, or |
Description of Issue
Recently we've run into some issues with USD files that have non-ascii characters in their filenames. We initially found that a USD file with an umlaut in the filename opened fine on Linux but failed to open on Windows. I noticed that ArchOpenFile calls fopen, which might explain that difference. On Linux fopen can accept a utf8 filename, but the Windows version doesn't.
Based on that I wasn't sure if USD is expected to handle utf8 filenames or not. I did some testing, and it seems like you can use utf8 filenames in sublayer lists and in references, and things seem to work. However, if I create such a file by hand, open it and save it out I find that any special characters get stripped. For example, this:
to this
I didn't test this with string valued attributes or asset paths in USD files.
I posted this to the mailing list, and Alex Mohr suggested filing an issue for tracking this case.
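The platform difference described above can be sketched as a small wrapper. This is only an illustration under stated assumptions, not how ArchOpenFile is or will be implemented: `OpenUtf8Path` is a hypothetical name, and the Windows branch assumes the path is valid UTF-8 and converts it with `MultiByteToWideChar` before calling `_wfopen`.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper (not part of USD): open a file whose path is UTF-8
// encoded, regardless of the platform's narrow-character code page.
std::FILE* OpenUtf8Path(const std::string& utf8Path, const char* mode) {
#ifdef _WIN32
    // On Windows, fopen() interprets narrow paths in the active ANSI code
    // page, so convert to UTF-16 and use the wide-character variant instead.
    auto widen = [](const std::string& s) {
        if (s.empty()) return std::wstring();
        const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                          static_cast<int>(s.size()),
                                          nullptr, 0);
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(),
                            static_cast<int>(s.size()), &w[0], n);
        return w;
    };
    return _wfopen(widen(utf8Path).c_str(), widen(mode).c_str());
#else
    // On Linux and macOS the path bytes are passed through unchanged.
    return std::fopen(utf8Path.c_str(), mode);
#endif
}
```

A caller would use it like `fopen`, for example `OpenUtf8Path(path, "rb")`, and check the result for `nullptr`.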
Steps to Reproduce
System Information (OS, Hardware)
Tested on Windows 10 and Ubuntu 20.04
Package Versions
USD 21.02