Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SdfAssetPath fails to validate utf8 encoded character '響' #2560

Closed
shigeno-y opened this issue Jul 31, 2023 · 2 comments
Closed

SdfAssetPath fails to validate utf8 encoded character '響' #2560

shigeno-y opened this issue Jul 31, 2023 · 2 comments

Comments

@shigeno-y
Copy link

Description of Issue

When I pass string contains utf8 character u8'響', (0xE9, 0x9F, 0xBF), to SdfAssetPath, the validation fails and reports error.

asset path SdfAssetPath accepts?
asset test = @響@ no
asset test = @path/to/響@ no
asset test = @あ@ yes
asset test = @あ響@ yes
asset test = @あ/響@ yes
asset test = @あ/path/to/響@ yes
asset test = @⚡響@ yes
asset test = @⚡/響@ yes
asset test = @⚡/path/to/響@ yes

So I think this happens when SdfAssetPath get string contains u8'響' led by only single-byte characters.
I'm not sure other character triggers this issue.

Steps to Reproduce

via usda file

  1. Create new usda file (bad.usda) such as:
    #usda 1.0
    
    def "foo" {
        asset test = @響@
    }
    
  2. Load this usda someway.
    I tried with usdview and usdcat
  3. They fail to load usda file with error message:
    Invalid asset path string -- character 1:
    syntax error at '@響@' in </foo.test> on line 4 in file bad.usda
    

via tiny c++ code

#include <iostream>
#include <vector>

#include "pxr/pxr.h"
#include "pxr/usd/sdf/assetPath.h"

int
main()
{
    std::cout << "-------- to be fail --------" << std::endl;
    {
        std::vector<std::u8string> tests;
        tests.emplace_back(u8"@響@");
        tests.emplace_back(u8"@path/to/響@");
        for (const auto& t : tests)
        {
            std::cout << "\t" << reinterpret_cast<const char*>(t.c_str()) << std::endl;
            pxr::SdfAssetPath path{ reinterpret_cast<const char*>(t.c_str()) };
        }
    }

    std::cout << "-------- to be pass --------" << std::endl;
    {
        std::vector<std::u8string> tests;
        tests.emplace_back(u8"@あ@");
        tests.emplace_back(u8"@あ響@");
        tests.emplace_back(u8"@あ/響@");
        tests.emplace_back(u8"@あ/path/to/響@");
        tests.emplace_back(u8"@⚡響@");
        tests.emplace_back(u8"@⚡/響@");
        tests.emplace_back(u8"@⚡/path/to/響@");
        for (const auto& t : tests)
        {
            std::cout << "\t" << reinterpret_cast<const char*>(t.c_str()) << std::endl;
            pxr::SdfAssetPath path{ reinterpret_cast<const char*>(t.c_str()) };
        }
    }
    return 0;
}

This code will print:

-------- to be fail --------
        @響@
Coding Error: in _ValidateAssetPathString at line 132 of path\to\OpenUSD\pxr\usd\sdf\assetPath.cpp -- Invalid asset path string -- character 2:

        @path/to/響@
Coding Error: in _ValidateAssetPathString at line 132 of path\to\OpenUSD\pxr\usd\sdf\assetPath.cpp -- Invalid asset path string -- character 10:

-------- to be pass --------
        @あ@
        @あ響@
        @あ/響@
        @あ/path/to/響@
        @⚡響@
        @⚡/響@
        @⚡/path/to/響@

System Information (OS, Hardware)

Windows 10 22H2 (pro) with VisualStudio 2019
Windows 11 22H2 (ent) with VisualStudio 2022

Package Versions

v23.08

Build Flags

I build usd with plain build_scripts/build_usd.py path/to/build.

Best,

@jesschimein
Copy link
Contributor

Filed as internal issue #USD-8548

@syoyo
Copy link

syoyo commented Aug 1, 2023

There is a bug in _ReadUTF8

_ReadUTF8(char const *&cp, std::string *errMsg)

The sign bit is carried at bitwise operations for some UTF-8 char(including "響") and returns -1 or negative value here:

Adding uchar casting property to bitwise operations should solve the issue.

https://gist.github.com/syoyo/42fe2874cc6f86d0f281f1f523cc98ab

Also, _ReadUTF8 is vulnerable to corrupted char array input.

It'd be better to use simpler & safer utf8 string to utf8 code converter like this(used in the above gitst snippet): https://github.com/lighttransport/japanese-normalizer-cpp/blob/d23b9ec71b74a0609c293e5a72c8ace9ecc4a1a4/jp_normalizer.hh#L178

syoyo added a commit to lighttransport/tinyusdz that referenced this issue Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants