Add class to facilitate serialization and validation of code points #2858

Merged: 2 commits merged on Jan 6, 2024

nvmkuruc (Collaborator) commented on Dec 11, 2023

Description of Change(s)

Depends on (#2855)

#2788 added decoding of UTF-8 strings into code points. This change repurposes the encoding logic from #2120 into a wrapper around a uint32_t. The wrapper validates on construction and only stores code points that are valid for UTF-8 encoding (i.e., surrogates are excluded). Invalid code points are silently replaced with the ReplacementValue (U+FFFD). A stream operator is provided for serialization.
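A minimal sketch of such a wrapper follows, assuming C++17. The class name CodePoint, the accessor AsUInt32(), and the helper ToUtf8() are hypothetical illustrations, not the actual TfUtf8CodePoint interface. The bounds are the Unicode limits (U+10FFFF as the maximum, U+D800 to U+DFFF as the surrogate range), and the stream operator writes the standard 1 to 4 byte UTF-8 sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a validated code point wrapper; names are
// illustrative and do not mirror the real TfUtf8CodePoint API.
class CodePoint {
public:
    static constexpr std::uint32_t MaximumValue = 0x10FFFF;
    static constexpr std::uint32_t SurrogateFirst = 0xD800;
    static constexpr std::uint32_t SurrogateLast = 0xDFFF;
    static constexpr std::uint32_t ReplacementValue = 0xFFFD;

    // Default construction yields the replacement code point.
    constexpr CodePoint() = default;

    // Validate on construction: out-of-range values and surrogates are
    // silently replaced with ReplacementValue.
    constexpr explicit CodePoint(std::uint32_t value)
        : _value(IsValid(value) ? value : ReplacementValue) {}

    constexpr std::uint32_t AsUInt32() const { return _value; }

    static constexpr bool IsValid(std::uint32_t value) {
        return value <= MaximumValue &&
               (value < SurrogateFirst || value > SurrogateLast);
    }

private:
    std::uint32_t _value{ReplacementValue};
};

// Serialize a code point as its UTF-8 byte sequence.
inline std::ostream& operator<<(std::ostream& os, CodePoint cp) {
    const std::uint32_t v = cp.AsUInt32();
    if (v < 0x80) {            // 1 byte: 0xxxxxxx
        os.put(static_cast<char>(v));
    } else if (v < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
        os.put(static_cast<char>(0xC0 | (v >> 6)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else if (v < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        os.put(static_cast<char>(0xE0 | (v >> 12)));
        os.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    } else {                   // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        os.put(static_cast<char>(0xF0 | (v >> 18)));
        os.put(static_cast<char>(0x80 | ((v >> 12) & 0x3F)));
        os.put(static_cast<char>(0x80 | ((v >> 6) & 0x3F)));
        os.put(static_cast<char>(0x80 | (v & 0x3F)));
    }
    return os;
}

// Hypothetical helper: encode one code point into a std::string.
inline std::string ToUtf8(CodePoint cp) {
    std::ostringstream out;
    out << cp;
    return out.str();
}
```

For example, ToUtf8(CodePoint(0x20AC)) yields the three bytes E2 82 AC, while CodePoint(0xD800) silently becomes U+FFFD and serializes as EF BF BD.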

The TF_INVALID_CODE_POINT uint32_t has been replaced with a TfUtf8InvalidCodePoint constexpr instance.

To aid testing, EndAsIterator() has been added to the TfUtf8CodePointView class; it returns an iterator of the same type as begin(). While the language supports sentinel end() types in range-based for loops as of C++17, most STL algorithms require begin and end to have the same type until the C++20 std::ranges algorithms. Although EndAsIterator() creates redundant copies of the string_view's end() value, enabling use of the STL may be preferable in non-performance-critical code.
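The trade-off can be sketched with a hypothetical view that splits a UTF-8 string into per-code-point byte slices without validating them. CodePointSlices, Iterator, Sentinel, and this EndAsIterator() are illustrative names, not the TfUtf8CodePointView API. Assuming C++17, a range-based for loop accepts the stateless Sentinel from end(), while algorithms such as std::distance and std::count_if need the same-typed pair from begin() and EndAsIterator().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical view yielding the bytes of each code point in a UTF-8
// string. No validation is done; it only shows the iterator/sentinel split.
class CodePointSlices {
public:
    // End-of-range marker; carries no state of its own.
    struct Sentinel {};

    class Iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string_view;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string_view*;
        using reference = std::string_view;

        Iterator() = default;
        Iterator(const char* it, const char* end) : _it(it), _end(end) {}

        // Bytes of the code point starting at the current position.
        std::string_view operator*() const {
            return std::string_view(_it, static_cast<std::size_t>(_Next() - _it));
        }
        Iterator& operator++() { _it = _Next(); return *this; }
        Iterator operator++(int) { Iterator tmp = *this; ++*this; return tmp; }

        bool operator==(const Iterator& o) const { return _it == o._it; }
        bool operator!=(const Iterator& o) const { return _it != o._it; }
        // Comparing against the sentinel uses the end pointer stored here.
        bool operator==(Sentinel) const { return _it == _end; }
        bool operator!=(Sentinel) const { return _it != _end; }

    private:
        // Skip the lead byte, then any continuation bytes (10xxxxxx).
        const char* _Next() const {
            const char* p = _it + 1;
            while (p != _end && (static_cast<unsigned char>(*p) & 0xC0) == 0x80) {
                ++p;
            }
            return p;
        }
        const char* _it = nullptr;
        const char* _end = nullptr;  // redundant copy held by every iterator
    };

    explicit CodePointSlices(std::string_view s) : _s(s) {}

    Iterator begin() const { return Iterator(_s.data(), _s.data() + _s.size()); }
    Sentinel end() const { return Sentinel{}; }

    // Same type as begin(), so pre-C++20 STL algorithms accept the pair.
    Iterator EndAsIterator() const {
        const char* e = _s.data() + _s.size();
        return Iterator(e, e);
    }

private:
    std::string_view _s;
};
```

Here every Iterator carries its own copy of the end pointer, including the one returned by EndAsIterator(); the Sentinel returned by end() stores nothing.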

Fixes Issue(s)

  • I have verified that all unit tests pass with the proposed changes
  • I have submitted a signed Contributor License Agreement

nvmkuruc changed the title on Dec 11, 2023
jesschimein (Contributor) commented:

Filed as internal issue #USD-9069

TF_API std::ostream& operator<<(std::ostream&, const TfUtf8CodePoint);

/// The replacement code point can be used to signal that a code point could
/// not be decoded and needed to be replaced.

It's not a big deal, but I think it would be a little cleaner if this were just
constexpr TfUtf8CodePoint TfUtf8InvalidCodePoint = TfUtf8CodePoint();
since the default-constructed TfUtf8CodePoint is the invalid code point. That would get rid of another hardcoded 0xFFFD.
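The reviewer's suggestion can be sketched as follows, assuming C++17. CodePoint and InvalidCodePoint are hypothetical, trimmed-down stand-ins for TfUtf8CodePoint and TfUtf8InvalidCodePoint, not their real definitions. The point is that the constant is derived from the default constructor, so the 0xFFFD literal is spelled out in one place only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down code point wrapper; names are illustrative.
class CodePoint {
public:
    // The only place the replacement value 0xFFFD is written out.
    static constexpr std::uint32_t ReplacementValue = 0xFFFD;

    // Default construction produces the replacement (invalid) code point.
    constexpr CodePoint() = default;

    // Values above U+10FFFF and surrogates fall back to ReplacementValue.
    constexpr explicit CodePoint(std::uint32_t value)
        : _value((value <= 0x10FFFFu && (value < 0xD800u || value > 0xDFFFu))
                     ? value
                     : ReplacementValue) {}

    constexpr std::uint32_t AsUInt32() const { return _value; }

    friend constexpr bool operator==(CodePoint a, CodePoint b) {
        return a._value == b._value;
    }
    friend constexpr bool operator!=(CodePoint a, CodePoint b) {
        return !(a == b);
    }

private:
    std::uint32_t _value{ReplacementValue};
};

// Instead of CodePoint(0xFFFD), derive the constant from the default
// constructor so there is no second hardcoded 0xFFFD to keep in sync.
constexpr CodePoint InvalidCodePoint = CodePoint();

// Compile-time checks that the constant and rejected inputs agree.
static_assert(InvalidCodePoint.AsUInt32() == CodePoint::ReplacementValue, "");
static_assert(CodePoint(0xD800) == InvalidCodePoint, "surrogates are replaced");
```

Because InvalidCodePoint is constexpr, comparisons against it can also be checked at compile time, as the static_assert lines show.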

nvmkuruc force-pushed the codepointclass branch 2 times, most recently from 9a337ca to 719933d on December 18, 2023 23:27
pixar-oss merged commit cd8b2d5 into PixarAnimationStudios:dev on Jan 6, 2024
5 checks passed
4 participants