First of all, thanks a lot for providing this project! It makes it so much easier to work with UTF-8 data.
I'm aware that this might be out of the scope of this project, so I figured I'd just ask what you all think about this. When porting my code from std::string to tiny_utf8::string, I ran into several problems caused by the mismatch between size() and length().
E.g., my code uses templates with std::size(...) to work on arbitrary data types. That doesn't work on tiny_utf8::string, though, since size() returns the raw byte size, but operator[] expects a codepoint index. It would be nice if tiny_utf8 were consistent with other STL containers.
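To illustrate the mismatch, here is a minimal sketch. `byte_sized_text` is a hypothetical stand-in that I wrote for this example, not tiny_utf8's actual implementation; it only mimics the behavior described above (size() in bytes, operator[] by codepoint). `generic_loop_bound` is likewise a made-up name for the kind of template helper I mean.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>  // std::size (C++17)
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT tiny_utf8's real code: size() counts bytes,
// while operator[] takes a codepoint index.
class byte_sized_text {
public:
    explicit byte_sized_text(std::string utf8) : bytes_(std::move(utf8)) {
        for (std::size_t i = 0; i < bytes_.size(); ++i) {
            // Any byte that is not a continuation byte (10xxxxxx) starts a codepoint.
            if ((static_cast<unsigned char>(bytes_[i]) & 0xC0) != 0x80) {
                starts_.push_back(i);
            }
        }
    }

    std::size_t size() const { return bytes_.size(); }     // raw byte count
    std::size_t length() const { return starts_.size(); }  // codepoint count

    // Returns the lead byte of the i-th codepoint (enough for this demo).
    unsigned char operator[](std::size_t i) const {
        assert(i < length());
        return static_cast<unsigned char>(bytes_[starts_[i]]);
    }

private:
    std::string bytes_;
    std::vector<std::size_t> starts_;  // byte offset of each codepoint
};

// Generic helper of the kind described above: it assumes that every index
// below std::size(c) is a valid argument for c[i].
template <typename Container>
std::size_t generic_loop_bound(const Container& c) {
    return std::size(c);
}
```

For the text "héllo" the helper yields a bound of 6 (bytes), although only indices 0 to 4 are valid codepoint indices, so a loop up to that bound indexes past the end. With std::string both numbers agree, which is why the template worked before the port.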
Various other functions also made use of both size() and length(). Yes, this can also be fixed on my side, but (1) it's difficult to get this done correctly in a large code base, and (2) the library then no longer works as a "quick drop-in replacement" as advertised in the README.
So, what I'm considering is adding a new raw_size() (similar to raw_at, raw iterators, ...) that returns the byte size, and changing the default behavior of size() to match length(). This is obviously not a backwards-compatible change, but (1) there have been other non-backwards-compatible changes before, and (2) there could still be a define parameter to switch between both behaviors.
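Roughly, the proposal could look like the sketch below. Everything in it is an assumption for illustration: `utf8_text_sketch` is a toy class rather than tiny_utf8 code, and the macro name `SKETCH_SIZE_RETURNS_BYTES` is made up; the actual name and mechanism would be up to the maintainers.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical switch (name invented for this sketch): define it to keep
// the old behavior where size() reports bytes.
// #define SKETCH_SIZE_RETURNS_BYTES

class utf8_text_sketch {
public:
    explicit utf8_text_sketch(std::string utf8) : bytes_(std::move(utf8)) {}

    // Proposed addition: always the byte count, in line with raw_at
    // and the raw iterators.
    std::size_t raw_size() const { return bytes_.size(); }

    // Codepoint count: every byte that is not a continuation byte
    // (10xxxxxx) starts a new codepoint.
    std::size_t length() const {
        std::size_t n = 0;
        for (unsigned char b : bytes_) {
            if ((b & 0xC0) != 0x80) ++n;
        }
        return n;
    }

    // Proposed default: size() agrees with length(), so it matches the
    // index range accepted by operator[] in the real class.
    std::size_t size() const {
#ifdef SKETCH_SIZE_RETURNS_BYTES
        return raw_size();
#else
        return length();
#endif
    }

private:
    std::string bytes_;
};
```

With the macro undefined, generic code that relies on std::size(...) would see the codepoint count, and code that really needs bytes would call raw_size() explicitly.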
What do you think? If it's out of scope, I'll come up with a different solution. :)