`text.regex_split_with_offsets()` currently returns `begin` and `end` as `tf.int64` tensors that count indices in bytes. `tf.strings.length()`, on the other hand, returns a `tf.int32` tensor that counts lengths in either bytes or UTF-8 characters, depending on the value of its `unit` parameter.
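A minimal sketch of the mismatch (the splitting call and the dtypes are the documented behavior; the comparison at the end is just one illustrative use case):

```python
import tensorflow as tf
import tensorflow_text as text

sentences = tf.constant(["héllo world"])

# Offsets come back as tf.int64, counted in bytes:
# "é" is two bytes in UTF-8, so "world" begins at byte 7.
tokens, begin, end = text.regex_split_with_offsets(sentences, r"\s")
print(begin.dtype)  # tf.int64

# tf.strings.length() returns tf.int32 by default.
lengths = tf.strings.length(sentences)
print(lengths.dtype)  # tf.int32

# Comparing the two today requires an explicit cast, e.g. to check
# whether the last token runs to the end of each string:
reaches_end = tf.equal(
    tf.reduce_max(end, axis=-1),  # dense [batch] tensor, tf.int64
    tf.cast(lengths, tf.int64),   # the cast this issue proposes to remove
)
```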
So this would actually be two separate requests:
1. Change the return types of `text.regex_split_with_offsets()` to `tf.int32`, removing the need for a cast (as in the sketch above) when comparing with `tf.strings.length()`. I doubt there will be a use case for strings longer than `INT32_MAX` in the foreseeable future.
2. Add a parameter `unit: Literal["BYTE", "UTF8_CHAR"] = "BYTE"`, matching the behavior of `tf.strings.length()` and `tf.strings.substr()`. Since the regular expressions are already interpreted as UTF-8, I think it would make sense to add this layer of abstraction to facilitate slicing by UTF-8 character index (see the sketch below).
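For concreteness, here is how the existing ops expose `unit`, plus what the proposed parameter could look like; the `unit` argument on `regex_split_with_offsets()` is hypothetical and shown only in a comment:

```python
import tensorflow as tf

s = tf.constant(["héllo world"])

# Existing behavior: `unit` switches between byte and codepoint counts.
print(tf.strings.length(s, unit="BYTE"))       # [12] -- "é" is two bytes
print(tf.strings.length(s, unit="UTF8_CHAR"))  # [11] -- eleven codepoints

# tf.strings.substr() slices by the same unit, so character-based
# offsets would plug straight into it:
print(tf.strings.substr(s, pos=6, len=5, unit="UTF8_CHAR"))  # [b'world']

# Proposed (hypothetical -- this parameter does not exist today):
# tokens, begin, end = text.regex_split_with_offsets(
#     s, r"\s", unit="UTF8_CHAR")
# begin would then be [[0, 6]] (codepoints) instead of [[0, 7]] (bytes).
```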