Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand std::string_view support to str, bytes, memoryview #3521

Merged
merged 7 commits into from
Dec 3, 2021

Conversation

jagerman
Copy link
Member

@jagerman jagerman commented Dec 1, 2021

Description

  1. Allows constructing a str or bytes implicitly from a string_view; this is essentially a small shortcut allowing a caller to write py::bytes{sv} rather than py::bytes{sv.data(), sv.size()}.

    For py::str this also allows std::u8string_view, but not for py::bytes because that didn't seem entirely appropriate to me.

  2. Allows implicit conversion to std::string_view from py::bytes—this plugs a current hole where there's no simple way to get such a view of the bytes without copying it (or resorting to Python API calls).

    (This is not done for str because when the str contains unicode we have to allocate to a temporary and so there might not be some string data we can properly view without owning.)

  3. Allows memoryview::from_memory to accept a string_view. As with the other from_memory calls, it's entirely your responsibility to keep it alive.

This also required moving the string_view availability detection into detail/common.h because this PR needs it in pytypes.h, which is higher up the include chain than cast.h where it was being detected currently.

Suggested changelog entry:

* Make str/bytes/memoryview more interoperable with ``std::string_view``.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

include/pybind11/pytypes.h Show resolved Hide resolved
@rwgk
Copy link
Collaborator

rwgk commented Dec 1, 2021

The 2.7 ubuntu failure probably just needs a leading u (u"i'm a string", a trick we use in many places).

I've never seen the ITERATOR_DEBUG_LEVEL failure before.

include/pybind11/pytypes.h Show resolved Hide resolved
include/pybind11/pytypes.h Show resolved Hide resolved
@jagerman jagerman force-pushed the more-string-view branch 2 times, most recently from 0bf48b2 to 0c7d446 Compare December 2, 2021 13:35
@jagerman
Copy link
Member Author

jagerman commented Dec 2, 2021

Squashed it.

@Skylion007
Copy link
Collaborator

@rwgk Does this pass the Google Test suite?

@jagerman
Copy link
Member Author

jagerman commented Dec 2, 2021

I don't understand why the 3.6 - windows-latest build here started failing. Is that a known flakey build or something?

@rwgk
Copy link
Collaborator

rwgk commented Dec 2, 2021

I don't understand why the 3.6 - windows-latest build here started failing. Is that a known flakey build or something?

Yes, that's our most common known flake, safe to ignore. It's a lot better than it used to be before PR #2995, but it's still not stable.

There are other common flakes: 1. print from destructor; 2. various transient issues installing or downloading dependencies.

@rwgk
Copy link
Collaborator

rwgk commented Dec 2, 2021

@rwgk Does this pass the Google Test suite?

I'll initiate this now (will take several hours).

Copy link
Collaborator

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy when Google's happy. :)

@rwgk
Copy link
Collaborator

rwgk commented Dec 2, 2021

Happy when Google's happy. :)

I missed the 12:00 global testing opportunity, but got this PR into the 16:00 batch. Results are expected about 4-5 hours later.

I did comprehensive manual testing before sending this PR for global testing, including testing with ASAN. No issues.

@rwgk
Copy link
Collaborator

rwgk commented Dec 3, 2021

This PR didn't make it through the basic "smoke check" :-(
Below is one breakage. I still have to try to understand myself.
IIRC we recently had some trouble with tensorflow::tstring already, I forget what exactly ...

EDIT: What I had in mind was a PyCLIF issue, not pybind11: google/clif@ef5fa11
(Not sure what we can learn from it. Just mentioning for completeness.)

EDIT: This is the tensorflow::tstring code that causes the issue: https://github.com/tensorflow/tensorflow/blob/289850292c2b9761055415b91c2a6c0f924780f0/tensorflow/core/platform/tstring.h#L142

Error message:

third_party/tensorflow/python/lib/io/file_io_wrapper.cc:316:21: error: ambiguous conversion for functional-style cast from 'tensorflow::tstring' to 'py::bytes'
             return py::bytes(result);
                    ^~~~~~~~~~~~~~~~
./third_party/pybind11/include/pybind11/detail/../pytypes.h:1174:5: note: candidate constructor
    bytes(const std::string &s) : bytes(s.data(), s.size()) { }
    ^
./third_party/pybind11/include/pybind11/detail/../pytypes.h:1189:5: note: candidate constructor
    bytes(std::string_view s) : bytes(s.data(), s.size()) { }
    ^
1 error generated.

While compiling:

      .def("read",
           [](BufferedInputStream* self, int64_t bytes_to_read) {
             py::gil_scoped_release release;
             tensorflow::tstring result;
             const auto status = self->ReadNBytes(bytes_to_read, &result);
             if (!status.ok() && !tensorflow::errors::IsOutOfRange(status)) {
               result.clear();
               tensorflow::MaybeRaiseRegisteredFromStatusWithGIL(status);
             }
             py::gil_scoped_acquire acquire;
             return py::bytes(result);
           })

jagerman and others added 3 commits December 2, 2021 17:19
1. Allows constructing a str or bytes implicitly from a string_view;
   this is essentially a small shortcut allowing a caller to write
   `py::bytes{sv}` rather than `py::bytes{sv.data(), sv.size()}`.

2. Allows implicit conversion *to* string_view from py::bytes -- this
   saves a fair bit more as currently there is no simple way to get such
   a view of the bytes without copying it (or resorting to Python API
   calls).

   (This is not done for `str` because when the str contains unicode we
   have to allocate to a temporary and so there might not be some string
   data we can properly view without owning.)

3. Allows `memoryview::from_memory` to accept a string_view.  As with
   the other from_memory calls, it's entirely your responsibility to
   keep it alive.

This also required moving the string_view availability detection into
detail/common.h because this PR needs it in pytypes.h, which is higher
up the include chain than cast.h where it was being detected currently.
This change is known to fix the `tensorflow::tstring` issue reported under pybind#3521 (comment)

TODO: Minimal reproducer for the `tensorflow::tstring` issue.
@rwgk
Copy link
Collaborator

rwgk commented Dec 3, 2021

Trying a fix. I don't know if this is the best approach, but at least it fixes the failures in the wild (at least some; I still have to try the global testing again).

The force push was needed because I also rebased on master.

Ralf W. Grosse-Kunstleve added 3 commits December 2, 2021 17:28
Error without the enable_if trick:

```
/usr/local/google/home/rwgk/forked/pybind11/tests/test_builtin_casters.cpp:169:16: error: ambiguous conversion for functional-style cast from 'TypeWithBothOperatorStringAndStringView' to 'py::bytes'
        return py::bytes(TypeWithBothOperatorStringAndStringView());
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:1174:5: note: candidate constructor
    bytes(const std::string &s) : bytes(s.data(), s.size()) { }
    ^
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:1191:5: note: candidate constructor
    bytes(std::string_view s) : bytes(s.data(), s.size()) { }
    ^
```
@rwgk
Copy link
Collaborator

rwgk commented Dec 3, 2021

I think this will work now, pending full global testing; just initiated. Results when I wake up (hopefully).

I already verified that all failing targets from the previous sample run pass with the latest version of this PR.

@jagerman
Copy link
Member Author

jagerman commented Dec 3, 2021

I think this will work now, pending full global testing; just initiated. Results when I wake up (hopefully).

str() will have the same issue; I've pushed an update to apply the workaround there, too.

@jagerman
Copy link
Member Author

jagerman commented Dec 3, 2021

I'd kind of prefer that, in the case of ambiguous conversions, we were preferring string_view to string (because it likely avoids an extra allocation), but I don't really see a nice way to do that without introducing potential breakage for existing code.

@rwgk
Copy link
Collaborator

rwgk commented Dec 3, 2021

I'd kind of prefer that, in the case of ambiguous conversions, we were preferring string_view to string (because it likely avoids an extra allocation), but I don't really see a nice way to do that without introducing potential breakage for existing code.

Initially I tried #ifdefing out the std::string constructor, which also fixed the tensorflow::tstring issue in the wild, but broke existing unit tests. With a pushing-and-shoving mentality I tried to rescue it, but quickly got into a mess and gave up.

@rwgk
Copy link
Collaborator

rwgk commented Dec 3, 2021

The one CI failure is a flake (Run jwlawson/actions-setup-cmake@v1.11 Error: connect ETIMEDOUT 140.82.114.6:443).

Google-global testing was successful (it did not run into the str issue fixed here by 5709ccf in the meantime).

Definitely good to merge Jason!

@jagerman jagerman merged commit b4939fc into pybind:master Dec 3, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Dec 3, 2021
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants