Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try harder by looking for a __bool__ magic method when extracing bool values from Python objects. #3638

Merged
merged 4 commits into from
Dec 19, 2023

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Dec 10, 2023

I decided to not implement the full protocol for truth value testing as it seems confusing in the context of function arguments if basically any instance of custom classes or non-empty collections turns into true.

Closes #3637

@adamreichold adamreichold force-pushed the call-op-bool branch 2 times, most recently from f02d0d1 to 9e348a9 Compare December 10, 2023 16:02
Copy link

codspeed-hq bot commented Dec 10, 2023

CodSpeed Performance Report

Merging #3638 will not alter performance

Comparing call-op-bool (4177dfc) with main (87e42c9)

Summary

✅ 78 untouched benchmarks

@birkenfeld
Copy link
Member

I don't think it is good to single out objects that define __bool__. It's one of several equally valid ways to define truth conversion, on the Python side it absolutely doesn't matter which one you use. For example, when implementing a collection class I might give it an explicit __bool__ or not, depending on if I can be bothered to look up how the behavior is otherwise.

Contrary to something like __int__, __bool__ isn't reserved for object that "feel like a bool" (e.g. numpy bools).

I.e. my recommendation is, either keep being strict here, or just use PyAny::is_true().

@adamreichold
Copy link
Member Author

adamreichold commented Dec 10, 2023

While I can see the argument for being consistent in one way or the other, i.e. either strict insofar a function argument is sort of type specification for the Python API or loose where we try as hard as possible to convert any incoming value into a the target type, I think that we cannot solve this issue as long as the single FromPyObject trait is doing double duty for both approaches.

There is also precedent for this middle ground approach in other implementations, e.g. Vec accepts only implementors of the Sequence ABC even though the implementation to convert the Python iterable into a Vec does not require it at all.

So while I would prefer a principled approach, the practicalities of ergonomic usage of PyO3 based on a single trait prevent this for now as far I can see.

@birkenfeld
Copy link
Member

I can see that, but my argument is that the approach in this PR is worse than either. It happens to work for numpy bools, fixing the original issue, but also allows conversion for a whole host of other types, none of which have to be "bool-like", and arbitrarily excludes others which are no less "bool-like".

@adamreichold
Copy link
Member Author

but also allows conversion for a whole host of other types, none of which have to be "bool-like", and arbitrarily excludes others which are no less "bool-like".

But does this have any concrete usability consequences or is it merely an uncomfortably inconsistent design?

@adamreichold
Copy link
Member Author

adamreichold commented Dec 10, 2023

Maybe as a data point that this can work, pybind11 seems have run into the same problem, i.e. pybind/pybind11#925 1 and solved it in similar manner albeit optimized to avoid the cost of lookup_special (which could do here as well).

Admittedly, they are in the advantageous position of having the pyarg().noconvert() modifier so that the author can explicitly request stricter or looser conversions. But then again, they even special-cased the loose behaviour to always apply to numpy.bool_ (which is also something we could do here).

Footnotes

  1. I was not aware of that PR went starting to implement this one.

@adamreichold
Copy link
Member Author

But they certainly also had a long discussion before settling on the final approach. 😅

@davidhewitt
Copy link
Member

This reminds me a lot of python/cpython#15609 - in Python 3.12 they changed all C functions to accept any object and check using equivalent of PyAny::is_true.

Slight aside, what do you think of renaming PyAny::is_true to PyAny::is_truthy? I keep intuitively thinking x.is_true() would map to the python x is True :)

@adamreichold
Copy link
Member Author

Slight aside, what do you think of renaming PyAny::is_true to PyAny::is_truthy? I keep intuitively thinking x.is_true() would map to the python x is True :)

This I like.

This reminds me a lot of python/cpython#15609 - in Python 3.12 they changed all C functions to accept any object and check using equivalent of PyAny::is_true.

This, not so much.

@davidhewitt
Copy link
Member

👍 I will open an is_truthy PR hopefully later in the week.

Regarding the Python 3.12 change, I agree it's not my favourite behaviour either (and PyO3 being principled about the difference between 1 and True helped me isolate a bug in the past). Despite that I sort of think if we're making this conversion more lax it might be best if we match CPython behaviour to minimise surprises to users. But I'll try to review this PR in detail as soon as I get a moment.

@davidhewitt
Copy link
Member

(Sorry I was a little slow on review here, ended up getting excited by the Py2 work...)

Hmmm, it's hard to make a judgement on the right solution for this. I am definitely in favour of us trying to fix the numpy case.

Overall, we've definitely been trending slowly towards being more lax on these Python -> Rust conversions, e.g. #3374, #3197, as this seems to be what users want.

they are in the advantageous position of having the pyarg().noconvert() modifier

Yes, I definitely think there is room for something like this; I called it "strict" mode in #3226 (comment), we could have #[pyo3(noconvert)] as an annotation on function arguments, for example.


Overall, I think that pybind11 prior art is a reasonable one to follow here, and I like consistency in the ecosystem (if we think the CPython choice is too lax). I also think it's less breaking for future users if we chose to go more lax again to match CPython than it would be to get stricter.

I definitely think that where we're at is that FromPyObject is at the moment a lax conversion. We allowed str to ip address in #3197, after all.

So perhaps the road ahead is to proceed with the implementation of this as is currently written in this PR, and then if users want to opt-out, we should add a #[pyo3(noconvert)] attribute? I think that could be supported by a separate trait, so we can implement it on-demand starting with bool. The hardest thing might be naming that trait :)

All this said, would let x: bool = any.extract()?; being roughly equivalent to Python bool(any) be a good thing? My Rust brain says no but my Python brain says yes 😖

I think I need to sleep on this and see how I feel in the morning...

src/types/boolobject.rs Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member

davidhewitt commented Dec 16, 2023

With fresh eyes I've come around to this solution, I think the smaller behaviour change is preferable.

If users want to have the stricter method again I suggest we explore the #[pyo3(noconvert)] option.

@adamreichold
Copy link
Member Author

@birkenfeld I would be quite unhappy to merge this over you direct objection. I wonder if you can be placated by the suggestion of adding a #[pyo3(noconvert)] attribute to put these decisions into our user's hands eventually? Also maybe the pybind11 precedent suggests that this is tenable? I think we all agree that this is messy and inconsistent and from that perspective it can be said that "the approach in this PR is worse than either", but I also think that this is an remedy for an actual pain point felt by our users working in a messy and inconsistent ecosystem.

@birkenfeld
Copy link
Member

I'm all in favor of the noconvert annotation. However, it doesn't touch my core objection: that considering __bool__ but not __len__ is simply inconsistent. The availability of a way to opt out of broad boolean conversion makes this inconsistency even more glaring since there is no justification for limiting the scope in search of a compromise.

Just think about the user's perspective. If I can pass in None and objects like winreg.HKEYs or gdbm.gdbms, but not others that Python considers bool-convertible just fine in other circumstances, does that really make sense?

@adamreichold
Copy link
Member Author

adamreichold commented Dec 16, 2023

The availability of a way to opt out of broad boolean conversion makes this inconsistency even more glaring since there is no justification for limiting the scope in search of a compromise.

Yes, if we had noconvert today, using the full truth value testing procedure might be a better choice for the "broad/loose" setting. But we do not have that and hence the compromise.

Just think about the user's perspective. If I can pass in None and objects like winreg.HKEYs or gdbm.gdbms, but not others that Python considers bool-convertible just fine in other circumstances, does that really make sense?

I think there is a difference between "bool-convertible" and "truthy". The truth value testing which includes __len__ is about what is "truthy" and does not say anything about being "bool-convertible" which I would argue is something that duck-typed Python does not care about at all, but statically notionally typed Rust and C++ do. And while I think considering a non-empty collection or a not-None class instance "truthy" for an if statement works, I don't think it works for a boolean function argument because it just looses too much information.

@davidhewitt
Copy link
Member

davidhewitt commented Dec 16, 2023

I think there is a difference between "bool-convertible" and "truthy". The truth value testing which includes len is about what is "truthy" and does not say anything about being "bool-convertible" which I would argue is something that duck-typed Python does not care about at all, but statically notionally typed Rust and C++ do.

This is my feeling too when I was reflecting on let x: bool = any.extract()?; above, I think our intention there has always been for that to be bool conversion and not truth testing.

@birkenfeld
Copy link
Member

My point is that __bool__ is not a method for "bool conversion". It is specifically for defining truthiness, as done by bool(x) or if x.

Just check out https://github.com/search?type=code&auto_enroll=true&q=%22def+__bool__%22 - many of the objects defining __bool__ are far from "objects similar to a bool that should be convertible to it". (Just like my example of HKEY and gdbm, which are from the stdlib.)

@davidhewitt
Copy link
Member

I see what you mean, that "bool conversion" isn't separable from truth testing on the Python side.

Given the strength of @birkenfeld's conviction, maybe we should slow down for now and just take the route pybind11 did for their strict pathway, i.e. special case for numpy bools?

@adamreichold
Copy link
Member Author

Given the strength of @birkenfeld's conviction, maybe we should slow down for now and just take the route pybind11 did for their strict pathway, i.e. special case for numpy bools?

Yes, if only to resolve the issue at hand. I do see some types in the above list would be good candidates as well, like pytorch's BoxedBool, but I guess it is better to solve some issues than none at all. Will push the change as a separate commit...

@adamreichold adamreichold force-pushed the call-op-bool branch 4 times, most recently from 0937f7c to e0a2c6f Compare December 17, 2023 14:30
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, save for the CHANGELOG entry. I think if we want to go further here the next steps will be to figure out how the #[pyo3(noconvert)] annotation might work.

newsfragments/3638.changed.md Outdated Show resolved Hide resolved
@adamreichold adamreichold added this pull request to the merge queue Dec 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 18, 2023
@adamreichold
Copy link
Member Author

Uff, it seems __qualname__ on PyPy is bool_ instead of numpy.bool_? Do would just skip the test on PyPy or loosen the name check to numpy.bool_ or bool_? Should I use something other than get_type().name()?

@davidhewitt
Copy link
Member

I wonder if this changes based on version, e.g. maybe 3.9 or 3.10 change?

It could be worth adding PyPy 3.10 on Ubuntu to the pr ci to get a hint what might work before the full merge queue runs?

@adamreichold
Copy link
Member Author

I tested this locally using pyenv and it does seem independent of the PyPy version, i.e. the behaviour is the same for 3.8, 3.9 and 3.10 and name() always yields "bool_" without the module.

@adamreichold
Copy link
Member Author

But reading https://peps.python.org/pep-3155/ and testing this in the CPython REPL, __qualname__ is not intended to include the module name. I wonder if it is really a full replacement for tp_name as used by us?

@adamreichold
Copy link
Member Author

From reading https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_name, I think we should use __module__.__name__ instead of __qualname__ as the abi3 fallback?

@adamreichold
Copy link
Member Author

adamreichold commented Dec 18, 2023

So, I think ideally, this would be rebased on #3660 and made use of PyType::full_name if that one is accepted.

… values from Python objects.

I decided to not implement the full protocol for truth value testing [1] as it
seems confusing in the context of function arguments if basically any instance
of custom class or non-empty collection turns into `true`.

[1] https://docs.python.org/3/library/stdtypes.html#truth
@adamreichold
Copy link
Member Author

Rebased this onto main now that the full name is available on PyType and used this to limit the scope of the looser conversion.

@adamreichold adamreichold added this pull request to the merge queue Dec 19, 2023
Merged via the queue into main with commit 8bb6437 Dec 19, 2023
36 of 37 checks passed
@adamreichold adamreichold deleted the call-op-bool branch December 19, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error on extracting numpy boolean
3 participants