Implement METH_FASTCALL for pyfunctions. #1619
Conversation
Note for reviewers: This PR only works if the …
Very cool, thank you for working on this! Yes I'm going to work on the aforementioned config crate tonight :) I see you've marked as draft; do you want me to review and comment on the code already?
Hmm, that's interesting, but I guess not super surprising as we haven't done any optimization of the brand new code paths. It'd be really interesting to make separate benchmarks for each of the …
That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole …
Yeah, definitely fine to review. I marked as draft because it cannot be merged yet, but that is also clear from the test results :)
Well, there's not too much which is really new, but I hope something can be squeezed out.
It's commonly used for "forwarding" or "wrapping" functions where any arguments are passed verbatim to another function/method - that way the signature doesn't need to track. Not sure how often those occur in C/Rust, but since it's less code instead of more, I think it's worth it. You'll see I commented out a test - this is one more question, since now …
Force-pushed from 088bc57 to 9940cdc
Good question. @sebpuetz added it, with the intention that it would give users finer control over the creation of the Python function object. Personally I'm not aware it's widely used, and if it's preventing us from writing decent optimizations then I think it's fine for us to remove it.
Thanks very much for working on this, this is great! I've made a few code suggestions and also pointed out a few spots where we can try to make some optimizations to close the performance gap.
I'm going to try to finish up the docs for #1622 on Sunday evening, so hopefully this PR will be easier to test and work with soon...
```rust
} else {
    impl_wrap_cfunction_with_keywords(cls, &spec, self_ty)?
};
Ok(quote! {
```
Nice job finding this one for `#[pymethods]`!

There's also `#[staticmethod]` and `#[classmethod]`, which I think can also support `METH_FASTCALL`. With a bit of refactoring it might be possible to support them in this PR.

And then there's `#[call]`, which can also support the vectorcall protocol, but that's more complicated so probably worth punting for now and remembering in #684.
```rust
*out = Some(arg);
}

// Collect varargs into tuple
let varargs = if self.accept_varargs {
    Some(PyTuple::new(py, args))
```
Yeah, so this is a spot where I definitely think we can optimize `PyTuple::new` for the unlimited API (or even make an internal `PyTuple::new_fast` which does a lot less work).
Hm, what would you optimize there?
I see two possible inefficiencies in `PyTuple::new` which we could improve:

- It uses `PyTuple_SetItem` but could use `PyTuple_SET_ITEM`, which does less error checking. I think we would also have to define `PyTuple_SET_ITEM` in Rust (it's a C macro), so this would inline really well.
- It uses `.to_object().into_ptr()` for generic `T`; if we wrote an implementation of `PyTuple::new` which took a `&PyAny` iterator we could just use `into_ptr()`, which might make a teeny tiny performance improvement (if the optimizer didn't already achieve the same effect).
I definitely think the first bullet could be worth trying. Could make a nice standalone PR rather than being part of this one.
Updated to latest main. However, I could not get around the fact that the raw function's type is the concrete function …
This is looking great to me, thanks for working on this!
I'm in favour of merging this as-is; the slowdown for the `*args` / `**kwargs` case doesn't seem so bad, especially after I rebased locally on #1653. We can always add extra optimisations for those cases, like the just-args-and-kwargs-passthrough you suggest in the OP.
Couple of suggestions for tidy-up below, and I think it also could benefit from two CHANGELOG entries:

- [Changed] use `METH_FASTCALL` to improve `#[pyfunction]` performance
- [Removed] `raw_pycfunction!` macro
```rust
// _nargs is the number of positional arguments in the _args array;
// the number of KW args is given by the length of _kwnames
let _kwnames: Option<&pyo3::types::PyTuple> = #py.from_borrowed_ptr_or_opt(_kwnames);
// Safety: &PyAny has the same memory layout as `*mut ffi::PyObject`
let _args = _args as *const &pyo3::PyAny;
let _kwargs = if let Some(kwnames) = _kwnames {
    std::slice::from_raw_parts(_args.offset(_nargs), kwnames.len())
} else {
    &[]
};
let _args = std::slice::from_raw_parts(_args, _nargs as usize);
```
This boilerplate appears twice; perhaps worth refactoring into a `fn fastcall_args_kwargs_boilerplate() -> TokenStream`?
Yeah, I wanted to do this together with adding fastcall support to `static` and `class` methods. But this is a larger refactoring which I might not have time for soon.
No worries; I find some weeks I can do lots of contributions and other weeks I can barely respond to issues. The codebase is gradually marching in the right direction :)
This is now done for all non-NOARGS functions, with the suggested change to make `extract_arguments` take iterators.

Unfortunately this seems to slow down the cases where `*args` or `**kwargs` are present: on my machine, the benchmarks show a regression there.

This is even without accounting for the fact that with the old code those cases could be optimized further: for `**kwargs` a new dict is created, while it would be possible to keep the dict with all arguments given by name and remove those entries which match named parameters. And the "only `*args` and `**kwargs`" case can be heavily optimized by just passing through the tuple and dict already received from METH_VARARGS.