Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement METH_FASTCALL for pyfunctions. #1619

Merged
merged 1 commit into from
Jun 5, 2021
Merged

Conversation

birkenfeld
Copy link
Member

This is now for all not-NOARGS functions, with the change suggested to make extract_arguments take iterators.

Unfortunately this seems to slow down the cases where *args or **kwargs are present: on my machine, the benchmarks show

       before  after
simple    844    786
mixed    1043   1168
arg_kw    901   1019

This is even without the fact that with the old code, they could even be optimized: for **kwargs a new dict is created, while it is possible to keep the dict with all arguments given by name, and remove those args which match named parameters.

And the "only *args and **kwds" case can be heavily optimized by just passing through the tuple and dict already gotten from METH_VARARGS.

@birkenfeld
Copy link
Member Author

Note for reviewers: This PR only works if the pyo3-macros-backend crate has the cfg defines available, e.g. by copying build.rs from the main crate to pyo3-macros-backend. @davidhewitt promised to resolve this by adding another crate providing the build config.

@davidhewitt
Copy link
Member

Very cool, thank you for working on this! Yes I'm going to work on the aforementioned config crate tonight :)

I see you've marked as draft; do you want me to review and comment on the code already?

Unfortunately this seems to slow down the cases where *args or **kwargs are present: on my machine, the benchmarks show

Hmm that's interesting but I guess not super suprising as we haven't done any optimization of the brand new code paths.

It'd be really interesting to make separate benchmarks for each of the *args and **kwargs cases so that we can figure out which type of variable parameter is causing the slowdown (could be both, I guess). Maybe there's optimizations we can apply in the not-limited-api case which reclaim the lost speed. (e.g. I can think of possible optimizations to tuple iteration and PyTuple::new)

And the "only *args and **kwds" case can be heavily optimized by just passing through the tuple and dict already gotten from METH_VARARGS.

That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole extract_arguments function. Though I don't know how often users use this in practice?

@birkenfeld birkenfeld marked this pull request as ready for review May 19, 2021 17:38
@birkenfeld
Copy link
Member Author

Yeah, definitely fine to review. I marked as draft because it cannot be merged yet, but that is also clear from the test results :)

Hmm that's interesting but I guess not super suprising as we haven't done any optimization of the brand new code paths

Well, there's not too much which is really new, but I hope something can be squeezed out.

That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole extract_arguments function. Though I don't know how often users use this in practice?

It's commonly used for "forwarding" or "wrapping" functions where any arguments are passed verbatim to another function/method - that way the signature doesn't need to track.

Not sure how often those occur in C/Rust, but since it's less instead of more code I think it's worth it.

You'll see I commented out a test - this is one more question, since now raw_pycfunction can return different types, so the PyCFunction call needs to be different depending on the build config (due to this patch) and the function's signature (due to METH_NOARGS). Do we need raw_pycfunction?

@birkenfeld birkenfeld force-pushed the fastcall branch 2 times, most recently from 088bc57 to 9940cdc Compare May 20, 2021 05:12
@davidhewitt
Copy link
Member

Do we need raw_pycfunction?

Good question. @sebpuetz added it, with the intention that it would give users finer control over the creation of the python function object. In reality I think wrap_pyfunction! can be used to much the same effect.

Personally I'm not aware it's widely used, and if it's preventing us from writing decent optimizations then I think it's fine for us to remove it.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for working on this, this is great! I've made a few code suggestions and also pointed out a few spots where we can try to make some optimizations to close the performance gap.

I'm going to try to finish up the docs for #1622 on Sunday evening, so hopefully this PR will be easier to test and work with soon...

pyo3-macros-backend/src/pyfunction.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/method.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyfunction.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyfunction.rs Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyfunction.rs Outdated Show resolved Hide resolved
} else {
impl_wrap_cfunction_with_keywords(cls, &spec, self_ty)?
};
Ok(quote! {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job finding this one for #[pymethods]!

There's also #[staticmethod] and #[classmethod], which I think can also support METH_FASTCALL. With a bit of refactoring it might be possible to support them in this PR.

And then there's #[call], which can also support the vectorcall protocol, but that's more complicated so probably worth punting for now and remembering in #684.

src/class/methods.rs Outdated Show resolved Hide resolved
src/types/tuple.rs Show resolved Hide resolved
src/derive_utils.rs Show resolved Hide resolved
*out = Some(arg);
}

// Collect varargs into tuple
let varargs = if self.accept_varargs {
Some(PyTuple::new(py, args))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so this is a spot where I definitely think we can optimize PyTuple::new for the unlimited api (or even make an internal PyTuple::new_fast which does a lot less work).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what would you optimize there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two possible inefficiencies in PyTuple::new which we could improve:

  • It uses PyTuple_SetItem but could use PyTuple_SET_ITEM, which does less error checking. I think we would also have to define PyTuple_SET_ITEM in Rust (it's a C macro), so this would inline really well.
  • It uses .to_object().into_ptr() for generic T, if we wrote an implementation of PyTuple::new which took &PyAny iterator we could just use into_ptr() which might make a teeeny tiny performance improvement (if the optimizer didn't already achieve the same effect).

I definitely think the first bullet could be worth trying. Could make a nice standalone PR rather than being part of this one.

@birkenfeld
Copy link
Member Author

Updated to latest main.

The raw_cfunction tests are still disabled. I tried making a trait implemented by all the PyCFunction... signatures, so we could have a PyCFunction::new taking that trait as an argument. In that way, the code would work regardless of what optimized variant is chosen by the proc macro.

However, I could not get around the fact that the raw function's type is the concrete function fn(x) -> y {FUNCTION} and not the function pointer type fn(x) -> y -- I could not find a way to cast away the concrete function type "generically" (without having to know the chosen variant).

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great to me, thanks for working on this!

I'm in favour of merging this as-is, the slowdown for the *args / **kwargs case doesn't seem so bad esp after I rebased locally on #1653 . We can always add extra optimisations for those cases like the just-args-and-kwargs-passthrough you suggest in the OP.

Couple of suggestions for tidy-up below, and I think it also could benefit with two CHANGELOG entries:

  • [Changed] use METH_FASTCALL to improve #[pyfunction] performance
  • [Removed] raw_pycfunction! macro

Comment on lines +487 to +497
// _nargs is the number of positional arguments in the _args array,
// the number of KW args is given by the length of _kwnames
let _kwnames: Option<&pyo3::types::PyTuple> = #py.from_borrowed_ptr_or_opt(_kwnames);
// Safety: &PyAny has the same memory layout as `*mut ffi::PyObject`
let _args = _args as *const &pyo3::PyAny;
let _kwargs = if let Some(kwnames) = _kwnames {
std::slice::from_raw_parts(_args.offset(_nargs), kwnames.len())
} else {
&[]
};
let _args = std::slice::from_raw_parts(_args, _nargs as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boilerplate appears twice, perhaps worth refactoring into a fn fastcall_args_kwargs_boilerplate() -> TokenStream ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to do this together with adding fastcall support to static and class methods. But this is a larger refactoring which I might not have time for soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries; I find some weeks I can do lots of contributions and other weeks I can barely respond to issues. The codebase is gradually marching in the right direction :)

tests/test_pyfunction.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants