Implement METH_FASTCALL for pyfunctions. #1619

birkenfeld · 2021-05-19T15:22:42Z

This now applies to all non-NOARGS functions, with the suggested change to make extract_arguments take iterators.
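The following simplified sketch shows why taking iterators helps. With iterators as inputs, one extraction routine can serve both the METH_VARARGS inputs (a tuple plus a dict) and the METH_FASTCALL inputs (a flat array plus a tuple of keyword names). Everything here is hypothetical and is not PyO3's actual extract_arguments: the SimpleSignature and Extracted types are made up for the sketch, and i64 stands in for Python objects.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified signature description (not PyO3's real type).
pub struct SimpleSignature {
    pub params: &'static [&'static str],
    pub accept_varargs: bool,
    pub accept_varkeywords: bool,
}

/// Result of extraction; i64 stands in for Python objects (assumption).
#[derive(Debug, PartialEq)]
pub struct Extracted {
    pub slots: Vec<Option<i64>>,
    pub varargs: Vec<i64>,
    pub varkwargs: HashMap<String, i64>,
}

impl SimpleSignature {
    /// Iterator-based extraction: the caller decides where values come from
    /// (tuple/dict for METH_VARARGS, array slices for METH_FASTCALL).
    pub fn extract_arguments<'a>(
        &self,
        args: impl Iterator<Item = i64>,
        kwargs: impl Iterator<Item = (&'a str, i64)>,
    ) -> Result<Extracted, String> {
        let mut slots = vec![None; self.params.len()];
        let mut varargs = Vec::new();
        let mut varkwargs = HashMap::new();

        // Positional values fill the named slots in order; extras go to *args.
        for (i, value) in args.enumerate() {
            if i < slots.len() {
                slots[i] = Some(value);
            } else if self.accept_varargs {
                varargs.push(value);
            } else {
                return Err("too many positional arguments".to_string());
            }
        }

        // Keyword values are matched by name; unknown names go to **kwargs.
        for (name, value) in kwargs {
            match self.params.iter().position(|p| *p == name) {
                Some(i) if slots[i].is_some() => {
                    return Err(format!("multiple values for argument '{}'", name));
                }
                Some(i) => slots[i] = Some(value),
                None if self.accept_varkeywords => {
                    varkwargs.insert(name.to_string(), value);
                }
                None => return Err(format!("unexpected keyword argument '{}'", name)),
            }
        }

        Ok(Extracted { slots, varargs, varkwargs })
    }
}
```

A FASTCALL-style caller would pass the first nargs entries of its flat array as the positional iterator, and zip the keyword names with the remaining entries for the keyword iterator. No tuple or dict needs to be built just to feed the extractor.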

Unfortunately, this seems to slow down the cases where *args or **kwargs are present. On my machine, the benchmarks show:

       before  after
simple    844    786
mixed    1043   1168
arg_kw    901   1019

And that is before considering that the old code path could itself be optimized: for **kwargs a new dict is created, whereas it would be possible to keep the dict of all arguments given by name and remove the entries that match named parameters.
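The dict-reuse idea can be sketched with a HashMap standing in for the Python dict and i64 for Python objects, both assumptions made for illustration. Matched entries are moved out, and whatever remains is returned as **kwargs without a second allocation. The sketch also assumes the callee owns the dict, i.e. it is not shared with the caller. Whether that holds for a given CPython call path is not something this sketch checks.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split the dict of all by-name arguments into values for
/// the named parameters and a leftover mapping that is reused as **kwargs.
pub fn split_kwargs(
    params: &[&str],
    mut by_name: HashMap<String, i64>,
) -> (Vec<Option<i64>>, HashMap<String, i64>) {
    // HashMap::remove moves each matched value out of the map; what is left over
    // is exactly the **kwargs mapping, so no fresh map has to be built.
    let named = params.iter().map(|p| by_name.remove(*p)).collect();
    (named, by_name)
}
```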

And the "only *args and **kwds" case can be heavily optimized by just passing through the tuple and dict already gotten from METH_VARARGS.
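The passthrough fast path amounts to classifying the signature once and, when it is exactly *args plus **kwargs, handing over the received objects untouched. The sketch below uses Vec for the tuple, an optional HashMap for the dict and i64 for Python objects. Signature and try_passthrough are hypothetical names, and the general extraction path is not modelled.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: Vec for the PyTuple of positional args, an optional
// HashMap for the PyDict of keyword args, and i64 for Python objects.
pub type Args = Vec<i64>;
pub type Kwargs = Option<HashMap<String, i64>>;

/// Hypothetical summary of a function signature, as a macro might see it.
pub struct Signature {
    pub named_params: usize,
    pub varargs: bool,
    pub varkeywords: bool,
}

impl Signature {
    /// True when the function takes nothing but *args and **kwargs.
    pub fn is_passthrough(&self) -> bool {
        self.named_params == 0 && self.varargs && self.varkeywords
    }
}

/// Hands the incoming tuple and dict straight to `body` when the signature
/// allows it. Returns None when the general extraction path would be needed
/// (that path is not modelled here).
pub fn try_passthrough<R>(
    sig: &Signature,
    args: Args,
    kwargs: Kwargs,
    body: impl FnOnce(Args, Kwargs) -> R,
) -> Option<R> {
    if sig.is_passthrough() {
        // No iteration, no new tuple, no new dict: the objects are moved as-is.
        Some(body(args, kwargs))
    } else {
        None
    }
}
```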

birkenfeld · 2021-05-19T15:24:08Z

Note for reviewers: This PR only works if the pyo3-macros-backend crate has the cfg defines available, e.g. by copying build.rs from the main crate to pyo3-macros-backend. @davidhewitt promised to resolve this by adding another crate providing the build config.

davidhewitt · 2021-05-19T17:24:40Z

Very cool, thank you for working on this! Yes I'm going to work on the aforementioned config crate tonight :)

I see you've marked this as a draft; do you want me to review and comment on the code already?

> Unfortunately this seems to slow down the cases where *args or **kwargs are present: on my machine, the benchmarks show

Hmm, that's interesting, but I guess not super surprising, as we haven't done any optimization of the brand-new code paths.

It'd be really interesting to make separate benchmarks for the *args and **kwargs cases so that we can figure out which kind of variable parameter is causing the slowdown (could be both, I guess). Maybe there are optimizations we can apply in the non-limited-API case to reclaim the lost speed (e.g. I can think of possible optimizations to tuple iteration and PyTuple::new).

> And the "only *args and **kwds" case can be heavily optimized by just passing through the tuple and dict already gotten from METH_VARARGS.

That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole extract_arguments function. Though I don't know how often users use this in practice?

birkenfeld · 2021-05-19T17:47:18Z

Yeah, definitely fine to review. I marked as draft because it cannot be merged yet, but that is also clear from the test results :)

> Hmm, that's interesting, but I guess not super surprising, as we haven't done any optimization of the brand-new code paths

Well, there's not much that is really new, but I hope something can be squeezed out.

> That's a really interesting insight. Definitely could be worth doing; we could skip pretty much the whole extract_arguments function. Though I don't know how often users use this in practice?

It's commonly used for "forwarding" or "wrapping" functions, where whatever arguments are received are passed verbatim to another function/method; that way the wrapper's signature doesn't need to track the wrapped one.

Not sure how often those occur in C/Rust, but since it means less code rather than more, I think it's worth it.

You'll see I commented out a test. That raises one more question: raw_pycfunction can now return different types, so the PyCFunction call needs to differ depending on the build config (due to this patch) and on the function's signature (due to METH_NOARGS). Do we need raw_pycfunction?
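To make the "different types" problem concrete, here is a sketch of the C calling conventions involved, spelled as Rust function pointer types. The aliases mirror CPython's PyCFunction, PyCFunctionWithKeywords and _PyCFunctionFastWithKeywords typedefs, and the flag values follow CPython's methodobject.h. PyObject is an opaque stand-in and Py_ssize_t is modelled as isize. The RawMethod enum is hypothetical. It only illustrates that whoever builds the PyMethodDef must know which variant it holds, so that the function pointer and ml_flags agree.

```rust
use std::os::raw::c_int;

/// Opaque stand-in for ffi::PyObject (assumption: never dereferenced here).
#[repr(C)]
pub struct PyObject {
    _private: [u8; 0],
}

// PyCFunction: used for METH_NOARGS (and METH_VARARGS without keywords).
pub type PyCFunction = unsafe extern "C" fn(*mut PyObject, *mut PyObject) -> *mut PyObject;
// PyCFunctionWithKeywords: METH_VARARGS | METH_KEYWORDS.
pub type PyCFunctionWithKeywords =
    unsafe extern "C" fn(*mut PyObject, *mut PyObject, *mut PyObject) -> *mut PyObject;
// _PyCFunctionFastWithKeywords: METH_FASTCALL | METH_KEYWORDS
// (self, args array, nargs, kwnames tuple or NULL).
pub type PyCFunctionFastWithKeywords = unsafe extern "C" fn(
    *mut PyObject,
    *const *mut PyObject,
    isize,
    *mut PyObject,
) -> *mut PyObject;

pub const METH_VARARGS: c_int = 0x0001;
pub const METH_KEYWORDS: c_int = 0x0002;
pub const METH_NOARGS: c_int = 0x0004;
pub const METH_FASTCALL: c_int = 0x0080;

/// Hypothetical wrapper pairing each pointer type with its matching flags.
pub enum RawMethod {
    NoArgs(PyCFunction),
    VarargsKeywords(PyCFunctionWithKeywords),
    FastcallKeywords(PyCFunctionFastWithKeywords),
}

impl RawMethod {
    pub fn flags(&self) -> c_int {
        match self {
            RawMethod::NoArgs(_) => METH_NOARGS,
            RawMethod::VarargsKeywords(_) => METH_VARARGS | METH_KEYWORDS,
            RawMethod::FastcallKeywords(_) => METH_FASTCALL | METH_KEYWORDS,
        }
    }
}
```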

davidhewitt · 2021-05-20T07:59:21Z

> Do we need raw_pycfunction?

Good question. @sebpuetz added it, with the intention that it would give users finer control over the creation of the python function object. In reality I think wrap_pyfunction! can be used to much the same effect.

Personally, I'm not aware of it being widely used, and if it's preventing us from writing decent optimizations, then I think it's fine for us to remove it.

davidhewitt

Thanks very much for working on this, this is great! I've made a few code suggestions and also pointed out a few spots where we can try to make some optimizations to close the performance gap.

I'm going to try to finish up the docs for #1622 on Sunday evening, so hopefully this PR will be easier to test and work with soon...


davidhewitt · 2021-05-21T22:44:10Z

pyo3-macros-backend/src/pymethod.rs

+    } else {
+        impl_wrap_cfunction_with_keywords(cls, &spec, self_ty)?
+    };
+    Ok(quote! {


Nice job finding this one for #[pymethods]!

There's also #[staticmethod] and #[classmethod], which I think can also support METH_FASTCALL. With a bit of refactoring it might be possible to support them in this PR.

And then there's #[call], which could also support the vectorcall protocol, but that's more complicated, so it's probably worth punting on for now and remembering in #684.


davidhewitt · 2021-05-21T22:59:19Z

src/derive_utils.rs

            *out = Some(arg);
        }

+        // Collect varargs into tuple
+        let varargs = if self.accept_varargs {
+            Some(PyTuple::new(py, args))


Yeah so this is a spot where I definitely think we can optimize PyTuple::new for the unlimited api (or even make an internal PyTuple::new_fast which does a lot less work).

Hm, what would you optimize there?

I see two possible inefficiencies in PyTuple::new which we could improve:

- It uses PyTuple_SetItem but could use PyTuple_SET_ITEM, which does less error checking. PyTuple_SET_ITEM is a C macro, so we would have to define it in Rust ourselves, which also means it would inline really well.

- It uses .to_object().into_ptr() for generic T. If we wrote an implementation of PyTuple::new which took an &PyAny iterator, we could use just into_ptr(), which might give a teeeny tiny performance improvement (if the optimizer doesn't already achieve the same effect).

I definitely think the first bullet could be worth trying. Could make a nice standalone PR rather than being part of this one.
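The difference between the two setters can be modelled without touching the real FFI. In the toy model below, Option<i64> stands in for a possibly NULL PyObject* slot and a plain counter for the reference count. ToyTuple and tuple_from_exact are hypothetical names and are not PyO3 or CPython APIs. The checked setter performs the kind of validation PyTuple_SetItem does on every call (unshared tuple, index in range). The unchecked one is a bare store, like the PyTuple_SET_ITEM macro, and is only sound while filling a freshly created tuple.

```rust
/// Toy model of a tuple under construction (hypothetical, not a real API).
pub struct ToyTuple {
    refcnt: usize,
    items: Vec<Option<i64>>,
}

impl ToyTuple {
    pub fn new(len: usize) -> Self {
        ToyTuple { refcnt: 1, items: vec![None; len] }
    }

    pub fn items(&self) -> &[Option<i64>] {
        &self.items
    }

    /// Mirrors the checks PyTuple_SetItem performs on every call.
    pub fn set_item_checked(&mut self, i: usize, v: i64) -> Result<(), &'static str> {
        if self.refcnt != 1 {
            return Err("SystemError: bad argument to internal function");
        }
        let slot = self
            .items
            .get_mut(i)
            .ok_or("IndexError: tuple assignment index out of range")?;
        // CPython would also Py_XDECREF the previous item here.
        *slot = Some(v);
        Ok(())
    }

    /// Mirrors the PyTuple_SET_ITEM macro: a bare store with no checks.
    ///
    /// # Safety
    /// `i` must be in range and the tuple must be freshly created and unshared.
    pub unsafe fn set_item_unchecked(&mut self, i: usize, v: i64) {
        unsafe { *self.items.get_unchecked_mut(i) = Some(v) };
    }
}

/// Hypothetical "new_fast"-style constructor. ExactSizeIterator is a safe trait
/// and may report a wrong length, so take(len) keeps every index in range.
/// A real implementation would also have to handle an iterator that yields
/// fewer items than promised (this model just leaves those slots as None).
pub fn tuple_from_exact<I: ExactSizeIterator<Item = i64>>(iter: I) -> ToyTuple {
    let len = iter.len();
    let mut tuple = ToyTuple::new(len);
    for (i, v) in iter.take(len).enumerate() {
        // Safety: i < len because of take(len), and the tuple was just created.
        unsafe { tuple.set_item_unchecked(i, v) };
    }
    tuple
}
```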

birkenfeld · 2021-05-29T13:53:17Z

Updated to latest main.

The raw_cfunction tests are still disabled. I tried making a trait implemented by all the PyCFunction... signatures, so we could have a PyCFunction::new taking that trait as an argument. That way, the code would work regardless of which optimized variant the proc macro chooses.

However, I could not get around the fact that the raw function's type is the concrete function item type fn(x) -> y {FUNCTION}, not the function pointer type fn(x) -> y, and I could not find a way to cast away the concrete function type "generically" (without having to know the chosen variant).
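The fn item vs fn pointer problem can be reproduced in plain Rust. The trait below is hypothetical and stands in for one implemented by each PyCFunction variant. It is implemented for function pointer types, but a function name has its own unique, zero-sized fn item type, and generic inference never coerces that to a pointer. Every call that compiles therefore has to spell out the pointer type, i.e. already know the variant.

```rust
/// Hypothetical trait implemented for each supported function *pointer* type,
/// standing in for the PyCFunction signature variants.
pub trait RawSignature {
    const KIND: &'static str;
}

impl RawSignature for fn(i32) -> i32 {
    const KIND: &'static str = "one argument";
}

impl RawSignature for fn(i32, i32) -> i32 {
    const KIND: &'static str = "two arguments";
}

/// Generic entry point that is meant to work "regardless of the variant".
pub fn kind_of<F: RawSignature>(_f: F) -> &'static str {
    F::KIND
}

pub fn double(x: i32) -> i32 {
    x * 2
}

pub fn add(x: i32, y: i32) -> i32 {
    x + y
}

// kind_of(double) does not compile: the argument's type is the fn item type
// `fn(i32) -> i32 {double}`, which does not implement RawSignature, and the
// generic parameter F is inferred as that item type instead of a fn pointer.
// Working calls must name the pointer type, i.e. know the variant up front:
//     kind_of(double as fn(i32) -> i32)
//     kind_of::<fn(i32) -> i32>(double)
```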

davidhewitt

This is looking great to me, thanks for working on this!

I'm in favour of merging this as-is; the slowdown for the *args / **kwargs cases doesn't seem so bad, especially after I rebased locally on #1653. We can always add extra optimisations for those cases later, like the just-args-and-kwargs passthrough you suggested in the OP.

Couple of suggestions for tidy-up below, and I think it could also benefit from two CHANGELOG entries:

[Changed] use METH_FASTCALL to improve #[pyfunction] performance
[Removed] raw_pycfunction! macro

davidhewitt · 2021-06-05T07:33:00Z

pyo3-macros-backend/src/pyfunction.rs

+                    // _nargs is the number of positional arguments in the _args array,
+                    // the number of KW args is given by the length of _kwnames
+                    let _kwnames: Option<&pyo3::types::PyTuple> = #py.from_borrowed_ptr_or_opt(_kwnames);
+                    // Safety: &PyAny has the same memory layout as `*mut ffi::PyObject`
+                    let _args = _args as *const &pyo3::PyAny;
+                    let _kwargs = if let Some(kwnames) = _kwnames {
+                        std::slice::from_raw_parts(_args.offset(_nargs), kwnames.len())
+                    } else {
+                        &[]
+                    };
+                    let _args = std::slice::from_raw_parts(_args, _nargs as usize);


This boilerplate appears twice; perhaps it's worth refactoring into a fn fastcall_args_kwargs_boilerplate() -> TokenStream?
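The slicing done by that boilerplate follows the METH_FASTCALL | METH_KEYWORDS layout. The args array holds nargs positional values, followed by one value per name in the kwnames tuple, which is NULL when no keywords were passed. Below is a safe-Rust sketch of the same decoding over ordinary slices. split_fastcall is a hypothetical name, and the generated code itself works on raw pointers and &PyAny rather than generic slices.

```rust
/// Hypothetical safe model of the METH_FASTCALL | METH_KEYWORDS argument layout:
/// `args` = [positional_0 .. positional_{nargs-1}, kwvalue_0 .. kwvalue_{k-1}],
/// where the k keyword names come from `kwnames` (None means no keywords).
/// Panics if `args` is shorter than the layout requires.
pub fn split_fastcall<'a, T>(
    args: &'a [T],
    nargs: usize,
    kwnames: Option<&'a [&'a str]>,
) -> (&'a [T], Vec<(&'a str, &'a T)>) {
    let (positional, rest) = args.split_at(nargs);
    let keywords = match kwnames {
        // The keyword values sit directly after the positional ones, in the
        // same order as their names.
        Some(names) => names.iter().copied().zip(&rest[..names.len()]).collect(),
        None => Vec::new(),
    };
    (positional, keywords)
}
```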

Yeah, I wanted to do this together with adding fastcall support to static and class methods. But this is a larger refactoring which I might not have time for soon.

No worries; I find some weeks I can do lots of contributions and other weeks I can barely respond to issues. The codebase is gradually marching in the right direction :)


birkenfeld force-pushed the fastcall branch from 5c0ec62 to 325610e May 19, 2021 15:24

birkenfeld marked this pull request as ready for review May 19, 2021 17:38

birkenfeld force-pushed the fastcall branch 2 times, most recently from 088bc57 to 9940cdc May 20, 2021 05:12

davidhewitt reviewed May 21, 2021


birkenfeld force-pushed the fastcall branch from 9940cdc to 0b6b0d4 May 29, 2021 13:49

davidhewitt mentioned this pull request Jun 1, 2021

pyo3-build-config: fix cross compilation #1648

Merged

davidhewitt approved these changes Jun 5, 2021


Implement METH_FASTCALL for pyfunctions and pymethods.

3e8d003

birkenfeld force-pushed the fastcall branch from 0b6b0d4 to 3e8d003 June 5, 2021 10:58

davidhewitt merged commit a5810ea into PyO3:main Jun 5, 2021

birkenfeld deleted the fastcall branch August 1, 2021 07:06

davidhewitt mentioned this pull request Sep 18, 2021

PyO3 performance analysis: function overheads #1607

Closed
