Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure capsules stay alive when caching pointers into their interior #373

Merged
merged 2 commits into from
Mar 30, 2023

Conversation

adamreichold
Copy link
Member

While this started as a simple PyO3 API usage modernization, I think we actually need to make sure the capsules stay alive as long as we hold onto pointers into their interior as the modules and hence attributes could theoretically be deleted leaving us with a dangling pointer.

Leaking references to the capsules themselves seemed like the easiest way to avoid the overhead of calling PyCapsule_Pointer for each access. This should not be a functional limitation as we never update the cached pointer in any case, i.e. storing Py<PyCapsule> would have effectively the same result as the intentional leak.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see what you mean. I wonder if there's an alternative we can do here to leaking and stashing in a static? With the prevailing winds in CPython seeming to head towards sub-interpreters and multi-phase initialization (PEP 489) I'm feeling like we need to at least try to not make PyO3 / numpy's incompatibility with that world knowingly worse... 🤔

@adamreichold
Copy link
Member Author

adamreichold commented Mar 28, 2023

I wonder if there's an alternative we can do here to leaking and stashing in a static?

From my point of view, a static object is just the fastest way of accessing persistent state.

What is the fastest way to get any state managed by the Python interpreter? Accessing an attribute of a well-known module? Are there any alternatives for native code to at least stash a pointer-to-void somewhere? Maybe in the Python thread state? If not, I guess the additional overhead of calling PyCapsule_Pointer would not matter at all. However, the pointer cache used by rust-numpy was a direct result of reported performance issues IIRC.

What we could also do is try to amortize such an attribute access over many uses, e.g. PyO3 does the access once and stashes the result in a thread_local! which every extension using PyO3 could then use? Possibly lazily to not slow down calls which do not need this persistent state?

@adamreichold
Copy link
Member Author

adamreichold commented Mar 28, 2023

Another idea would be to continue to use static variables, but register them (their addresses) with PyO3 so that PyO3 could clear these variables when the interpreter changes/is finalized? This way, we could at least get away with storing things Py<PyCapsule> have fast access once initialized and we would just automagically re-initialize if the extension was used with a different sub-interpreter. Of course, this could devolve into an attribute look-up on every access if the sub-interpreter changes between every access.

@davidhewitt
Copy link
Member

I think that in principle we could use PyModule_GetState with storage for arbitrary data as an alternative for statics. Certainly that's what the CPython devs seem to want us to do - e.g. https://docs.python.org/3.11/howto/isolating-extensions.html#module-state-access-from-functions

(That doc is actually pretty good. It does also allude to per-interpreter state, though doesn't describe how to use it.)

I haven't got a concrete design for how we actually make this nice to use. I feel like it might end up requiring a thread-local "context" similar to what tokio does. Still a lot of thinking to do.

Anyway, I think my comment here might actually have been unhelpful, because it's not clear to me whether numpy supports PEP 489 anyway. So there may not be much value in worrying about it here.

@adamreichold
Copy link
Member Author

I think that in principle we could use PyModule_GetState with storage for arbitrary data as an alternative for statics.

Is it always obvious which module to use? For rust-numpy we generally reach for NumPy-provided modules but I suspect PyO3 would rather use the "current" module, i.e. the top-level module of the extension we are implementing. But is that always well-defined? Are "free standing functions" implemented using PyO3 and passed to pure Python code as context possible as well?

@kngwyu
Copy link
Member

kngwyu commented Mar 30, 2023

Sorry for complicating the discussion... but should the API capsules (like _ARRAY_API) also be allocated into the heap and a part of the 'module state'? They are just immutable function tables.

@adamreichold
Copy link
Member Author

adamreichold commented Mar 30, 2023

Sorry for complicating the discussion... but should the API capsules (like _ARRAY_API) also be allocated into the heap and a part of the 'module state'? They are just immutable function tables.

I think so in a world where the Python interpreter and all modules can be unloaded and possibly different versions be loaded later into the same process. Those function tables would refer to possibly non-existing addresses or functions with incompatible signatures after re-initialization of Python.

(As @davidhewitt pointed out above, NumPy may not yet actually support this full de-initialization and re-initialization, but it is the direction the Python ecosystem seems to be moving towards.)

((As an aside, the capsules themselves are already on the Python heap. What we are caching here in a static is a pointer into these heap-allocated capsules to directly access the function tables without going through e.g. PyCapsule_Pointer.))

@kngwyu
Copy link
Member

kngwyu commented Mar 30, 2023

Thank you. I think I understand now.

When numpy started using module state, I think we can use it. But so far, caching the module into a static variable would be fine.
(Edited: I was not aware of the potential problem of the method caching used in the shared array implementation...)

@davidhewitt
Copy link
Member

Is it always obvious which module to use? For rust-numpy we generally reach for NumPy-provided modules but I suspect PyO3 would rather use the "current" module, i.e. the top-level module of the extension we are implementing. But is that always well-defined? Are "free standing functions" implemented using PyO3 and passed to pure Python code as context possible as well?

Indeed I think it's a hard problem. I vaguely hope that by making user-facing APIs to manipulate module state as well as making the "current module" more obvious by e.g. PyO3/pyo3#2367 then there might just be enough flexibility to make module state work. Probably some experimentation for us ahead :)

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've all agreed that it's too early to start worrying about not using statics here!

@adamreichold adamreichold merged commit d7fdc43 into main Mar 30, 2023
@adamreichold adamreichold deleted the store-capsules branch March 30, 2023 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants