Towards soundness of PyByteArray::to_vec #4742

Open
wants to merge 3 commits into
base: main

robsdedude

In free-threaded Python, to_vec needs to run inside a critical section so that no other Python thread can mutate the bytearray while it is being read, which would cause UB.

See also #4736

Unfortunately, it seems I can't write proper tests for this as Python 3.13t is not yet part of the test matrix. I'm aware that support for testing with 3.13 and 3.13t is still in its early stages and that, for instance, virtualenv does not yet support it.

@robsdedude changed the title from "Towards soundness of PyByteArrayMethods::to_vec" to "Towards soundness of PyByteArray::to_vec" on Nov 29, 2024
@robsdedude marked this pull request as ready for review on November 29, 2024 09:28
Member

@davidhewitt davidhewitt left a comment

Thanks for the PR! We actually do have tests running for the free-threaded build; I would have been unhappy to declare support without them! Similarly, I have had virtualenv working just fine with 3.13t (haven't tried Windows, though).

I think we could write a test which spawns a thread that attempts to invalidate the data (maybe by writing to it using py.run or PySequenceMethods::set_slice) and confirms that the data read is the original data inserted, not the conflicting data. The conflicting write should hopefully now block on either the GIL or the critical section, depending on the build.

Review comment on newsfragments/4742.fixed.md (outdated, resolved)
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
@davidhewitt davidhewitt mentioned this pull request Nov 29, 2024
@robsdedude
Author

robsdedude commented Nov 29, 2024

@davidhewitt I tried to write a test running bytearray.extend in one thread while reading the bytearray with to_vec() in another thread, and found that I was able to read inconsistent data (more precisely, partially uninitialized memory) regardless of whether the critical section change was in place or not. Digging deeper, I'm not surprised. If you look at the C implementation of bytearray, you'll see that no critical section is used anywhere in the file. All the memcpy and memmove calls are unprotected 😕

Not sure where to go from here.

However, no matter how hard I tried, I couldn't get it to segfault. So maybe there's something more to it that I'm not aware of.

@davidhewitt
Member

I think it's not a surprise that it's hard to segfault; you'd have to do something like turn the uninitialized read into a cast on the bytes to create a structure in an invalid state.

Nevertheless, invalid reads alone are a clear security issue. This problem clearly gets a lot worse in free-threaded Python. My knee-jerk reaction is to make all bytearray methods in PyO3 unsafe.

cc @ngoldbaum @colesbury is there any upstream opinion on how to handle bytearray objects on the free threaded build?

@ngoldbaum
Contributor

I can't find any discussion about bytearray and free-threading in the CPython issue tracker. You may want to file an issue, especially if you can make a pure-Python reproducer using the threading module. There are still lots of thread safety issues in CPython itself and we should make sure they all get tracked as we run into them.
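For illustration, here is a minimal sketch of the shape such a pure-Python reproducer might take. It is not the script from any linked issue: the helper names `is_consistent` and `count_torn_snapshots`, the block layout, and the default sizes are all assumptions made for this example. A writer thread appends fixed-size blocks with `bytearray.extend` while the main thread repeatedly copies the buffer with `bytes(buf)` and checks that every copy consists only of whole, correctly filled blocks.

```python
import threading


def is_consistent(snapshot: bytes, chunk: int) -> bool:
    """True if the snapshot is a whole number of chunks and block k is filled with k + 1."""
    if len(snapshot) % chunk != 0:
        return False
    for k in range(len(snapshot) // chunk):
        if snapshot[k * chunk:(k + 1) * chunk] != bytes([k + 1]) * chunk:
            return False
    return True


def count_torn_snapshots(rounds: int = 50, chunk: int = 4096, blocks: int = 16) -> int:
    """Count snapshots taken during concurrent extends that look inconsistent.

    Hypothetical helper for illustration only; `blocks` must stay below 256.
    """
    torn = 0
    for _ in range(rounds):
        buf = bytearray()
        start = threading.Barrier(2)

        def writer() -> None:
            start.wait()  # begin at roughly the same time as the reader
            for value in range(1, blocks + 1):
                while True:
                    try:
                        buf.extend(bytes([value]) * chunk)
                        break
                    except BufferError:
                        # A reader may briefly hold a buffer export; retry.
                        continue

        thread = threading.Thread(target=writer)
        thread.start()
        start.wait()
        while thread.is_alive():
            if not is_consistent(bytes(buf), chunk):
                torn += 1
        thread.join()
    return torn


if __name__ == "__main__":
    print("inconsistent snapshots:", count_torn_snapshots())
```

On a GIL-enabled build the count would be expected to stay at zero, since each `extend` and each `bytes(buf)` copy runs without releasing the GIL. Whether a given free-threaded build reports a non-zero count depends on timing and on the CPython version, so a zero result there would not prove the absence of a race.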

@robsdedude
Author

python/cpython#127472

@davidhewitt
Member

Thanks for that. I'm a bit unsure what the way forward here is. Without upstream also using critical sections, as you observe, adding the single section here seems a bit moot. I think we cannot change our API in a patch release so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?
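To make the "moot" point concrete, here is a small analogy in plain Python. It is only a sketch: a `threading.Lock` stands in for a per-object critical section, and `demo_one_sided_lock` and the two writer functions are hypothetical names invented for this example, not PyO3 or CPython APIs. The reader holds the lock while it takes its snapshot; a writer that takes the same lock has to wait, while a writer that ignores the lock modifies the buffer anyway.

```python
import threading


def demo_one_sided_lock() -> tuple[bool, bool, bytes]:
    """Show that a lock only protects against parties that also acquire it."""
    lock = threading.Lock()  # stand-in for a per-object critical section
    buf = bytearray(b"abc")
    outcome = {}

    def cooperative_write() -> None:
        # Takes the same lock as the reader, so it has to wait (and times out here).
        acquired = lock.acquire(timeout=0.1)
        if acquired:
            try:
                buf.extend(b"X")
            finally:
                lock.release()
        outcome["cooperative"] = acquired

    def uncooperative_write() -> None:
        # Ignores the lock entirely, like code that never enters the critical section.
        buf.extend(b"Y")
        outcome["uncooperative"] = True

    with lock:  # the reader's "critical section"
        for target in (cooperative_write, uncooperative_write):
            thread = threading.Thread(target=target)
            thread.start()
            thread.join()
        snapshot = bytes(buf)

    return outcome["cooperative"], outcome["uncooperative"], snapshot


if __name__ == "__main__":
    print(demo_one_sided_lock())  # (False, True, b'abcY')
```

The cooperative writer is kept out for as long as the reader holds the lock, but the uncooperative write still lands in the snapshot. By analogy, a critical section taken only on the reading side would not stop bytearray code that never enters one.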

@alex
Contributor

alex commented Dec 3, 2024 via email

@ngoldbaum
Contributor

Hopefully a future Python release will fix the thread safety issues you identified, so that the free-threaded build can at least offer guarantees similar to those of the GIL-enabled build.

@robsdedude
Author

robsdedude commented Dec 4, 2024

> so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?

I can see arguments for and against that. I'm slightly gravitating towards not doing it, though. The way I see it, PyO3's memory safety stands and falls with the soundness of the linked Python implementation. You have to assume that it's sound. If you don't, every PyO3 API would have to be unsafe (which I guess is Rust's standpoint, as every FFI call is unsafe). So in this particular case, to_vec would be safe if CPython were sound (by using critical sections around bytearray operations). In that sense PyO3 is as safe here as pure Python is, and maybe that should be the criterion for whether you mark an API safe or unsafe in PyO3.

Just my 2 cents and you're much deeper into the world of this wonderful crate so ofc. it's up to you to decide 😇

> adding the single section here seems a bit moot

I guess it is 🫤 Feel free to close the PR ⚰️

4 participants