Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement wrapper for PyASCIIObject.state bitfield accesses #3015

Merged
merged 1 commit into from
Mar 28, 2023
Merged

Implement wrapper for PyASCIIObject.state bitfield accesses #3015

merged 1 commit into from
Mar 28, 2023

Conversation

decathorpe
Copy link
Contributor

This is a first draft of my attempt to fix #1824 "properly" by writing a C wrapper for the PyASCIIObject.state bitfield accesses, as proposed here: #1824 (comment)


The original argument for making these functions unsafe is still valid, though - bitfield memory layout is not guaranteed to be stable across different C compilers, as it is "implementation defined" in the C standard. However, short of having CPython upstream provide non-inlined public functions to access this bitfield, this is the next best thing, as far as I can tell.

I've removed the #[cfg(target_endian = "little")] attributes from all things that are un-blocked by fixing this issue on big-endian systems, except for three tests, which look like expected failures considering that they do not take bit/byte order into account (for example, when writing to the bitfield).

  • ffi::tests::ascii_object_bitfield
  • types::string::tests::test_string_data_ucs2_invalid
  • types::string::tests::test_string_data_ucs4_invalid

All other tests now pass on both little-endian and big-endian systems.


I am aware that some parts of this PR are probably not in a state that's acceptable for merging as-is, which is why I'm filing this as a draft. Feedback about how to better integrate this change with pyo3-ffi would be great. :)

In particular, I'm unsure whether the #include statements in the C files are actually correct across different systems. I have only tested this on Fedora Linux so far.

I'm also open to changing the names of the C functions that are implemented in the wrapper. For now I chose the most obvious names that shouldn't cause collisions with other symbols.

@cuviper
Copy link

cuviper commented Mar 6, 2023

In particular, I'm unsure whether the #include statements in the C files are actually correct across different systems. I have only tested this on Fedora Linux so far.

If the only difference between those C files is the include path, then this should just be an -I/path argument. In fact, that's what you would get from python3-config --cflags, as well as other flags that may be relevant, but I don't know if anything uses that in PyO3 yet.

@mejrs
Copy link
Member

mejrs commented Mar 6, 2023

Personally I would prefer putting this behind a feature a) so a C compiler isn't needed if you don't use this b) there is a bit of a hurdle to jump over if you do need it

short of having CPython upstream provide non-inlined public functions to access this bitfield,

Is anything like that in the works? It would be nice to be able to get rid of this shim eventually :)

Also; is there any prior art on how C extensions handle this? It is an issue for them too.

@decathorpe
Copy link
Contributor Author

decathorpe commented Mar 7, 2023

Personally I would prefer putting this behind a feature a) so a C compiler isn't needed if you don't use this b) there is a bit of a hurdle to jump over if you do need it

The hurdle for users is already there because the functions are unsafe. The current little-endian-only implementation also relies on UB, it's just coincidence that most C compilers generate the same memory layout for this struct and bitfield ... so the C shim is not actually making things worse. :) But I understand that you'd want to make the dependency on cc optional, though I'm not sure if it's a good idea to make it a dependency only on big-endian targets (i.e. using something like [target."cfg(target_endian = \"big\")".build-dependencies.cc]).

short of having CPython upstream provide non-inlined public functions to access this bitfield,

Is anything like that in the works? It would be nice to be able to get rid of this shim eventually :)

Not as far as I know. It appears that CPython upstream would like to hide the internals of this struct rather than expose more of it publicly: python/cpython#89188 (comment)

Also; is there any prior art on how C extensions handle this? It is an issue for them too.

The C headers do provide macros that provide access to the bitfield, but C macros are obviously not usable via FFI.

@messense
Copy link
Member

messense commented Mar 7, 2023

Personally I would prefer putting this behind a feature a) so a C compiler isn't needed if you don't use this b) there is a bit of a hurdle to jump over if you do need it

second this, it will make cross compilation harder now that it requires python development header files.

@adamreichold
Copy link
Member

The C headers do provide macros that provide access to the bitfield, but C macros are obviously not usable via FFI.

I do not want to side track this, but what does bindgen produce for those headers? Something that we could put into version control or does it depend on the chosen build target?

@cuviper
Copy link

cuviper commented Mar 7, 2023

Here are excerpts of bindgen Python.h on Fedora x86_64
/* automatically generated by rust-bindgen 0.63.0 */

#[repr(C)]
#[derive(Copy, Clone, Debug, Default, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct __BindgenBitfieldUnit<Storage> {
    storage: Storage,
}
impl<Storage> __BindgenBitfieldUnit<Storage> {
    #[inline]
    pub const fn new(storage: Storage) -> Self {
        Self { storage }
    }
}
impl<Storage> __BindgenBitfieldUnit<Storage>
where
    Storage: AsRef<[u8]> + AsMut<[u8]>,
{
    #[inline]
    pub fn get_bit(&self, index: usize) -> bool {
        debug_assert!(index / 8 < self.storage.as_ref().len());
        let byte_index = index / 8;
        let byte = self.storage.as_ref()[byte_index];
        let bit_index = if cfg!(target_endian = "big") {
            7 - (index % 8)
        } else {
            index % 8
        };
        let mask = 1 << bit_index;
        byte & mask == mask
    }
    #[inline]
    pub fn set_bit(&mut self, index: usize, val: bool) {
        debug_assert!(index / 8 < self.storage.as_ref().len());
        let byte_index = index / 8;
        let byte = &mut self.storage.as_mut()[byte_index];
        let bit_index = if cfg!(target_endian = "big") {
            7 - (index % 8)
        } else {
            index % 8
        };
        let mask = 1 << bit_index;
        if val {
            *byte |= mask;
        } else {
            *byte &= !mask;
        }
    }
    #[inline]
    pub fn get(&self, bit_offset: usize, bit_width: u8) -> u64 {
        debug_assert!(bit_width <= 64);
        debug_assert!(bit_offset / 8 < self.storage.as_ref().len());
        debug_assert!((bit_offset + (bit_width as usize)) / 8 <= self.storage.as_ref().len());
        let mut val = 0;
        for i in 0..(bit_width as usize) {
            if self.get_bit(i + bit_offset) {
                let index = if cfg!(target_endian = "big") {
                    bit_width as usize - 1 - i
                } else {
                    i
                };
                val |= 1 << index;
            }
        }
        val
    }
    #[inline]
    pub fn set(&mut self, bit_offset: usize, bit_width: u8, val: u64) {
        debug_assert!(bit_width <= 64);
        debug_assert!(bit_offset / 8 < self.storage.as_ref().len());
        debug_assert!((bit_offset + (bit_width as usize)) / 8 <= self.storage.as_ref().len());
        for i in 0..(bit_width as usize) {
            let mask = 1 << i;
            let val_bit_is_set = val & mask == mask;
            let index = if cfg!(target_endian = "big") {
                bit_width as usize - 1 - i
            } else {
                i
            };
            self.set_bit(index + bit_offset, val_bit_is_set);
        }
    }
}
//...
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct PyASCIIObject {
    pub ob_base: PyObject,
    pub length: Py_ssize_t,
    pub hash: Py_hash_t,
    pub state: PyASCIIObject__bindgen_ty_1,
    pub wstr: *mut wchar_t,
}
#[repr(C)]
#[repr(align(4))]
#[derive(Debug, Copy, Clone)]
pub struct PyASCIIObject__bindgen_ty_1 {
    pub _bitfield_align_1: [u8; 0],
    pub _bitfield_1: __BindgenBitfieldUnit<[u8; 4usize]>,
}
//...
impl PyASCIIObject__bindgen_ty_1 {
    #[inline]
    pub fn interned(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(0usize, 2u8) as u32) }
    }
    #[inline]
    pub fn set_interned(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(0usize, 2u8, val as u64)
        }
    }
    #[inline]
    pub fn kind(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(2usize, 3u8) as u32) }
    }
    #[inline]
    pub fn set_kind(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(2usize, 3u8, val as u64)
        }
    }
    #[inline]
    pub fn compact(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(5usize, 1u8) as u32) }
    }
    #[inline]
    pub fn set_compact(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(5usize, 1u8, val as u64)
        }
    }
    #[inline]
    pub fn ascii(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(6usize, 1u8) as u32) }
    }
    #[inline]
    pub fn set_ascii(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(6usize, 1u8, val as u64)
        }
    }
    #[inline]
    pub fn ready(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(7usize, 1u8) as u32) }
    }
    #[inline]
    pub fn set_ready(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(7usize, 1u8, val as u64)
        }
    }
    #[inline]
    pub fn new_bitfield_1(
        interned: ::std::os::raw::c_uint,
        kind: ::std::os::raw::c_uint,
        compact: ::std::os::raw::c_uint,
        ascii: ::std::os::raw::c_uint,
        ready: ::std::os::raw::c_uint,
    ) -> __BindgenBitfieldUnit<[u8; 4usize]> {
        let mut __bindgen_bitfield_unit: __BindgenBitfieldUnit<[u8; 4usize]> = Default::default();
        __bindgen_bitfield_unit.set(0usize, 2u8, {
            let interned: u32 = unsafe { ::std::mem::transmute(interned) };
            interned as u64
        });
        __bindgen_bitfield_unit.set(2usize, 3u8, {
            let kind: u32 = unsafe { ::std::mem::transmute(kind) };
            kind as u64
        });
        __bindgen_bitfield_unit.set(5usize, 1u8, {
            let compact: u32 = unsafe { ::std::mem::transmute(compact) };
            compact as u64
        });
        __bindgen_bitfield_unit.set(6usize, 1u8, {
            let ascii: u32 = unsafe { ::std::mem::transmute(ascii) };
            ascii as u64
        });
        __bindgen_bitfield_unit.set(7usize, 1u8, {
            let ready: u32 = unsafe { ::std::mem::transmute(ready) };
            ready as u64
        });
        __bindgen_bitfield_unit
    }
}

So that does have some conditionals for target_endian = "big", at least, but I don't know how portable it is overall.

@adamreichold
Copy link
Member

So this seems to boil down to

#[repr(C)]
#[repr(align(4))]
#[derive(Debug, Copy, Clone)]
pub struct PyASCIIObject__bindgen_ty_1 {
    pub _bitfield_align_1: [u8; 0],
    pub _bitfield_1: __BindgenBitfieldUnit<[u8; 4usize]>,
}

and

        let bit_index = if cfg!(target_endian = "big") {
            7 - (index % 8)
        } else {
            index % 8
        };

which would seem palatable enough. But indeed, we'd need to check the C standard legalese on whether this is mandated or whether it is just Clang doing this way.

To be honest, if I try to be in a very very pragmatic mood, the above might be considered an improvement over the status quo and allow us to support big endian targets. I also suspect that the only way to be truly correct is the shim going through the actual C target compiler, i.e. we are not really correct ATM as well, which is why I could personally live with this.

@cuviper
Copy link

cuviper commented Mar 7, 2023

I don't think the C standard mandates anything, but in practice we can examine what "implementation defined" actually does on targets we care about. It should still be part of the platform ABI, so we wouldn't have to worry about GCC vs Clang (on a given target).

@decathorpe
Copy link
Contributor Author

If bindgen produces usable functions to access these fields that would of course be perfect, as it would mean including C files is not necessary at all. The only thing that would need to be taken into account is possible differences between Python versions. It appears that the exact bitfield layout sometimes changes between major versions (like here), so it might be necessary to keep copies around for different target CPython versions.

@adamreichold
Copy link
Member

Keeping multiple structure definitions around to support the few Python major versions we do seems preferable to having to involve a C compiler IMHO.

@decathorpe decathorpe changed the title Implement "safe" C wrapper for PyASCIIObject.state bitfield accesses Implement wrapper for PyASCIIObject.state bitfield accesses Mar 8, 2023
@decathorpe
Copy link
Contributor Author

Ok, I've changed this PR to no longer depend on the cc crate or a C compiler at build time, and instead used bindgen to generate some architecture-independent code for me.

The BitfieldUnit helper is generated by bindgen to provide "safe" (whatever "safe" means when you're in "implementation defined" territory) access to bitfield members, and the PyASCIIObject.state field is now a PyASCIIObjectState struct that wraps a BitfieldUnit (with safe getters and setters, as generated by bindgen) instead of a raw u32. Since this field is part of the public API, this is a SemVer breaking change.

I checked with all versions of Python that I have access to (3.7, 3.9, 3.10, 3.11, and 3.12) and bindgen generated the same code for all of them (which is why there's no copies for different versions any longer) - except for Python 3.12, where the "ready" flag was apparently removed from the bitfield (but Python 3.12 is not yet supported by PyO3).

I have adapted the tests to the changed API, removed target_endian = "little" cfg-gates from them, and was able to successfully run the test suite for both pyo3-ffi and pyo3 on different architectures (x86_64, i686, powerpc64le, aarch64, and s390x - the latter of which is a big-endian architecture).

@decathorpe
Copy link
Contributor Author

I don't really understand some of the test failures in CI, especially the pyo3-ffi-check tests ... how do I tell them that BitfieldUnit and PyASCIIObjectState exist (or should it not know about them? I'm confused).

Also, can I add #[allow(clippy::useless-transmute)] to the getter / setter code generated by bindgen to silence warnings/errors like "error: transmute from a type (u32) to itself"? I didn't want to mess with the generated code since I trust bindgen to get this right more than trust myself.

@adamreichold
Copy link
Member

I don't really understand some of the test failures in CI, especially the pyo3-ffi-check tests ... how do I tell them that BitfieldUnit and PyASCIIObjectState exist (or should it not know about them? I'm confused).

I think there are no distinct failures here: Our FFI checking macros do not yet support generics as we never used them in the FFI definitions before. Personally, I would suggest just folding the bindgen-generated wrapper into the concrete struct to avoid them outright but see below.

The other issue is that your type is named PyASCIIObject whereas the bindgen-generated one (against which it is checked) is called PyASCIIObject__bindgen_ty_1. I suspect we either need to keep align ourselves with the bindgen-generated name which would make our API less than ergonomic, we remove the inner type from our API by making the state field private which would reduce our API's functionality or maintain a list of "aliases" in our pyo3-ffi-check crate matching those names up. @davidhewitt What do you think?

Also, can I add #[allow(clippy::useless-transmute)] to the getter / setter code generated by bindgen to silence warnings/errors like "error: transmute from a type (u32) to itself"? I didn't want to mess with the generated code since I trust bindgen to get this right more than trust myself.

I would really be interested what the other maintainers think, but the first impulse for me personally would be to slim down the generated code to what we actually need. But I also understand the idea of really delegating this to bindgen and thus being able to simply import an updated version of these bindings in the future. Let's see what other people think...

It should still be part of the platform ABI, so we wouldn't have to worry about GCC vs Clang (on a given target).

I think we should take the pragmatic way as the change is proposed now as it is definitely an improvement over the status quo, I just wanted to mention that I think the choice of target is the bigger problem than GCC versus Clang. Or put differently, if we really want to rely on the ABI specifications we would need to limit the availability of the type definitions to the finite set of targets which we have checked whereas now (and in the pragmatic approach), one could at least compile the code on targets we did not even consider.

@@ -47,39 +252,35 @@ pub struct PyASCIIObject {
/// unsigned int ascii:1;
/// unsigned int ready:1;
/// unsigned int :24;
pub state: u32,
pub state: PyASCIIObjectState,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't necessarily need to be a public change. If you move the generated PyASCIIObjectState to a private module, it would be trivial to add From<u32> for it using u32::to_ne_bytes. Then the accessors would look something like PyASCIIObjectState::from(self.state).interned().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work for now, but I think only as long as u32 still yields a PyASCIIObject with the right size and alignment.

That said, I don't this breakage would be something we would refrain from for the next release 0.19.0 where it would be within semver bounds. My personal expectation is that this is mainly used via the existing accessors or opaquely.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's up to you. Bindgen's output looks like a lot more pub surface than I would want, so u32 keeps that opaque, or a newtype could accomplish that too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bindgen's output looks like a lot more pub surface than I would want, so u32 keeps that opaque, or a newtype could accomplish that too.

Yes, as written above, I would also condense that into the minimum we actually need, i.e. the layout-relevant parts and the few accessors we want to provide. Let's see what the others think about that though.

@cuviper
Copy link

cuviper commented Mar 8, 2023

It should still be part of the platform ABI, so we wouldn't have to worry about GCC vs Clang (on a given target).

I think we should take the pragmatic way as the change is proposed now as it is definitely an improvement over the status quo, I just wanted to mention that I think the choice of target is the bigger problem than GCC versus Clang. Or put differently, if we really want to rely on the ABI specifications we would need to limit the availability of the type definitions to the finite set of targets which we have checked whereas now (and in the pragmatic approach), one could at least compile the code on targets we did not even consider.

Right, this can only hold up for known target ABIs -- outside that is still a big question mark. They may follow the same approach, as it's a pretty straightforward way to implement bitfields, but it's not necessarily the only way.

@decathorpe
Copy link
Contributor Author

I don't really understand some of the test failures in CI, especially the pyo3-ffi-check tests ... how do I tell them that BitfieldUnit and PyASCIIObjectState exist (or should it not know about them? I'm confused).

I think there are no distinct failures here: Our FFI checking macros do not yet support generics as we never used them in the FFI definitions before. Personally, I would suggest just folding the bindgen-generated wrapper into the concrete struct to avoid them outright but see below.

I can try to do that, but it would make it harder to incorporate changes from bindgen when updating the definitions for future Python releases (or if a new bindgen version produced better code).

The other issue is that your type is named PyASCIIObject whereas the bindgen-generated one (against which it is checked) is called PyASCIIObject__bindgen_ty_1. I suspect we either need to keep align ourselves with the bindgen-generated name which would make our API less than ergonomic, we remove the inner type from our API by making the state field private which would reduce our API's functionality or maintain a list of "aliases" in our pyo3-ffi-check crate matching those names up. @davidhewitt What do you think?

I think you meant to say that the bitfield wrapper type is named PyASCIIObject PyASCIIObjectState? That is true. I renamed some things that were generated by bindgen. The name is automatically generated as PyASCIIObject__bindgen_ty_1 because it's the first anonymous struct type inside the PyASCIIObject struct. IIRC if there were more than one anonymous type, they would just be enumerated as foo__bindgen_ty_{1,2,3,4...}. I thought making the name meaningful (it's the state field of a PyASCIIObject struct) would make sense.

Also, can I add #[allow(clippy::useless-transmute)] to the getter / setter code generated by bindgen to silence warnings/errors like "error: transmute from a type (u32) to itself"? I didn't want to mess with the generated code since I trust bindgen to get this right more than trust myself.

I would really be interested what the other maintainers think, but the first impulse for me personally would be to slim down the generated code to what we actually need. But I also understand the idea of really delegating this to bindgen and thus being able to simply import an updated version of these bindings in the future. Let's see what other people think...

That was part of the reason why I didn't change the function bodies at all - it would make it easier to update / compare them with things generated by bindgen in the future. The only things I did change from what bindgen produced was to give the anonymous PyASCIIObject__bindgen_ty_1 struct an explicit name, and to add a PyASCIIObjectState::new constructor.

@davidhewitt
Copy link
Member

FYI I noticed that it might be possible to remove the bitfield in Python 3.12, if that lands upstream we might be able to simplify this eventually. python/cpython#102589

@decathorpe decathorpe marked this pull request as ready for review March 11, 2023 22:55
@decathorpe
Copy link
Contributor Author

Alright, I managed to simplify the code a little, and I've reverted the PyASCIIObject.state struct field back to an u32 so there's no longer a change of the public API. All structs and methods that are only intended for internal use are now also no longer marked pub. All accesses to the underlying bitfield are now handled by getters and setters, which should work in most cases (i.e. on most little- and big-endian architectures).

I silenced the clippy::useless_transmute warning, because c_uint and u32 are not necessarily always the same type as far as I know, so the transmute seems to be correct. I also added short doc comments for the new PyASCIIObjectState struct and for the new unsafe getters and setters on PyASCIIObject - mostly to document what's safe / expected and what's not.


I think these changes should address most or all issues that have been mentioned in previous comments. The CI also looks (mostly) happy and green now. I also ran the pyo3 test suite on all architectures that are supported by Fedora (x86_64, i686, aarch64, powerpc64le, and 390x) and everything passed there, as well. Please let me know if there's anything else that I can do to move this forward. :)

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this seems good to me with a couple of tiny suggestions to remove some derived traits from the private types.

Final step is to add a newsfragment - I suggest something like 3015.added.md with "Support PyStringData on big-endian targets"

pyo3-ffi/src/cpython/unicodeobject.rs Outdated Show resolved Hide resolved
pyo3-ffi/src/cpython/unicodeobject.rs Outdated Show resolved Hide resolved
@decathorpe
Copy link
Contributor Author

Great, I removed all the unnecessary derived trait implementations (it turns out, it was all of them) and added a changelog fragment.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Thanks very much!

I'll continue to try to get something simpler in for 3.12 or another future Python, however looks like my original simplification suggested above won't stick ;)

bors r+

@bors
Copy link
Contributor

bors bot commented Mar 28, 2023

Build succeeded:

@bors bors bot merged commit ba6261c into PyO3:main Mar 28, 2023
@decathorpe
Copy link
Contributor Author

Awesome. Thanks for merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ffi::cpython::unicodeobject::tests::{ascii,ucs4} unit tests segfault on s390x
6 participants