API: Restructure the dtype struct to be new dtype friendly #25943

seberg · 2024-03-05T15:29:55Z

Note that this PR is an actual ABI-breaking change: no downstream project should be expected to work without recompiling (which means that our doc builds will also crash).

This PR makes a few changes. I would like to keep the docs for a follow-up, but I am happy either way. Right now I am marking it as a draft because:

I am seeing test failures in SciPy and am not sure yet where they originate (some general issue, something in SciPy, or something here). It's odd; I need to figure this out.
I may follow up with more changes (which I want to discuss briefly below), but this is also a good point to start the review.
The Cython additions should get a simple test and may need a closer look.

The actual changes are the following:

Flags are now a uint64.
elsize and alignment are now intp (I don't think it matters for alignment, but it seemed fine).
c_metadata, subarray, fields, and names are now fully gone (metadata is still there).

Plus cleanup and moving things between headers as they are now version dependent.
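To make the layout change concrete, here is a simplified, hypothetical model of the descriptor struct before and after this PR, written from the description above. `PyObject_HEAD` and some other members are left out, pointer members are plain `void *`, and the `mock_*` names are ours, not NumPy's; it is a sketch, not the real definition from `ndarraytypes.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 1.x-style layout (PyObject_HEAD and some members omitted). */
typedef struct {
    void *typeobj;
    char kind, type, byteorder, flags; /* flags fit in a single char */
    int type_num;
    int elsize;                        /* plain int */
    int alignment;
    void *subarray, *fields, *names;   /* structured-dtype members */
    void *metadata;
    void *c_metadata;
    intptr_t hash;
} mock_descr_v1;

/* Hypothetical, simplified layout after this PR. */
typedef struct {
    void *typeobj;
    char kind, type, byteorder, former_flags; /* old char slot, no longer the flags */
    int type_num;
    uint64_t flags;                    /* more room for flag bits */
    intptr_t elsize;                   /* npy_intp in the real header */
    intptr_t alignment;
    void *metadata;                    /* kept */
    intptr_t hash;
    void *reserved_null[2];            /* NULL slots added by later commits of this PR */
} mock_descr_v2;

/* Compiled code bakes in the field offset, so a binary built against the
 * old layout reads the wrong bytes of a new-layout struct: hence "recompile". */
static intptr_t mock_v2_elsize(const mock_descr_v2 *d)
{
    assert(d != NULL);
    return d->elsize;
}
```

In this model `offsetof(mock_descr_v1, elsize)` and `offsetof(mock_descr_v2, elsize)` differ on any platform, which is the essence of the ABI break.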

@ngoldbaum three things I would like your opinion on:

Right now I kept metadata simply because it seemed at least somewhat useful and used, and because keeping it made the preparation easier. I am happy to remove it.
I could add one (maybe two) void *reserved slot that we define to be NULL but can give a meaning in the future. I don't care too much about it, we have flags and subclasses, but it is possible...
I kept the elsize name, just to keep the diff smaller. But we could rename both or just the accessor to ITEMSIZE.

None of them seem like a huge deal, compared to getting rid of the other fields and bumping the size.

charris · 2024-03-05T15:33:53Z

I assume this is a blocker for the 2.0.0 release?

ngoldbaum · 2024-03-05T15:49:04Z

Right now I kept metadata simply because it seemed at least somewhat useful and used, and because keeping it made the preparation easier. I am happy to remove it.

I did another code search and found one usage in newish code, here, to stuff some extra metadata into the descriptor (it looks like it is there to work around a NumPy limitation).

Seems like a nice thing to have and no strong reason to remove it, let's keep it.

I could add one (maybe two) void *reserved slot that we define to be NULL but can give a meaning in the future. I don't care too much about it, we have flags and subclasses, but it is possible...

One could imagine always providing a per-descriptor allocator like stringdtype does, for example, as a way to deal with thread safety and add nogil support.

So let's do it. My only thought is that only having one extra slot might be limiting, could we define it in such a way that it can be extensible (e.g. using something like the PyTypeSlot struct to allow an arbitrary number of extra members defined at runtime)? That's just me spitballing, it might be a bad idea.
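The extensible-slot idea can be sketched as follows, purely as a hypothetical design (none of these names exist in NumPy): a reserved pointer that is defined to be NULL today could later point at a zero-terminated table of id/pointer pairs, similar in spirit to CPython's `PyType_Slot`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot ids; 0 is reserved as the table terminator. */
enum { EXT_SLOT_END = 0, EXT_SLOT_ALLOCATOR = 1 };

/* Hypothetical id/pointer pair, modelled loosely on CPython's PyType_Slot. */
typedef struct {
    int slot;
    void *pfunc;
} ext_slot;

/* Hypothetical descriptor tail: today the reserved field must be NULL. */
typedef struct {
    void *reserved_null;
} descr_tail;

/* Look up `slot`; returns NULL if the descriptor has no table or lacks the slot. */
static void *descr_get_slot(const descr_tail *d, int slot)
{
    assert(d != NULL && slot != EXT_SLOT_END);
    const ext_slot *s = (const ext_slot *)d->reserved_null;
    if (s == NULL) {
        return NULL; /* descriptor created before any meaning was assigned */
    }
    for (; s->slot != EXT_SLOT_END; s++) {
        if (s->slot == slot) {
            return s->pfunc;
        }
    }
    return NULL;
}
```

Because existing descriptors leave the field NULL, a lookup on them degrades to "not present" instead of reading garbage; that is the property a NULL-defined reserved slot buys.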

I kept the elsize name, just to keep the diff smaller. But we could rename both or just the accessor to ITEMSIZE.

Sure, let's leave it, and in the future we can refactor numpy to use an accessor macro with a better name. There's a ton of downstream code using elsize too so this would cause some significant downstream code churn.

tylerjereddy · 2024-03-05T15:50:57Z

I am seeing test failures in SciPy, I am not sure yet where they originate

Let me know if you want help, even if it is just brute-force narrowing down the relevant part(s) of the code.

seberg · 2024-03-05T15:58:38Z

There's a ton of downstream code using elsize too so this would cause some significant downstream code churn.

Well, we have to force those to churn and use PyDataType_ELSIZE(), although many can use PyArray_ITEMSIZE() instead (this is what I did in SciPy).
I don't think this is too bad: 2.5k hits on GitHub isn't actually that much :) and I quickly get to dead projects ;).

seberg · 2024-03-05T20:42:03Z

OK, the SciPy issues also came down to pybind11 and required an unfortunate little hack pending a pybind11 fixup. Luckily, most places that use pybind11 use the template type version py::array_t to create new arrays, and that isn't affected.

I have updated the docs. We should maybe discuss this briefly. (EDIT: It is unfortunate that the docs don't build. I think we can wait for SciPy to work, but if more fails, we may need to see how to get the changes in.)

I will look into updating pybind11 tomorrow, but I doubt it will be hard. (Unless field sizes seem wrong, further fixups like adding a second void * would not actually break ABI.)

seberg · 2024-03-05T21:34:59Z

Two notes:

I checked, and I could easily add one more reserved field (without adding cruft for Cython); with more than that, the struct sizes on 32-bit platforms wouldn't match.
Before merging, I will split the docs into their own PR and merge that. That way the correct docs can land after a few days of breathing room, giving at least SciPy time to have new wheel builds.

This modifies the main dtype/descriptor struct to:

* Use intp for the elsize (and alignment)
* Use a uint64 for flag space
* Actually remove c_metadata and the fields related to structured dtypes

It thus *breaks ABI*, and the unfortunate souls who require access to `->elsize` or similar fields will have to vendor `npy_2_compat.h` or do something similar. (This should not be very many, though; most can use PyArray_ITEMSIZE.) The changes also require moving a few other places to be run-time and fixing allocation of structs. This commit is not complete: new warnings and errors still need to be fixed.


seberg · 2024-03-06T18:59:53Z

I think this should be ready, but of course someone should have a look over.

@lithomas1 once this is merged, pandas will need some small updates to the JSON C reader (IIRC); they shouldn't be more than the similar ones that Arrow needs (@jorisvandenbossche is aware). If the Arrow code paths are hit, this might crash pandas, although building the wheel would still be fine.

Right now I don't think it is looking too bad, but of course I don't know... if it ends up being bad, we might have to keep elsize as an int to break less code.

mattip

Makes sense to me, and lays the groundwork for dropping the old struct in a few years when all user-defined dtypes move to NumPy2. I still need a bit of convincing around the reserved space, but not a big deal since the whole structure is meant to be opaque.

Edit: the spare fields are needed for subclassing. Makes sense.

doc/source/reference/c-api/types-and-structures.rst

numpy/__init__.cython-30.pxd

numpy/__init__.pxd

numpy/_core/include/numpy/ndarraytypes.h


mattip · 2024-03-07T11:45:03Z

Tests are failing, but not because of this PR; probably due to #21760. I will comment there. The error is (note the "·" instead of a space):

    def test_polynomial_str(self):
        res = str(poly.Polynomial([0, 1]))
        tgt = '0.0 + 1.0 x'
>       assert_equal(res, tgt)
E       AssertionError: 
E       Items are not equal:
E        ACTUAL: '0.0 + 1.0·x'
E        DESIRED: '0.0 + 1.0 x'

mattip · 2024-03-07T11:48:05Z

I will merge this to keep the momentum going. Let's see what breaks downstream...

mattip · 2024-03-07T11:48:17Z

Thanks @seberg

lithomas1 · 2024-03-07T20:06:58Z

Thanks for the heads up.

Can someone from numpy hit the build button on the nightlies now that this is merged?

seberg · 2024-03-07T20:23:32Z

Yesterday it sounded a bit like you wanted to do that @rgommers, so you can make sure to do it on SciPy quickly after as well?

charris · 2024-03-07T20:27:37Z

I have already triggered the builds, at least those not on cirrus.

jakevdp · 2024-03-08T18:28:59Z

Hi - I found that this PR broke the openxla build, due to this line: https://github.com/openxla/xla/blob/24eaeeab8e7465cef0fc655cd9fd8f6060485a27/xla/python/nb_numpy.h#L55

What would be the best way to access elsize in a way that is compatible with both Numpy 1.X and 2.X?

jorisvandenbossche · 2024-03-08T19:16:42Z

@jakevdp see #25946 for the relevant doc changes

The direct equivalent is PyDataType_ELSIZE, but since I think you have an array (and not just a descr), you can use PyArray_ITEMSIZE directly (for PyDataType_ELSIZE you need to add a small shim to make it compile on 1.x)

seberg · 2024-03-08T19:48:11Z

Right, the caveat is that it needs import_array() now, but you probably already have that.

jakevdp · 2024-03-08T20:37:24Z

Thanks. Unfortunately, what the code has access to there is just a descr, not the array, so PyArray_ITEMSIZE is not applicable. Is there any way to get descr->elsize in a cross-version-compatible way given descr is a PyArray_Descr? I can't find any answer to that in the docs you linked to.

jakevdp · 2024-03-08T20:41:24Z

Maybe I need to use compiler directives to conditionally define PyDataType_ELSIZE when building against older numpy versions – would that be the recommended approach here?

ngoldbaum · 2024-03-08T20:44:55Z

Maybe I need to use compiler directives to conditionally define PyDataType_ELSIZE

Yes unfortunately this access pattern needs to be dealt with using e.g. an npy_2_compat.h header. See here and below
https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/npy_2_compat.h#L142

seberg · 2024-03-08T20:47:28Z

would that be the recommended approach here?

Yes, just use #if NPY_ABI_VERSION < 0x02000000 and define the macro. You could copy npy_2_compat.h and include it explicitly. I like having that, but wouldn't use it myself unless you need it in a few places.

jakevdp · 2024-03-08T20:55:27Z

Ah, I see now that the PyDataType_ELSIZE solution is mentioned at https://numpy.org/devdocs/numpy_2_0_migration_guide.html#the-pyarray-descr-struct-has-been-changed. Thanks!

github-actions bot added the 30 - API label Mar 5, 2024

seberg added this to the 2.0.0 release milestone Mar 5, 2024

seberg force-pushed the descr-abi-break branch from d269cc4 to 2d71e3b March 5, 2024 16:30

seberg mentioned this pull request Mar 5, 2024

MAINT: Bump npy2_compat.h and add temporary pybind11 workaround scipy/scipy#20193

Merged

seberg marked this pull request as ready for review March 5, 2024 20:42

seberg mentioned this pull request Mar 5, 2024

[BUG]: latest pybind11 (2.11.1) version not supporting numpy 2.0 pybind/pybind11#5009

Closed


rgommers mentioned this pull request Mar 6, 2024

Coordination: last items before branching 2.0.x #25918

Closed


seberg mentioned this pull request Mar 6, 2024

DOC: Add and fixup/move docs for descriptor changes #25946

Merged

seberg force-pushed the descr-abi-break branch from 4c8c72a to a99c67e March 6, 2024 07:31

seberg mentioned this pull request Mar 6, 2024

API: Make numpy.h compatible with both NumPy 1.x and 2.x pybind/pybind11#5050

Merged

seberg added 8 commits March 6, 2024 19:41

MAINT: Fixup f2py for changes

7eb4410

(for once shipping f2py makes it easier)

MAINT: Fix pickling/unpickling for unsigned flags

8e92d96

Note that the (rare) aligned struct case will not allow unpickling NumPy 2.x files with NumPy 1.x. This could be added if necessary. Unpickling 1.x files in 2.x is unproblematic

MAINT: Use direct subarray access to avoid warnings

7d3949e

MAINT: Fix buffer formatting for larger itemsize

cf50ad2

BUG: Adjust descriptor member access (as old PR, forgot it here)

0412866

MAINT: Add a void *reserved_null field to descr for possible future use

9cb60ed

MAINT: As discussed make it two NULL fields at the end of dtype

4c38ec2

2 should work (unless I miscounted badly).
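The buffer-formatting commit (cf50ad2) is the kind of fix a wider `elsize` forces: any `printf`-style formatting that passed `elsize` to `%d` now has a type mismatch on 64-bit platforms. The sketch below is our illustration, not the code from that commit: it writes a struct-module style padding code such as `16x` for an item of `itemsize` bytes, using `PRIdPTR` from `<inttypes.h>` to match `intptr_t`. The function name is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "<itemsize>x" into buf.
 * itemsize is intptr_t (like npy_intp), so "%d" would be the wrong
 * conversion on 64-bit platforms; PRIdPTR matches intptr_t. */
static int format_pad_code(char *buf, size_t len, intptr_t itemsize)
{
    assert(buf != NULL && itemsize >= 0);
    return snprintf(buf, len, "%" PRIdPTR "x", itemsize);
}
```

The return value follows `snprintf`: the number of characters that would have been written, so callers can detect truncation by comparing it against `len`.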

seberg force-pushed the descr-abi-break branch from a99c67e to 4c38ec2 March 6, 2024 18:42

mattip reviewed Mar 7, 2024


DOC,MAINT: Apply suggestions from code review

2d84ae6

Co-authored-by: Matti Picus <matti.picus@gmail.com>

mattip merged commit dbbf235 into numpy:main Mar 7, 2024
57 of 63 checks passed

mattip mentioned this pull request Mar 7, 2024

DOC: 2.0 release highlights and compat notes changes #25937

Merged

ngoldbaum mentioned this pull request Mar 7, 2024

BUG: avoid incorrect type punning in NpyString_acquire_allocators #25958

Merged

neutrinoceros mentioned this pull request Mar 8, 2024

BLD: fix building against numpy dev liberfa/pyerfa#140

Merged

seberg deleted the descr-abi-break branch March 8, 2024 19:48

neutrinoceros mentioned this pull request Apr 1, 2024

BLD: build with numpy instead of oldest-supported-numpy pydata/numexpr#478

Merged

bernhardkaindl mentioned this pull request Oct 24, 2024

salome,-med,-medcoupling: new versions, new/changed variants spack/spack#46576

Merged
