Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40376: [Python] Update for NumPy 2.0 ABI change in PyArray_Descr->elsize #40418

Merged
merged 11 commits into from
Mar 13, 2024

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Mar 8, 2024

Rationale for this change

NumPy 2.0 is changing some ABI, see the issue description and numpy/numpy#25946 for more details.

The changes here should make our code compatible both with current numpy 1.x and future numpy 2.x

…escr->elsize

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
Copy link

github-actions bot commented Mar 8, 2024

⚠️ GitHub issue #40376 has been automatically assigned in GitHub to PR creator.

@jorisvandenbossche
Copy link
Member Author

@seberg we are also using the c_metadata field, which was removed as well.

void SetDatetimeUnit(NPY_DATETIMEUNIT unit) {
PyAcquireGIL lock;
auto date_dtype = reinterpret_cast<PyArray_DatetimeDTypeMetaData*>(
PyArray_DESCR(reinterpret_cast<PyArrayObject*>(block_arr_.obj()))->c_metadata);
date_dtype->meta.base = unit;
}

@jorisvandenbossche
Copy link
Member Author

Should use PyDataType_C_METADATA for that? (with a similar backport for building with 1.x as for elsize)

@seberg
Copy link
Contributor

seberg commented Mar 8, 2024

Should use PyDataType_C_METADATA for that? (with a similar backport for building with 1.x as for elsize)

Yes, exactly, thanks!

@jorisvandenbossche
Copy link
Member Author

And the same for fields ;)

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit -g python

Copy link

github-actions bot commented Mar 8, 2024

Revision: 27f15ab

Submitted crossbow builds: ursacomputing/crossbow @ actions-d5247ffd0e

Task Status
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-python GitHub Actions
test-debian-11-python-3-amd64 Azure
test-debian-11-python-3-i386 GitHub Actions
test-fedora-39-python-3 Azure
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-python-3 GitHub Actions

Copy link
Contributor

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the changes look good to me from the NumPy side. Maybe we should add a way to get datetime metadata more directly some time (and if fields/names would be real functions we would have more freedom to modify it, but that isn't a downstream code change now at least :)).

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 8, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Mar 8, 2024
@kou
Copy link
Member

kou commented Mar 10, 2024

@github-actions crossbow submit -g wheel

This comment was marked as outdated.

@kou
Copy link
Member

kou commented Mar 11, 2024

It seems that wheel-manylinux failures and wheel-windows failures are related to this.

@jorisvandenbossche
Copy link
Member Author

Yes, those are related, but it's because it is not yet picking up the latest nightly numpy wheel to build against (the changes in this PR require numpy changes of a few days ago).
How do you ensure the docker image doesn't use a cached layer?

@github-actions github-actions bot removed the awaiting changes Awaiting changes label Mar 11, 2024
@jorisvandenbossche

This comment was marked as outdated.

This comment was marked as outdated.

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit wheel-windows-cp312-amd64

Copy link

Revision: 74caa26

Submitted crossbow builds: ursacomputing/crossbow @ actions-e37c72b24e

Task Status
wheel-windows-cp312-amd64 GitHub Actions

@jorisvandenbossche
Copy link
Member Author

OK, my last commit adding some extra prints to our cmake set up at least confirms that in this case it is using an older numpy:

-- Found NumPy version: 2.0.0.dev0+git20240203.5c34c87
-- NumPy include dir: C:/Python312/Lib/site-packages/numpy/_core/include

(from beginning of February, which matches with the time when those images were last updated #39622)

Will then try again to force rebuilding them here in this PR (it's not clear to me how that can be done generally)

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit wheel-windows-cp312-amd64

Copy link

Revision: 49df2c5

Submitted crossbow builds: ursacomputing/crossbow @ actions-d87170315f

Task Status
wheel-windows-cp312-amd64 GitHub Actions

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit wheel-windows-cp312-amd64

Copy link

Revision: 0d97bf8

Submitted crossbow builds: ursacomputing/crossbow @ actions-7909cb5614

Task Status
wheel-windows-cp312-amd64 GitHub Actions

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Mar 12, 2024

OK, my last commit did succeed in ensuring we weren't accidentally using the cached images, and now it was building with against numpy 2.0.0b1 as shown by cmake, and now the windows build did succeed! (well, still red but now that's the known test failure on windows, which will be fixed with rebasing)

So the actual code changes are all good here. Now I only need to figure out how to ensure we rebuild the cached images properly (instead of just not using the cache as I did in the last commit, which I don't want to merge to main)

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit wheel-windows-cp312-amd64

Copy link

Revision: c3e18d1

Submitted crossbow builds: ursacomputing/crossbow @ actions-5ccdad511a

Task Status
wheel-windows-cp312-amd64 GitHub Actions

@jorisvandenbossche
Copy link
Member Author

@github-actions crossbow submit test-conda-python-3.10-pandas-nightly test-conda-python-3.11-pandas-upstream_devel

Copy link

Revision: c3e18d1

Submitted crossbow builds: ursacomputing/crossbow @ actions-5525cfe7b2

Task Status
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Mar 12, 2024
@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Mar 13, 2024

Remaining failures are now actual (and known) test failures (the pandas nightly is being fixed by #40429)

@jorisvandenbossche jorisvandenbossche merged commit a421314 into apache:main Mar 13, 2024
11 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting merge Awaiting merge label Mar 13, 2024
@jorisvandenbossche jorisvandenbossche deleted the gh-40376-numpy-abi branch March 13, 2024 07:56
@seberg
Copy link
Contributor

seberg commented Mar 13, 2024

Thanks for working on this, I wish rebuilding from scratch was less tedious.

@jorisvandenbossche
Copy link
Member Author

The tediousness was mostly because of our own building infrastructure, not because of numpy ;) Thanks for the help!

Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a421314.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants