Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: NumPy 2 compatibility tracker #4874

Closed

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented Apr 12, 2024

PR Summary

This PR isn't meant to be merged

I've been watching numpy/numpy#26191
At the time I'm opening this, all our core dependencies (matplotlib, unyt, ewah-bool-utils) already have compatible releases.
This will unleash full CI with numpy 2 at runtime to reveal which of our optional and transitional dependencies still lack support for it, so I have a chance to reach out to them before the final release drops.

name compatible release on PyPI tracker resource, notes
astropy 🟠 v6.1.0 is ABI-compatible but not fully runtime-compatible (still good enough for yt compat)
cartopy
h5py
netcdf4 Unidata/netcdf4-python#1317
pandas
arm-pyart ARM-DOE/pyart#1550
pykdtree storpipfugl/pykdtree#113
scipy
shapely shapely/shapely#1972
xarray pydata/xarray#8844

This table may be incomplete, but CI doesn't forgive, which is why this is a PR and not an issue.

edit history:
2024/04/12 created
2024/04/13
2024/05/05 new entry: astropy
2024/06/14: xarray and netcdf4 moved to ✅ status, astropy degraded to 🟠

@neutrinoceros neutrinoceros added the do not merge this change should not be merged until later label Apr 12, 2024
@neutrinoceros neutrinoceros changed the title TST: pin runtime requirement to numpy>=2.0.0rc1 to see which of our optional dependencies still lack compat TST: NumPy 2 compatibility tracker Apr 12, 2024
@neutrinoceros
Copy link
Member Author

neutrinoceros commented Apr 19, 2024

An aspect I hadn't considered earlier is that some packages may have dropped support for Python 3.9 before adding compatibility with numpy 2.0, in which case a brand new env in Python 3.9 may be broken. I'm not going to worry about that right now but it's something to keep in mind if CI breaks specifically on 3.9 even after all our optional dependencies have a numpy2-compatible release.

update 16-05-2024: this was debated on numpy's mailing list so hopefully awareness of this problem is higher now.

@neutrinoceros
Copy link
Member Author

neutrinoceros commented May 16, 2024

My understanding is that netcdf4 maintainers do not wish to build off of a numpy release candidate and are waiting for the final stable release, but the dev branch is already compatible, so I don't expect it to be a blocker for long.

@neutrinoceros
Copy link
Member Author

Ok so I'm seeing progress. It seems that everything we depend on now has ABI-compatible wheels but we're seeing 3 failing tests around astropy. I was already aware of runtime incompatibilities between astropy 6.1.0 and numpy 2.0, and they are fixed on the dev branch. I informed astropy's release managers and a bugfix release should be coming soon.

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Jun 14, 2024

wait I may actually have fabricated this situation by forcing the dependency resolver to look for a version of astropy that

  • has cp39 wheels
  • doesn't pin numpy<2

it's probably not an issue with astropy itself

@neutrinoceros
Copy link
Member Author

neutrinoceros commented Jun 14, 2024

Ok we are down to two failing answer tests https://tests.yt-project.org/job/yt_py310_git/8249/testReport/, both in the enzo frontend.
I would just bump the answers but I don't know if that's actually safe to do here.
Any ideas, @matthewturk @Xarthisius @chrishavlin ?

@chrishavlin
Copy link
Contributor

well this took a while... but for the failed isolated galaxy test with np 2, the grid field values have a dtype of float64 while with np<2 they are float32. here's the test script I was running to look directly at values (rather than hashed values) based on what GridValuesTest does.

import yt  

if __name__== "__main__":
    fld = ('gas', 'velocity_magnitude')
    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
    
    gvals = []
    dtypes = set()
    for g in ds.index.grids:
        dtypes.add(g[fld].dtype)
        g.clear_data()

    print(dtypes)

With np 2.0.0rc2, that dtypes prints out float64 and np 1.26.4 it is float32.

After a way too long I traced this down to the calculation of relative velocity. The raw velocity components get read in as float32 (specifically, >f4) from the hdf files and then subtract off a float64 bulk velocity. Turns out the following has different behaviors in np >2:

import numpy as np

x = np.ones(5,dtype='>f4') # or dtype='float32'
y = np.ones((3,),dtype='float64')

print((x - y[0]).dtype)

with np 2.0.0rc2, you get a float64 whereas with np1.26.4 you get a float32.

@chrishavlin
Copy link
Contributor

the difference in type promotion is described here: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion :

np.array([3], dtype=np.float32) + np.float64(3) will now return a float64 array.

@chrishavlin
Copy link
Contributor

and the reason why enzo has these fields as >f4 without upcasting is that they're read in f4:

yt/yt/frontends/enzo/io.py

Lines 144 to 145 in e9ba8ac

data = np.empty(dims, dtype=h5_dtype)
yield field, obj, self._read_obj_field(obj, field, (fid, data))

and then the data.copy() in _read_fluid_selection overrides the array type from the initial f8 declaration:

rv[field] = np.empty(size, dtype="=f8")
ind = {field: 0 for field in fields}
for field, obj, data in self.io_iter(chunks, fields):
if data is None:
continue
if isinstance(selector, GridSelector) and field not in nodal_fields:
ind[field] += data.size
rv[field] = data.copy()
else:
ind[field] += obj.select(selector, data, rv[field], ind[field])
return rv

@neutrinoceros
Copy link
Member Author

Thanks a lot for digging into this ! So, since the change is externally controlled and intended, we should just bump then, right ?
I don't think it should be done in this PR but it'll soon become necessary when numpy 2.0 final lands in two days. Does it sound like a good plan to you ?

@chrishavlin
Copy link
Contributor

Thanks a lot for digging into this ! So, since the change is externally controlled and intended, we should just bump then, right ?

yes, I think just bump.

@neutrinoceros
Copy link
Member Author

Cool. I'll open the bump PR as soon as I'm notified numpy 2.0 landed on PyPI !

@chrishavlin
Copy link
Contributor

double checked in #4927 that it was indeed the new type promotion causing the failure (it was). IMO bumping the answers is still the way to go given how it is standard across io methods to cast up to float64. But if there's any reason to not cast, #4927 would be a potential approach (@matthewturk feel free to chime in if you have an opinion here).

@neutrinoceros
Copy link
Member Author

Numpy 2.0 is out and (almost) didn't disrupt our CI. This PR served its purpose. Thanks @chrishavlin for your help !

@neutrinoceros neutrinoceros deleted the exp/numpy2_for_all branch June 22, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge this change should not be merged until later
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants