
BLD: require numpy 2.0.0 (stable) at build time #4930

Merged

Conversation

neutrinoceros
Member

@neutrinoceros neutrinoceros commented Jun 16, 2024

PR Summary

Numpy 2.0 just dropped on PyPI (🎉). This change is a pretext to make sure everything still runs smoothly. From the conversation on #4874, I expect I'll need to bump a couple of answer tests, but I expect everything else to stay green.

Closes #4874

@neutrinoceros neutrinoceros added this to the 4.4.0 milestone Jun 16, 2024
@neutrinoceros neutrinoceros added the build (related to the build process), bug, and tests: running tests (Issues with the test setup) labels Jun 16, 2024
@neutrinoceros
Member Author

I'm puzzled. I don't understand how it's possible that one of the tests is still failing after bumping.

@matthewturk
Member

Is it possible there's a NaN? I'm not sure how NaNs would impact the typical equality/hash comparisons.

@neutrinoceros
Member Author

Good thinking! Indeed, hashing an array isn't a stable operation if it contains a NaN:

In [8]: a = np.arange(10, dtype="float")

In [9]: a
Out[9]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [10]: a[0] = np.nan

In [11]: hash(tuple(a))
Out[11]: -5529612596851953840

In [12]: hash(tuple(a))
Out[12]: -5371981722103082231

In [13]: hash(tuple(a))
Out[13]: -7440010721638526525

In [14]: hash(tuple(a))
Out[14]: -4675353377044086749

In [15]: hash(tuple(a))
Out[15]: 1634418349654970603

In [16]: hash(tuple(a))
Out[16]: 5494725379304409846

In [17]: a = np.arange(10, dtype="float")

In [18]: hash(tuple(a))
Out[18]: -2040549277248155741

In [19]: hash(tuple(a))
Out[19]: -2040549277248155741

In [20]: hash(tuple(a))
Out[20]: -2040549277248155741

In [21]: hash(tuple(a))
Out[21]: -2040549277248155741

In [22]: hash(tuple(a))
Out[22]: -2040549277248155741

In [23]: hash(tuple(a))
Out[23]: -2040549277248155741

(I should point out that this little experiment was conducted with numpy 1.26)
So maybe what's happening is that we have some NaN in an output field that wasn't present when running with numpy 1.x 🤔
Could be a bug on our side or in NumPy (or both?? 🙀)
@chrishavlin, any chance you still have whatever scripts you used to decipher this situation a couple of days ago?
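
A plausible mechanism for the unstable hashes (a sketch, assuming CPython >= 3.10, where hash(nan) depends on object identity rather than being a fixed value): every call to tuple(a) materializes fresh NaN scalar objects, so each tuple hashes differently.

import numpy as np

a = np.arange(10, dtype="float")
a[0] = np.nan

# Each tuple(a) call creates new NaN objects; their hashes (and therefore
# the tuple's hash) differ from one call to the next.
print(hash(tuple(a)) == hash(tuple(a)))  # False

a[0] = 0.0
print(hash(tuple(a)) == hash(tuple(a)))  # True: regular floats hash by value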

@matthewturk
Member

I really need to learn more about NaNs. I had no idea there was, officially, a 'payload': https://en.wikipedia.org/wiki/NaN

@chrishavlin
Contributor

Oh interesting, it's not even the same tests that are failing -- still Enzo, but the velocity_divergence field rather than velocity_magnitude. Likely a very similar issue though; I'll take a closer look this morning.

@neutrinoceros
Member Author

Awesome, thank you very much. Feel free to push to this branch if you need to!

@chrishavlin
Contributor

chrishavlin commented Jun 17, 2024

Well, I'm thoroughly confused. Manually comparing values didn't show any difference in grid values for ('gas', 'velocity_divergence') between np 2 and np 1.26.4. Then I spent a while getting nose working with py 3.10 to match what Jenkins is doing... and I reproduced the error with just np 2:

in an environment with np2, store the answer tests for enzo:

nosetests --answer-store --local --with-answer-testing --local-dir /Users/chavlin/data/yt_answer_store/enzo_test yt/frontends/enzo/tests/test_outputs.py  --answer-name=local-ezno yt.frontends.enzo --answer-big-data --verbose

and then immediately run a comparison in the same np2 environment

nosetests --local --with-answer-testing --local-dir /Users/chavlin/data/yt_answer_store/enzo_test yt/frontends/enzo/tests/test_outputs.py  --answer-name=local-ezno yt.frontends.enzo --answer-big-data --verbose

and that test fails for me:


======================================================================
FAIL: GridValues_galaxy0030_all_('gas', 'velocity_divergence')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chavlin/.pyenv/versions/3.10.12/envs/yt_np2_delete/lib/python3.10/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/chavlin/src/yt_/yt_dev/yt/yt/utilities/answer_testing/framework.py", line 418, in __call__
    self.compare(nv, ov)
  File "/Users/chavlin/src/yt_/yt_dev/yt/yt/utilities/answer_testing/framework.py", line 700, in compare
    assert_equal(new_result[k], old_result[k])
  File "/Users/chavlin/.pyenv/versions/3.10.12/envs/yt_np2_delete/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 414, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 'ec255050b61b671af5425c0e4ae7a17d'
 DESIRED: 'f16afe8932b49e9b2acadf304ae3a168'

But this is only happening within nose -- I made a little manual answer test and couldn't reproduce the failure (script is here). Not even sure where to look now...

EDIT: on my desktop (Ubuntu), I don't have the above issue -- storing the enzo tests then immediately re-running passes as expected.
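
For context on what those 32-character hex strings are: the GridValues answer test compares a digest computed from the field's raw values, so any bit-level change anywhere in the array flips the whole digest -- there is no tolerance involved. A minimal sketch (hashlib.md5 over the raw bytes is an assumption about the exact mechanism):

import hashlib
import numpy as np

v = np.linspace(0.1, 1.0, 10)
w = v.copy()
w[-1] = np.nextafter(w[-1], 2.0)  # flip the last bit of a single element

# A one-ulp difference in one element produces a completely different digest.
print(hashlib.md5(v.tobytes()).hexdigest())
print(hashlib.md5(w.tobytes()).hexdigest())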

@neutrinoceros
Member Author

neutrinoceros commented Jun 17, 2024

Interesting that you get

 ACTUAL: 'ec255050b61b671af5425c0e4ae7a17d'
 DESIRED: 'f16afe8932b49e9b2acadf304ae3a168'

while the latest attempt on Jenkins has

 ACTUAL: 'f16afe8932b49e9b2acadf304ae3a168'
 DESIRED: 'ec255050b61b671af5425c0e4ae7a17d'

So the hash isn't stable but it's not completely random either.

On my desktop (Ubuntu), I don't have the above issue -- storing the enzo tests then immediately re-running passes as expected.

Just curious, how many attempts did you make? Was it one (un)lucky try?

@matthewturk
Member

Maybe there's an unstable ordering issue -- is it possible argsort could be involved?

@neutrinoceros
Member Author

Wait, sorting can be unstable??? In what sense?

@matthewturk
Member

So I'm speculating. :) What I'm wondering is if argsort could potentially have an instability. I suspect that I'm way off in the wild on this, though. My concern was what if:

indices = np.argsort(some_array_with_duplicates)

and then sorting by those indices could potentially result in different orderings. But I'm kind of grasping at straws here and I don't think it's the source of the error.

I will note that we've struggled with divergence in the past -- it has been the source of differences (but reliable, not stochastic) as a result of order-of-operations, subtraction of tiny numbers from single precision, etc.
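
For what it's worth, a small illustration of that speculation (a sketch; whether the default kind actually reorders ties depends on the input and on the algorithm NumPy selects):

import numpy as np

x = np.array([1.0, 2.0, 1.0, 2.0, 1.0])

# The default sort makes no guarantee about the relative order of equal
# elements, so the tied values' original positions may come out in any order.
print(np.argsort(x))

# kind="stable" guarantees ties keep their original relative order.
print(np.argsort(x, kind="stable"))  # [0 2 4 1 3]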

@neutrinoceros
Member Author

Oh I missed that you were talking about *arg*sort, which indeed may be unstable, if the fact that it accepts an optional keyword argument kind="stable" is any indication

@matthewturk
Member

oh, and stable is new in 2.0.0!

@matthewturk
Member

Ah, but while I thought we used argsort in testing, turns out we kinda don't.

$ grep -lr argsort --include="*.py"
yt/visualization/volume_rendering/create_spline.py
yt/visualization/plot_modifications.py
yt/fields/tests/test_fields.py
yt/frontends/tipsy/io.py
yt/frontends/stream/data_structures.py
yt/frontends/stream/definitions.py
yt/frontends/art/io.py
yt/frontends/gadget/io.py
yt/frontends/flash/data_structures.py
yt/data_objects/selection_objects/ray.py
yt/data_objects/level_sets/tests/test_clump_finding.py
yt/data_objects/level_sets/clump_tools.py
yt/data_objects/particle_trajectories.py
yt/geometry/coordinates/cartesian_coordinates.py
yt/utilities/particle_generator.py
yt/utilities/lib/tests/test_geometry_utils.py
yt/utilities/lib/cykdtree/tests/test_utils.py
yt/utilities/flagging_methods.py

@neutrinoceros
Member Author

oh, and stable is new in 2.0.0!

is it? The docs say:

Changed in version 1.15.0.: The ‘stable’ option was added.

@matthewturk
Member

Right -- kind="stable" was in 1.15.0, but stable: bool was added in 2.0.0. (The two are not unrelated though :) )
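
Side by side, the two spellings (the boolean keyword needs numpy >= 2.0, while kind="stable" also works on older versions):

import numpy as np

x = np.array([3, 1, 3, 1])

i_kind = np.argsort(x, kind="stable")  # available since numpy 1.15
i_bool = np.argsort(x, stable=True)    # keyword-only argument added in numpy 2.0

assert np.array_equal(i_kind, i_bool)  # both stable: [1 3 0 2]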

@neutrinoceros
Member Author

Ooooow. To think I actually worked on this a couple months back and apparently obliterated it from my memory...

@chrishavlin
Contributor

chrishavlin commented Jun 18, 2024

Well, I don't have an answer, but I have found some suspicious behavior. I'm going to summarize it all here.

how to reproduce failure

install yt, nose

Working from a fresh py3.10 environment:

pip install -e .[enzo,test]

For running nose with python 3.10, you need to edit nose a bit. The jenkins build does the following:

find .venv/lib/python3.10/site-packages/nose -name '*.py' -exec sed -i -e s/collections.Callable/collections.abc.Callable/g '{}' ';'

On a Mac, with a pyenv-virtualenv called yt_np2_test_again, I used:

find /Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/nose -name "*.py" -exec sed -i '' s/collections.Callable/collections.abc.Callable/g {} +

running the tests

In an effort to match more closely what jenkins is doing, I hacked away at tests/nose_runner.py and tests/tests.yaml so that I could call python tests/nose_runner.py and have it just run the suspect enzo test (code on this branch). My initial comment above (this comment) actually has a mistake in how nosetests is called: it included both a path to the enzo test file and the module specification, so it was running tests twice (turns out I think it was exposing the same or a very similar error though -- see below).

the failure

Here's the output of running python tests/nose_runner.py (the hacked version that will just call the suspect enzo test):

(yt_np2_test_again) chavlin@lism01-chavlin yt % python tests/nose_runner.py               
NoseWorker-1: WILL DO self.name = py310_local_enzo_010
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 58.996s

OK
Updating and precise version information requires 
gitpython to be installed.
Try: python -m pip install gitpython
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... FAIL
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

======================================================================
FAIL: GridValues_galaxy0030_all_('gas', 'velocity_divergence')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/chavlin/src/yt_/yt_dev/np2_testing/yt/yt/utilities/answer_testing/framework.py", line 418, in __call__
    self.compare(nv, ov)
  File "/Users/chavlin/src/yt_/yt_dev/np2_testing/yt/yt/utilities/answer_testing/framework.py", line 700, in compare
    assert_equal(new_result[k], old_result[k])
  File "/Users/chavlin/.pyenv/versions/yt_np2_test_again/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 414, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: 'f16afe8932b49e9b2acadf304ae3a168'
 DESIRED: 'ec255050b61b671af5425c0e4ae7a17d'

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 60.008s

FAILED (failures=1)
NoseWorker-1: Exiting

Initially the answer store does not exist -- it runs the tests and stores them. Then it immediately runs the tests again and fails with the exact same ACTUAL/DESIRED hashes in the failing test here.

getting it to pass?

run again

If I immediately run tests again, with the answer store already existing, tests pass:

(yt_np2_test_again) chavlin@lism01-chavlin yt % python tests/nose_runner.py
NoseWorker-1: WILL DO self.name = py310_local_enzo_010
Updating and precise version information requires 
gitpython to be installed.
Try: python -m pip install gitpython
GridHierarchy_galaxy0030_all ... ok
ParentageRelationships_galaxy0030_all ... ok
GridValues_galaxy0030_all_('gas', 'velocity_divergence') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_0_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_1_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_all_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_None ... ok
PixelizedProjectionValues_galaxy0030_sphere_('max', (0_1, 'unitary'))_('gas', 'velocity_divergence')_2_('gas', 'density') ... ok

----------------------------------------------------------------------
XML: /Users/chavlin/src/yt_/yt_dev/np2_testing/yt/nosetests.xml
----------------------------------------------------------------------
Ran 15 tests in 58.911s

OK

dataset state?

In trying to find the actual cause of the error, I came across a couple of ways to get the test to pass that are likely fixing a symptom and not the cause...

First, test_galaxy0030 instantiates a ds that gets passed to the big_patch_amr function:

@requires_module("h5py")
@requires_ds(g30, big_data=True)
def test_galaxy0030():
    ds = data_dir_load(g30)
    yield check_color_conservation(ds)
    assert_equal(str(ds), "galaxy0030")
    for test in big_patch_amr(ds, _fields):
        test_galaxy0030.__name__ = test.description
        yield test
    assert_equal(ds.particle_type_counts, {"io": 1124453})

If I edit that to instead pass the filename of the dataset, then tests pass initially.

Second, the actual calculation of the divergence is very sensitive to small changes in the value of grid-spacing. Here's the relevant code:

def _divergence(field, data):
    ds = div_fac * just_one(data["index", "dx"])
    f = data[xn[0], f"relative_{xn[1]}"][sl_right, 1:-1, 1:-1] / ds
    f -= data[xn[0], f"relative_{xn[1]}"][sl_left, 1:-1, 1:-1] / ds
    ds = div_fac * just_one(data["index", "dy"])
    f += data[yn[0], f"relative_{yn[1]}"][1:-1, sl_right, 1:-1] / ds
    f -= data[yn[0], f"relative_{yn[1]}"][1:-1, sl_left, 1:-1] / ds
    ds = div_fac * just_one(data["index", "dz"])
    f += data[zn[0], f"relative_{zn[1]}"][1:-1, 1:-1, sl_right] / ds
    f -= data[zn[0], f"relative_{zn[1]}"][1:-1, 1:-1, sl_left] / ds
    new_field = data.ds.arr(np.zeros(data[xn].shape, dtype=np.float64), f.units)
    new_field[1:-1, 1:-1, 1:-1] = f
    return new_field

If I change those ds lines to convert to cm (e.g., ds = div_fac * just_one(data["index", "dx"]).to('cm')) then tests pass initially.

@chrishavlin
Contributor

No update on a fix, but it looks like the failure isn't np 2 related. I think this behavior has been around a while and is likely only being exposed now because of bumping the answers. Here's a test script:

import yt
import numpy as np

yt.set_log_level(50)

def _get_vals():
    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
    g1 = ds.index.grids[0]
    vdiv_1 = g1['gas', 'velocity_divergence']
    return vdiv_1

if __name__ == "__main__":
    print(np.version.version)
    v1 = _get_vals()
    v2 = _get_vals()
    print(np.max(np.abs(v1 - v2))) # 4.930380657631324e-32 1/s
    print(np.all(v1==v2)) # False
    v3 = _get_vals()
    print(np.all(v1==v3)) # False
    print(np.all(v2==v3)) # True

The script above behaves the same for np >= 2 and np < 2 (including when building yt with np < 2 -- I went back as far as yt-4.1.4).

@neutrinoceros
Member Author

neutrinoceros commented Jun 21, 2024

So, yt 4.1.4 was released in January 2023, but #3856 [1] was first released in yt 4.1.0. Maybe that's actually a regression that happened somewhere in between?

Footnotes

  1. last time these answers were bumped successfully

@chrishavlin
Contributor

Ok -- kinda figured out why this is happening, but still not certain of the root cause:

So the issue is related to unit registry state. Within the _divergence calculation:

def _divergence(field, data):
    ds = div_fac * just_one(data["index", "dx"])
    f = data[xn[0], f"relative_{xn[1]}"][sl_right, 1:-1, 1:-1] / ds
    f -= data[xn[0], f"relative_{xn[1]}"][sl_left, 1:-1, 1:-1] / ds
    ds = div_fac * just_one(data["index", "dy"])
    f += data[yn[0], f"relative_{yn[1]}"][1:-1, sl_right, 1:-1] / ds
    f -= data[yn[0], f"relative_{yn[1]}"][1:-1, sl_left, 1:-1] / ds
    ds = div_fac * just_one(data["index", "dz"])
    f += data[zn[0], f"relative_{zn[1]}"][1:-1, 1:-1, sl_right] / ds
    f -= data[zn[0], f"relative_{zn[1]}"][1:-1, 1:-1, sl_left] / ds
    new_field = data.ds.arr(np.zeros(data[xn].shape, dtype=np.float64), f.units)
    new_field[1:-1, 1:-1, 1:-1] = f
    return new_field

Those ds = div_fac * just_one(data["index", "dx"]) lines end up with the unit registry of the previous dataset.

Here's a standalone script that illustrates the bug:

import yt

yt.set_log_level(50)

print("load ds 1\n")
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
print(f"Initial registry id on load 1: {id(ds.unit_registry)}") # 139759425784224
dx = 1. * ds.index.grids[0]['index', 'dx'][0,0,0]
print(f"dx registry: {id(dx.units.registry)}") # 139759425784224

print("\nload ds 2\n")
ds2 = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030") 
print(f"Initial registry id on load 2: {id(ds2.unit_registry)}") # 140563377278064

dx0 = ds2.index.grids[0]['index', 'dx'][0,0,0]
print(f"dx0 registry: {id(dx0.units.registry)}") # 140563377278064

dx = 1. * dx0
print(f"dx registry: {id(dx.units.registry)}") # 139759425784224

for which you get

load ds 1

Initial registry id on load 1: 140180192729504
Parsing Hierarchy : 100%|███████████████████████████████████████████████████████████████████████████████| 173/173 [00:00<00:00, 23574.99it/s]
dx registry: 140180192729504

load ds 2

Initial registry id on load 2: 140180163540928
Parsing Hierarchy : 100%|███████████████████████████████████████████████████████████████████████████████| 173/173 [00:00<00:00, 25036.73it/s]
dx0 registry: 140180163540928
dx registry: 140180192729504

i.e., the unit registry of the final dx variable points to the registry of the previous dataset.

If I comment out all the @lru_cache decorators in unyt/array.py, then that final dx points to its proper registry and the subsequent hashes in the failing nose tests match. So for some reason unyt is picking up the registry from the prior dataset. But it's not immediately clear why that would matter: in stepping through the code, I've found that the following spot results in a change in hash despite being a conversion from '1/s' to '1/s':

Adding the following will print out the conversion factor that ends up being used

                        if "velocity_divergence" in str(field):
                            print("_generate_fields")
                            print(id(fd.units.registry))
                            from unyt.array import _sanitize_units_convert
                            new_units = _sanitize_units_convert(fi.units, fd.units.registry)
                            (conv_factor, offset) = fd.units.get_conversion_factor(
                                new_units, fd.dtype
                            )
                            print(f"conversion factor: {conv_factor}")

when you access

import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
g1 = ds.index.grids[0]['gas', 'velocity_divergence']

it hits this line twice; the first time through, the conversion factor is not exactly 1:

_generate_fields
140347495891200
conversion factor: 1.0000000000000002

the second time through it is exactly one:

_generate_fields
140347495891200
conversion factor: 1.0

and with the unit registry state issue, when you open the dataset again, the initial conversion factor is identically 1.0, so the final hash differs between the two cases.
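
To make the consequence concrete (a sketch with made-up data, not the actual yt arrays): a conversion factor that is off from 1 by a single ulp is enough to nudge some elements to the neighboring float, and a bitwise change anywhere is all it takes for the answer hashes to diverge.

import numpy as np

factor = 1.0000000000000002  # 1 + 2**-52, i.e. the stray conversion factor above
v = np.linspace(0.1, 1.0, 10)
w = v * factor

# Some entries round up to the next representable float (e.g. the final 1.0
# becomes 1.0000000000000002), so the arrays are no longer bitwise equal.
print(np.array_equal(v, w))   # False
print(np.max(np.abs(w - v)))  # tiny, but nonzero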

a quick fix?

turns out that if you change this line:

new_field = data.ds.arr(np.zeros(data[xn].shape, dtype=np.float64), f.units)

to pass in the str repr of the unit:

new_field = data.ds.arr(np.zeros(data[xn].shape, dtype=np.float64), str(f.units))

the unit-registry state issue is fixed (because it forces unyt to pull in the correct registry associated with data.ds rather than f's?), so maybe we should just do that and be done?

@neutrinoceros
Member Author

I've found that the following spot results in a change in hash despite being a conversion from '1/s' to '1/s'

If the str '1/s' (or more specifically, its id) is involved in the hash computation, I would expect a change: for short strings that only contain alphanumeric (and _) characters, Python only stores one copy of the string, but / isn't part of that set, so two instances of '1/s' will have different ids.
Not sure that's relevant though.

But your fix seems safe anyway, so let's go with it!
Thanks again for this very impressive inquiry !

@chrishavlin
Contributor

If the str '1/s' (or more specifically, its id) is involved in the hash computation, I would expect a change: for short strings that only contain alphanumeric (and _) characters, Python only stores one copy of the string, but / isn't part of that set, so two instances of '1/s' will have different ids.

the hashes were just based on the array values though -- there are actual differences in the arrays. Here's a simpler example that illustrates the problem:

import yt

yt.set_log_level(50)

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
dx = ds.index.grids[0]['index', 'dx'][0,0,0]
x1 = ds.index.grids[0]['gas', 'relative_velocity_x'][0,0,0]
x2 = x1 / dx
x2c = x2.to(str(x2.units))
print(x2c.units, x2.units, x2c.units == x2.units)
print(x2c - x2)

shows

1/s 1/s True
-5.877471754111438e-39 1/s

In any case, the quick fix is probably the way to go -- I have to hop off right now though, so I may not get to it before next week.

@neutrinoceros
Member Author

there are actual differences in the arrays

ooow, I completely missed that. Yes, I think your fix is the way to go, so I'll push it now (with you as the commit author).

@neutrinoceros
Member Author

just pushed it now. I wasn't sure how to phrase the commit message so feel free to amend it.
I'll come back in about an hour to bump the answers again if needed, otherwise I'll just undraft the PR.

@neutrinoceros neutrinoceros force-pushed the bld/build_against_stable_numpy branch from a55609a to 73a1672 on June 22, 2024 08:55
@neutrinoceros
Member Author

Ok, Jenkins is back to green, but just to be sure, let's double-check that it's really stable: @yt-fido test this please

@neutrinoceros neutrinoceros merged commit 9634be1 into yt-project:main Jun 25, 2024
13 checks passed