cell_widths validation for load_uniform_grid #4328

Merged
merged 9 commits into from
Feb 10, 2023

chrishavlin
Contributor

Summary

This PR:

  • adds some validation for the cell_widths argument to load_uniform_grid (including dtype checks)
  • adds a missing docstring entry for cell_widths
  • adds some tests (and makes test_stream_stretched.py pytest-only)

Background: dtype bug

This started as a fix for a bug that occurs when the cell_widths arrays do not have a dtype of float64, and expanded a bit to better capture what cell_widths is required to be. The bug can be reproduced with:

import yt
import numpy as np

N = 8
data = {"density": np.random.random((N, N, N))}

cell_widths = []
for i in range(3):
    widths = np.random.random(N)
    widths /= widths.sum()  # Normalize to span 0 .. 1.
    cell_widths.append(widths.astype(np.float32))  # <-- note the float32 cell width dtype

ds = yt.load_uniform_grid(
    data,
    [N, N, N],
    bbox=np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]),
    cell_widths=cell_widths,
)

slc = ds.slice(0, ds.domain_center[0])[("stream", "density")]

I encountered this while loading data from a netCDF file whose coordinate arrays happened to be float32. The code above raises the following error:

ValueError                                Traceback (most recent call last)
Input In [6], in <cell line: 21>()
     12     cell_widths.append(widths.astype(np.float32))
     14 ds = yt.load_uniform_grid(
     15     data,
     16     [N, N, N],
     17     bbox=np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]),
     18     cell_widths=cell_widths,
     19 )
---> 21 slc = ds.slice(0, ds.domain_center[0])[("stream", "density")]

File ~/src/yt_/yt_dev/yt/yt/data_objects/data_containers.py:269, in YTDataContainer.__getitem__(self, key)
    267         return self.field_data[f]
    268     else:
--> 269         self.get_data(f)
    270 # fi.units is the unit expression string. We depend on the registry
    271 # hanging off the dataset to define this unit object.
    272 # Note that this is less succinct so that we can account for the case
    273 # when there are, for example, no elements in the object.
    274 try:

File ~/src/yt_/yt_dev/yt/yt/data_objects/selection_objects/data_selection_objects.py:131, in YTSelectionContainer.get_data(self, fields)
    129 def get_data(self, fields=None):
    130     if self._current_chunk is None:
--> 131         self.index._identify_base_chunk(self)
    132     if fields is None:
    133         return

File ~/src/yt_/yt_dev/yt/yt/geometry/grid_geometry_handler.py:351, in GridIndex._identify_base_chunk(self, dobj)
    347 # These next two lines, when uncommented, turn "on" the fast index.
    348 # if dobj._type_name != "grid":
    349 #    fast_index = self._get_grid_tree()
    350 if getattr(dobj, "size", None) is None:
--> 351     dobj.size = self._count_selection(dobj, fast_index=fast_index)
    352 if getattr(dobj, "shape", None) is None:
    353     dobj.shape = (dobj.size,)

File ~/src/yt_/yt_dev/yt/yt/geometry/grid_geometry_handler.py:363, in GridIndex._count_selection(self, dobj, grids, fast_index)
    361 if grids is None:
    362     grids = dobj._chunk_info
--> 363 count = sum(g.count(dobj.selector) for g in grids)
    364 return count

File ~/src/yt_/yt_dev/yt/yt/geometry/grid_geometry_handler.py:363, in <genexpr>(.0)
    361 if grids is None:
    362     grids = dobj._chunk_info
--> 363 count = sum(g.count(dobj.selector) for g in grids)
    364 return count

File ~/src/yt_/yt_dev/yt/yt/data_objects/index_subobjects/grid_patch.py:418, in AMRGridPatch.count(self, selector)
    417 def count(self, selector):
--> 418     mask = self._get_selector_mask(selector)
    419     if mask is None:
    420         return 0

File ~/src/yt_/yt_dev/yt/yt/data_objects/index_subobjects/stretched_grid.py:23, in StretchedGrid._get_selector_mask(self, selector)
     21     mask = self._last_mask
     22 else:
---> 23     mask = selector.fill_mask(self)
     24     if self._cache_mask:
     25         self._last_mask = mask

File yt/geometry/_selection_routines/selector_object.pxi:440, in yt.geometry.selection_routines.SelectorObject.fill_mask()

ValueError: Buffer dtype mismatch, expected 'float64_t' but got 'float'

The bug fix

The fix is simply to cast the cell_widths arrays to float64 if they are not float64 already. Casting to float64 seems in line with other yt behavior. (Alternatively, I could have caught the case and raised a more meaningful error message than the one you currently get from deep in the selection routines.)
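
The cast described here can be sketched with NumPy alone. This is a minimal illustration, not the code merged in this PR; the helper name `cast_cell_widths` is hypothetical.

```python
import numpy as np


def cast_cell_widths(cell_widths):
    """Hypothetical helper: return each per-axis width array as float64."""
    # np.asarray returns the input unchanged when it is already a float64
    # ndarray, so only arrays with another dtype (e.g. float32) are copied.
    return [np.asarray(widths, dtype=np.float64) for widths in cell_widths]


# float32 widths, as in the reproduction script
widths32 = [np.full(8, 1.0 / 8, dtype=np.float32) for _ in range(3)]
print(cast_cell_widths(widths32)[0].dtype)  # float64
```

Until a yt release containing the fix is available, applying the same cast to the arrays before passing them to load_uniform_grid should avoid the buffer dtype mismatch.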

The rest of the PR

While fixing this bug, it seemed like a good idea to better encapsulate the requirements for using the cell_widths argument with load_uniform_grid (we were also missing a docstring entry for cell_widths). A summary of those requirements:

  • nprocs must remain 1 (this is hopefully a temporary requirement, but as far as I can tell it is required for now: you get indexing errors with nprocs > 1)
  • you must supply one cell_widths array per dimension (as opposed to a single array that is applied to every dimension)
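
A minimal sketch of how these two requirements could be checked up front, before the data reaches the selection routines. The function name `check_cell_widths`, its signature, the error messages, and the extra 1D check are assumptions made for illustration; they are not taken from this PR's diff.

```python
import numpy as np


def check_cell_widths(cell_widths, domain_dimensions, nprocs=1):
    """Hypothetical validator for the two requirements listed above."""
    # Requirement 1: stretched grids currently require nprocs == 1.
    if nprocs != 1:
        raise ValueError(
            f"nprocs must be 1 when cell_widths is provided, got {nprocs}"
        )

    # Requirement 2: one array of widths per dimension.
    ndim = len(domain_dimensions)
    if len(cell_widths) != ndim:
        raise ValueError(
            f"expected {ndim} cell width arrays, got {len(cell_widths)}"
        )

    # Assumption (not stated in the PR text): each entry is a 1D array.
    # This also rejects a single flat array whose length equals ndim.
    for axis, widths in enumerate(cell_widths):
        if np.ndim(widths) != 1:
            raise ValueError(f"cell widths for axis {axis} must be a 1D array")


# One array per dimension passes silently.
check_cell_widths([np.full(8, 0.125) for _ in range(3)], [8, 8, 8])
```

Raising early with a message that names the offending argument is the alternative mentioned in the bug fix discussion: it reports the problem at load time rather than at the first data selection.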

@chrishavlin added the "bug" and "code frontends" (Things related to specific frontends) labels on Feb 7, 2023
@chrishavlin
Contributor Author

Oh, whoops, it looks like I can't use numpy.typing yet because of the minimum numpy version for yt. I'll look into how to do that better... but feel free to review in the meantime (or not).

@neutrinoceros
Member

There's a discussion to be had on when and how we'll be able to use numpy.typing, but yeah for now the conservative approach is just not to :/

Member

@neutrinoceros neutrinoceros left a comment

I think the case where nprocs > 1 can be improved, but otherwise my suggestions are mostly stylistic, and I'm +1 for improving the user experience with clear data (in)validation!

Resolved review comments (now outdated): five on yt/frontends/stream/misc.py, one on yt/loaders.py
chrishavlin and others added 2 commits February 9, 2023 09:29
Co-authored-by: Clément Robert <cr52@protonmail.com>
neutrinoceros previously approved these changes Feb 9, 2023
Resolved review comment (now outdated): one on tests/tests.yaml
Co-authored-by: Kacper Kowalik <xarthisius.kk@gmail.com>
@matthewturk matthewturk merged commit f344805 into yt-project:main Feb 10, 2023