Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing with a boolean dask array is not allowed... #332

Closed
Mikejmnez opened this issue Mar 24, 2023 · 5 comments · Fixed by #335
Closed

Indexing with a boolean dask array is not allowed... #332

Mikejmnez opened this issue Mar 24, 2023 · 5 comments · Fixed by #335
Assignees

Comments

@Mikejmnez
Copy link
Collaborator

  • OceanSpy version:
    newest

Description

xarray version 2023.3.

Simplest example to reproduce the error: Live demo notebook in binder:

import oceanspy as ospy
import xarray as xr
od = ospy.open_oceandataset.from_zarr("OSM2020_EGshelfIIseas2km_ERAI_1D")  # data stored in binder

# define mooring
lats_Kogur = [68.68, 67.52, 66.49]
lons_Kogur = [-26.28, -23.77, -22.99]

# Extract mooring array
od_moor = od.subsample.mooring_array(Xmoor=lons_Kogur, Ymoor=lats_Kogur)

The last line fails to run. The (relevant) traceback:

File /srv/conda/envs/notebook/lib/python3.10/site-packages/oceanspy/subsample.py:399, in cutout(od, varList, YRange, XRange, add_Hbdr, mask_outside, ZRange, add_Vbdr, timeRange, timeFreq, sampMethod, dropAxes, centered, chunks, persist, geo_true)
    397 if XRange is not None or YRange is not None:
    398     XRange, ref_lon = _reset_range(XRange)
--> 399     maskH, dmaskH, XRange, YRange = get_maskH(
    400         ds, add_Hbdr, XRange, YRange, ref_lon=ref_lon
    401     )
    403     dYp1 = dmaskH["Yp1"].values
    404     dXp1 = dmaskH["Xp1"].values

File /srv/conda/envs/notebook/lib/python3.10/site-packages/oceanspy/utils.py:786, in get_maskH(ds, add_Hbdr, XRange, YRange, ref_lon)
    782 # Find horizontal indexes
    783 maskH = maskH.assign_coords(
    784     Yp1=_np.arange(len(maskH["Yp1"])), Xp1=_np.arange(len(maskH["Xp1"]))
    785 )
--> 786 dmaskH = maskH.where(maskH, drop=True)
    787 return maskH, dmaskH, XRange, YRange

the error being:

KeyError: 'Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray.Please compute the indexer first using .compute()'

Quick fix for now,

pin xarray to version <= 2023.2.

This might be a good opportunity to see if we can start using xoak to find indexes instead .

@Mikejmnez Mikejmnez self-assigned this Mar 24, 2023
@Mikejmnez
Copy link
Collaborator Author

FYI the current Oceanography image on Sciserver has xarray = 2023.2, for which there is no trouble.

@Mikejmnez
Copy link
Collaborator Author

Mikejmnez commented Mar 24, 2023

More context on the erro (see utils.py lines 767 + ):

maskH = _xr.ones_like(ds["YG"])

maskH = maskH.where( _np.logical_and(ds["YG"] >= YRange[0],  ds["YG"] <= YRange[-1]), 0)

maskH = maskH.assign_coords(Yp1=_np.arange(len(maskH["Yp1"])), Xp1=_np.arange(len(maskH["Xp1"])))

dmaskH = maskH.where(maskH, drop=True)

The last line is being flagged as error in xarray 2023.3. In order to have tests passing, we have to:

dmaskH = maskH.where(maskH.compute(), drop=True)

@Mikejmnez Mikejmnez mentioned this issue Mar 26, 2023
@Mikejmnez
Copy link
Collaborator Author

Mikejmnez commented Mar 26, 2023

PR #334 introduces a workaround to this issue.

I am keeping this issue open, as we need to investigate whether mask.compute() is a good idea or not in terms of performance (perhaps there isn't much loss). Or, perhaps we need to rethink how to mask the coordinates.

@malmans2
Copy link
Contributor

I think compute is OK here. drop=True silently does compute (or at least used to), as it needs to evaluate which data can be dropped.

@Mikejmnez
Copy link
Collaborator Author

That is reassuring. I was a little bit worried that .compute() would be costly in terms of performance for grids like LLC4320 (in the case a cutout region is close to the entire domain). I am looking into this. It may lead to O(1-3) minutes at worst (although that may depend on the current Sciserver usage). More on this to come

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants