Accessing nested children via consolidated metadata fails #2358

Closed
jhamman opened this issue Oct 14, 2024 · 1 comment · Fixed by #2363
Labels
bug Potential issues with the zarr-python library
Milestone

Comments

@jhamman
Member

jhamman commented Oct 14, 2024

Zarr version

3.0.0.beta

Numcodecs version

0.13

Python Version

3.11

Operating System

Mac

Installation

pip

Description

In pydata/xarray#9552, I noticed that accessing nested children fails when using consolidated metadata.

Steps to reproduce

import zarr

store = zarr.storage.MemoryStore(mode='w')

# create hierarchy root + foo/bar
root = zarr.open_group(store=store, attributes={'a': 'b'}, mode='w')
root.create_array('foo/bar', shape=(2, 2), attributes={'d': 4})

# consolidate metadata
out = zarr.consolidate_metadata(store)

out['foo/bar']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:670, in AsyncGroup._getitem_consolidated(self, store_path, key, prefix)
    669 try:
--> 670     metadata = self.metadata.consolidated_metadata.metadata[key]
    671 except KeyError as e:
    672     # The Group Metadata has consolidated metadata, but the key
    673     # isn't present. We trust this to mean that the key isn't in
    674     # the hierarchy, and *don't* fall back to checking the store.

KeyError: 'foo/bar'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[20], line 12
      9 # consolidate metadata
     10 out = zarr.consolidate_metadata(store)
---> 12 out['foo/bar']

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:1330, in Group.__getitem__(self, path)
   1329 def __getitem__(self, path: str) -> Array | Group:
-> 1330     obj = self._sync(self._async_group.getitem(path))
   1331     if isinstance(obj, AsyncArray):
   1332         return Array(obj)

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:185, in SyncMixin._sync(self, coroutine)
    182 def _sync(self, coroutine: Coroutine[Any, Any, T]) -> T:
    183     # TODO: refactor this to to take *args and **kwargs and pass those to the method
    184     # this should allow us to better type the sync wrapper
--> 185     return sync(
    186         coroutine,
    187         timeout=config.get("async.timeout"),
    188     )

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:141, in sync(coro, loop, timeout)
    138 return_result = next(iter(finished)).result()
    140 if isinstance(return_result, BaseException):
--> 141     raise return_result
    142 else:
    143     return return_result

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/sync.py:100, in _runner(coro)
     95 """
     96 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     97 exception, the exception will be returned.
     98 """
     99 try:
--> 100     return await coro
    101 except Exception as ex:
    102     return ex

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:608, in AsyncGroup.getitem(self, key)
    606 # Consolidated metadata lets us avoid some I/O operations so try that first.
    607 if self.metadata.consolidated_metadata is not None:
--> 608     return self._getitem_consolidated(store_path, key, prefix=self.name)
    610 # Note:
    611 # in zarr-python v2, we first check if `key` references an Array, else if `key` references
    612 # a group,using standalone `contains_array` and `contains_group` functions. These functions
    613 # are reusable, but for v3 they would perform redundant I/O operations.
    614 # Not clear how much of that strategy we want to keep here.
    615 elif self.metadata.zarr_format == 3:

File ~/Library/CloudStorage/Dropbox/src/zarr-python/src/zarr/core/group.py:676, in AsyncGroup._getitem_consolidated(self, store_path, key, prefix)
    671 except KeyError as e:
    672     # The Group Metadata has consolidated metadata, but the key
    673     # isn't present. We trust this to mean that the key isn't in
    674     # the hierarchy, and *don't* fall back to checking the store.
    675     msg = f"'{key}' not found in consolidated metadata."
--> 676     raise KeyError(msg) from e
    678 # update store_path to ensure that AsyncArray/Group.name is correct
    679 if prefix != "/":

KeyError: "'foo/bar' not found in consolidated metadata."
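
The traceback shows the whole path `'foo/bar'` being used as a single dictionary key. A minimal pure-Python sketch of why that can fail, assuming (this is an assumption for illustration, not something taken from the zarr source) that each group's consolidated metadata only lists its immediate children. The `consolidated` dict and `flat_lookup` helper are hypothetical stand-ins, not zarr APIs:

```python
# Hypothetical stand-in for consolidated metadata: each group maps only its
# immediate child names to metadata, and sub-groups nest their own mapping.
consolidated = {
    "foo": {  # group "foo"
        "bar": {"shape": (2, 2), "attributes": {"d": 4}},  # array "foo/bar"
    },
}


def flat_lookup(children, key):
    """Mimic the failing code path: use the whole key as one dict key."""
    return children[key]


# The multi-part key is not a top-level entry, so the lookup raises KeyError.
try:
    flat_lookup(consolidated, "foo/bar")
except KeyError as err:
    print("KeyError:", err)

# Looking up one path segment at a time finds the nested entry.
bar = flat_lookup(flat_lookup(consolidated, "foo"), "bar")
print(bar["shape"])
```

Under this assumption, only single-segment keys such as `'foo'` resolve directly; nested paths need to be walked segment by segment.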

Additional output

No response

@jhamman jhamman added the bug Potential issues with the zarr-python library label Oct 14, 2024
@jhamman jhamman added this to the 3.0.0 milestone Oct 14, 2024
@TomAugspurger
Contributor

Oh, I didn't know that was valid. I'll push a fix up today.

TomAugspurger added a commit to TomAugspurger/zarr-python that referenced this issue Oct 14, 2024
This fixes `Group.__getitem__` when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as `group['subgroup']['array']` by splitting the key
and doing each operation independently. This is fine for consolidated
metadata which doesn't need to do IO.

There's a complication around unconsolidated metadata, though. What
if we encounter a node where `Group.getitem` returns a sub Group
without consolidated metadata? Then we need to fall back to
non-consolidated metadata. We've written `_getitem_consolidated`
as a regular (non-async) function, so we need to pop back up to
the async caller and have *it* fall back.

Closes zarr-developers#2358
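
The approach described in this commit message can be sketched roughly as follows. This is a simplified pure-Python model, not the code from the fix: `Node`, `NeedsStoreFallback`, and `getitem_consolidated` are hypothetical names introduced here for illustration. The key is split on `/`, each segment is resolved against the current group's children, and a dedicated exception signals the caller when a sub-group has no consolidated metadata and a store lookup would be needed instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical metadata node (not a zarr class)."""

    kind: str  # "group" or "array"
    # Child name -> Node. None means "no consolidated metadata available".
    children: dict[str, Node] | None = None


class NeedsStoreFallback(Exception):
    """Raised so the (async) caller can fall back to reading the store."""


def getitem_consolidated(root: Node, key: str) -> Node:
    """Resolve 'a/b/c' as root['a']['b']['c'] without any I/O."""
    node = root
    for part in key.strip("/").split("/"):
        if node.kind != "group":
            raise KeyError(f"'{key}' not found: an array has no children.")
        if node.children is None:
            # Cannot answer from consolidated metadata alone; let the caller decide.
            raise NeedsStoreFallback(key)
        try:
            node = node.children[part]
        except KeyError as e:
            raise KeyError(f"'{key}' not found in consolidated metadata.") from e
    return node


root = Node(
    "group",
    {
        "foo": Node("group", {"bar": Node("array")}),
        "plain": Node("group", None),  # sub-group without consolidated metadata
    },
)
print(getitem_consolidated(root, "foo/bar").kind)
```

Raising a separate exception type keeps the synchronous traversal free of I/O while still letting an async caller catch it and perform the store-backed lookup, which mirrors the "pop back up to the async caller" idea in the message above.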
TomAugspurger added a commit that referenced this issue Oct 17, 2024
* Fixed consolidated Group getitem with multi-part key

This fixes `Group.__getitem__` when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as `group['subgroup']['array']` by splitting the key
and doing each operation independently.

Closes #2358

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>
d-v-b added a commit that referenced this issue Oct 18, 2024
* move v3/tests to tests and fix various mypy issues

* test(ci): change branch name in v3 workflows (#2368)

* Use lazy % formatting in logging functions (#2366)

* Use lazy % formatting in logging functions

* f-string should be more efficient

* Space before unit symbol

From "SI Unit rules and style conventions":
https://physics.nist.gov/cuu/Units/checklist.html

	There is a space between the numerical value and unit symbol,
	even when the value is used in an adjectival sense, except in
	the case of superscript units for plane angle.

* Enforce ruff/flake8-logging-format rules (G)

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Move roadmap and v3-design document to docs (#2354)

* move roadmap to docs

* formatting and minor copy editing

* Multiple imports for an import name (#2367)

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Enforce ruff/pycodestyle warnings (W) (#2369)

* Apply ruff/pycodestyle rule W291

W291 Trailing whitespace

* Enforce ruff/pycodestyle warnings (W)

It looks like `ruff format` does not catch all trailing spaces.

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Apply ruff/pycodestyle preview rule E262 (#2370)

E262 Inline comment should start with `# `

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Fix typo (#2382)

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Imported name is not used anywhere in the module (#2379)

* Missing mandatory keyword argument `shape` (#2376)

* Update ruff rules to ignore (#2374)

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Docstrings for arraymodule (#2276)

* start to docstrings for arraymodule

* incorporating toms edits, overriding mypy error...

* fix attrs

* Update src/zarr/core/array.py

Co-authored-by: Sanket Verma <svsanketverma5@gmail.com>

* fix store -> storage

* remove properties from asyncarray docstring

---------

Co-authored-by: Sanket Verma <svsanketverma5@gmail.com>
Co-authored-by: Joe Hamman <joe@earthmover.io>

* fix/normalize storage paths (#2384)

* bring in path normalization function from v2, and add a failing test

* rephrase comment

* simplify storepath creation

* Update tests/v3/test_api.py

Co-authored-by: Joe Hamman <joe@earthmover.io>

* refactor: remove redundant zarr format fixture

* replace assertion with an informative error message

* fix incorrect path concatenation in make_store_path, and refactor store_path tests

* remove upath import because we don't need it

* apply suggestions from code review

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Enforce ruff/flake8-pyi rule PYI013 (#2389)

PYI013 Non-empty class body must not contain `...`

Note that documentation is enough to fill the class body.

* deps: remove fasteners from list of dependencies (#2386)

* Enforce ruff/flake8-annotations rule ANN003 (#2388)

ANN003 Missing type annotation

Co-authored-by: Joe Hamman <joe@earthmover.io>

* Enforce ruff/Perflint rules (PERF) (#2372)

* Apply ruff/Perflint rule PERF401

PERF401 Use a list comprehension to create a transformed list

* Enforce ruff/Perflint rules (PERF)

* chore: update package maintainers (#2387)

* chore: update package maintainers

* Update pyproject.toml

Co-authored-by: David Stansby <dstansby@gmail.com>

---------

Co-authored-by: David Stansby <dstansby@gmail.com>

* Fixed consolidated Group getitem with multi-part key (#2363)

* Fixed consolidated Group getitem with multi-part key

This fixes `Group.__getitem__` when indexing with a key
like 'subgroup/array'. The basic idea is to rewrite the indexing
operation as `group['subgroup']['array']` by splitting the key
and doing each operation independently.

Closes #2358

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>

* chore: add python 3.13 to ci / pyproject.toml (#2385)

* chore: add python 3.13 to ci / pyproject.toml

* update hatch matrix

* remove references to dead test dir in pyproject.toml

* remove v3 reference in test

---------

Co-authored-by: Joe Hamman <joe@earthmover.io>
Co-authored-by: Dimitri Papadopoulos Orfanos <3234522+DimitriPapadopoulos@users.noreply.github.com>
Co-authored-by: Emma Marshall <55526386+e-marshall@users.noreply.github.com>
Co-authored-by: Sanket Verma <svsanketverma5@gmail.com>
Co-authored-by: David Stansby <dstansby@gmail.com>
Co-authored-by: Tom Augspurger <tom.w.augspurger@gmail.com>