Formalize mode / safety guarantees for Zarr #8454

Open
max-sixty opened this issue Nov 15, 2023 · 1 comment
Labels
topic-zarr Related to zarr storage library

Comments

@max-sixty
Collaborator

max-sixty commented Nov 15, 2023

What is your issue?

It sounds like we're converging on a rule for when it's safe to write concurrently:

  • mode="r+" is safe to write concurrently to different parts of a dataset
  • mode="a" isn't safe, because it can change the shape of an array, for example by extending a dimension

What are the existing operations that aren't consistent with this?

  • Is concurrently writing additional variables safe, or does it require updating the centralized consolidated metadata? Currently that requires mode="a", which is overly conservative under the rules above if it is in fact safe; in that case we can liberalize to allow it with mode="r+".
  • Writing to regions with unaligned chunks can lose data (#8371), but that's a bug. Edit: or possibly an artifact of writing concurrently to overlapping chunks with a single to_zarr call. We could at least restrict non-aligned writes to mode="a", so that it isn't possible to hit this by mistake while writing to different parts of a dataset.
  • Writing the same values to the same chunks concurrently isn't safe at the moment: we get a "Stale file handle" error if two processes write to the same location at the same time. I'm not sure whether it's possible to allow this; it may require work on the Zarr side. If it were possible, we wouldn't have to be as careful about ensuring that each process has mutually exclusive chunks to write. (lower priority)
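To make "mutually exclusive chunks" concrete, here is a small pure-Python sketch of splitting one dimension into chunk-aligned, non-overlapping regions, plus a check that flags an unaligned region. Neither helper is part of xarray or Zarr; `aligned_regions` and `is_chunk_aligned` are hypothetical names used only for illustration.

```python
from math import ceil


def aligned_regions(dim_size, chunk_size, n_workers):
    """Split [0, dim_size) into at most n_workers slices on chunk boundaries."""
    if dim_size <= 0 or chunk_size <= 0 or n_workers <= 0:
        raise ValueError("dim_size, chunk_size and n_workers must be positive")
    n_chunks = ceil(dim_size / chunk_size)
    chunks_per_worker = ceil(n_chunks / n_workers)
    regions = []
    for first_chunk in range(0, n_chunks, chunks_per_worker):
        start = first_chunk * chunk_size
        # The last region may end on a partial chunk at the end of the dimension.
        stop = min((first_chunk + chunks_per_worker) * chunk_size, dim_size)
        regions.append(slice(start, stop))
    return regions


def is_chunk_aligned(region, chunk_size, dim_size):
    """True if the region starts and stops on chunk boundaries (or the array end)."""
    starts_on_boundary = region.start % chunk_size == 0
    stops_on_boundary = region.stop % chunk_size == 0 or region.stop == dim_size
    return starts_on_boundary and stops_on_boundary


regions = aligned_regions(dim_size=100, chunk_size=10, n_workers=3)
```

Each resulting slice could be passed as the `region` for one writer; since boundaries fall on chunk edges, no chunk is shared between two writers.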
@max-sixty max-sixty added the topic-zarr Related to zarr storage library label Nov 15, 2023
@dcherian
Contributor

> Writing the same values to the same chunks concurrently isn't safe at the moment

Zarr seems to have some options here: https://zarr.readthedocs.io/en/stable/api/sync.html
