Should we plan to deprecate File-mode-like semantics for accessing arrays and groups? #2466

Open
d-v-b opened this issue Nov 5, 2024 · 8 comments

@d-v-b
Contributor

d-v-b commented Nov 5, 2024

Maybe I'm an outlier, but my Zarr data access falls into three categories, sorted by frequency:

  • I want to read immutable data that already exists at path foo. Two sub-cases:
    • I don't know what exactly is in foo, so I'm happy getting a handle to an array or group
    • I expect specifically an array (or specifically a group), and getting anything other than what I expect should be an error.
  • I want to create a brand new Zarr array or group at path foo, optionally destroying anything already at foo.
  • I have some data in memory that I need to store in a Zarr array or group at path foo. Two sub-cases:
    • If an array or group compatible with my data does not exist at foo, I want to create it and get a mutable handle to the array / group.
    • If an array or group matching my data does exist at foo, then there's no need to create anything, and I just want a mutable handle to that array or group.
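
To make the three patterns concrete, here is a minimal sketch that uses a plain dict as a stand-in for a store. Every name in it (`read_node`, `create_node`, `open_or_create_node`, the `kind` field) is hypothetical and is not part of the zarr-python API; "compatible with my data" is reduced to a simple check on the node kind.

```python
from typing import Any, Optional

# Toy stand-in for a store: maps a path to a small node description.
# All names below are hypothetical illustrations, not zarr-python functions.
Node = dict[str, Any]
Store = dict[str, Node]


def read_node(store: Store, path: str, expect: Optional[str] = None) -> Node:
    """Pattern 1: read data that already exists, optionally requiring a kind."""
    if path not in store:
        raise FileNotFoundError(path)
    node = store[path]
    if expect is not None and node["kind"] != expect:
        raise TypeError(f"expected {expect} at {path!r}, found {node['kind']}")
    # A real API would return a read-only handle here.
    return node


def create_node(store: Store, path: str, kind: str, *, overwrite: bool = False) -> Node:
    """Pattern 2: create something new, replacing existing data only on request."""
    if path in store and not overwrite:
        raise FileExistsError(path)
    store[path] = {"kind": kind}
    return store[path]


def open_or_create_node(store: Store, path: str, kind: str) -> Node:
    """Pattern 3: reuse a matching node if present, otherwise create it."""
    if path in store:
        return read_node(store, path, expect=kind)
    return create_node(store, path, kind)
```

Each function fails loudly in the case its pattern treats as an error, which is the part a single `mode` string tends to hide.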

First, does anyone have a common Zarr access pattern that isn't one of these three?

Second, does anyone think our current top-level API maps onto the access patterns I listed? I personally do not think it does, and I suspect we can get some easy user experience gains by adding more specific and safe functions to the top-level API. See #2463.

Longer term, I think we should consider dropping File-mode-like semantics for our top-level API. Maybe open(mode=x) makes sense for files, but I don't think that should be the basis for our user-facing API. We should provide a set of functions that actually match the access patterns users have.

@paraseba
Contributor

paraseba commented Nov 5, 2024

I agree with all of this. If anything, I would make the dangerous operations more explicit. Zarr data is usually too large and precious to be destroyed by a single character mistake. For example, I'd be very careful with stuff like

> optionally destroying anything already at foo

For cases like that, I'd prefer an error plus an explicit destroy operation.

@d-v-b
Contributor Author

d-v-b commented Nov 5, 2024

> I agree with all of this. If anything, I would make the dangerous operations more explicit. Zarr data is usually too large and precious to be destroyed by a single character mistake. For example, I'd be very careful with stuff like
>
> > optionally destroying anything already at foo
>
> for cases like that I'd prefer an error plus an explicit destroy operation.

Totally agree. The API I was thinking of was create_x(..., overwrite: bool = False). The default behavior would be to error if it encounters a pre-existing object, and the user must explicitly set overwrite to True to delete the extant stuff.

@paraseba
Contributor

paraseba commented Nov 5, 2024

I'd go as far as removing the "overwrite" argument, throwing an exception, and providing a destroy method. Wanting to overwrite is not a very common use case; the extra security is worth having to do a little bit of extra typing.
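
A minimal sketch of that shape, again with a plain dict standing in for the store. The names `create_group` and `destroy` are hypothetical and only illustrate the idea; they are not existing zarr-python functions.

```python
def create_group(store: dict, path: str) -> dict:
    """Create a group, refusing to touch anything that already exists."""
    if path in store:
        raise FileExistsError(
            f"{path!r} already exists; call destroy() first to replace it"
        )
    store[path] = {"kind": "group"}
    return store[path]


def destroy(store: dict, path: str) -> None:
    """Explicitly delete whatever lives at path (KeyError if nothing is there)."""
    del store[path]
```

Replacing existing data then takes two visible steps, `destroy(store, "foo")` followed by `create_group(store, "foo")`, so it cannot happen through a one-character change to an argument.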

@d-v-b
Contributor Author

d-v-b commented Nov 5, 2024

> Wanting to overwrite is not a very common use case

It is for me! In an iteration phase I often repeatedly create stuff on disk that I'm happy overwriting. Although if safety is the goal, then we could support this with a specific function that overwrites, e.g. create_array_with_force, but probably with a better name.

@paraseba
Contributor

paraseba commented Nov 5, 2024

> we could support this with a specific function that overwrites,

I like that. In general, I think boolean arguments are a bit dangerous, boolean blindness and all that. But being explicit in the function name is a good middle ground.
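
As a rough illustration of the difference, the sketch below makes the flag keyword-only, so a bare positional `True` is rejected, and adds a named wrapper for the destructive case. `create_array_with_force` is the placeholder name from the comment above; the dict-based store and both function bodies are assumptions made for the example.

```python
def create_array(store: dict, path: str, *, overwrite: bool = False) -> dict:
    """Create an array node; the `*` forces `overwrite` to be spelled out."""
    if path in store and not overwrite:
        raise FileExistsError(path)
    store[path] = {"kind": "array"}
    return store[path]


def create_array_with_force(store: dict, path: str) -> dict:
    """Same as create_array, but the destructive intent is in the name."""
    return create_array(store, path, overwrite=True)
```

With this signature, `create_array(store, "foo", True)` raises a `TypeError` instead of silently overwriting, and a reader of `create_array_with_force(store, "foo")` does not need to look up what a bare boolean means.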

@d-v-b
Contributor Author

d-v-b commented Nov 5, 2024

The semantics of our mode keyword argument have been a source of problems recently. I wonder if we could solve that by making a specific function for each mode variant (e.g., read_array replaces mode=r) and then removing mode entirely.
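
Here is a sketch of what one function per mode could look like. It assumes the usual file-mode-like strings (`r`, `r+`, `a`, `w-`, `w`) and uses a plain dict as the store; apart from `read_array`, every function name is a hypothetical placeholder.

```python
from types import MappingProxyType

# One explicit function per mode string. A dict stands in for the store.


def read_array(store, path):
    """mode='r': the array must exist; the handle is read-only."""
    return MappingProxyType(store[path])


def open_array(store, path):
    """mode='r+': the array must exist; the handle is mutable."""
    return store[path]


def open_or_create_array(store, path):
    """mode='a': return the existing array, or create it if absent."""
    return store.setdefault(path, {})


def create_array(store, path):
    """mode='w-': create a new array and fail if one already exists."""
    if path in store:
        raise FileExistsError(path)
    store[path] = {}
    return store[path]


def overwrite_array(store, path):
    """mode='w': create a new array, replacing anything already there."""
    store[path] = {}
    return store[path]


# The old keyword then reduces to a lookup, which could ease a deprecation period.
BY_MODE = {
    "r": read_array,
    "r+": open_array,
    "a": open_or_create_array,
    "w-": create_array,
    "w": overwrite_array,
}
```

A call such as `open(path, mode="r")` would then translate to `BY_MODE["r"](store, path)` while the explicit functions become the documented entry points.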

@shoyer
Contributor

shoyer commented Nov 5, 2024

I'm all for more explicit arguments or functions rather than the mode argument!

> I'd go as far as removing the "overwrite" argument, throwing an exception, and providing a destroy method. Wanting to overwrite is not a very common use case; the extra security is worth having to do a little bit of extra typing.

We use "overwrite" pretty often for cases where a worker might have been pre-empted and left store setup incomplete.
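
One way to read this use case, as a toy simulation only: a setup step is interrupted partway through, and the retry needs to discard the partial result. The `Preempted` exception, the `setup_store` function, and its `fail_after` parameter are all invented for this example.

```python
from typing import Optional


class Preempted(Exception):
    """Simulates a worker being killed partway through store setup."""


def setup_store(
    store: dict,
    path: str,
    n_chunks: int,
    *,
    overwrite: bool = False,
    fail_after: Optional[int] = None,
) -> None:
    """Initialize n_chunks entries at path, optionally discarding a prior attempt."""
    if path in store:
        if not overwrite:
            raise FileExistsError(path)
        del store[path]  # throw away whatever an earlier attempt left behind
    store[path] = {"chunks": []}
    for i in range(n_chunks):
        if fail_after is not None and i == fail_after:
            raise Preempted(f"killed after {i} chunks")
        store[path]["chunks"].append(i)
```

A first call with `fail_after=2` leaves a half-initialized entry behind. Retrying without `overwrite` raises `FileExistsError`, while retrying with `overwrite=True` rebuilds the entry from scratch.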

@d-v-b
Contributor Author

d-v-b commented Nov 5, 2024

I'm implementing some of these ideas in #2463; anyone interested should take a look.


3 participants