Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate open_zarr #7496

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

deprecate open_zarr #7496

wants to merge 5 commits into from

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Jan 31, 2023

This PR deprecates open_zarr in favor of open_dataset(..., engine='zarr').

@github-actions github-actions bot added io topic-backends topic-zarr Related to zarr storage library labels Jan 31, 2023
@jhamman jhamman requested a review from rabernat January 31, 2023 16:47
xarray/backends/zarr.py Outdated Show resolved Hide resolved
@Illviljan
Copy link
Contributor

It would be nice to have a good copy/paste example for open_dataset. For example I think chunks have different defaults compared to open_zarr.

@jhamman
Copy link
Member Author

jhamman commented Jan 31, 2023

@paigem recently added the following to the open_dataset docstring (#7438):

In order to reproduce the default behavior of xr.open_zarr(...) use 
`xr.open_dataset(..., engine='zarr', chunks={})`.

@jhamman jhamman marked this pull request as ready for review January 31, 2023 21:17
xarray/backends/zarr.py Outdated Show resolved Hide resolved
xarray/backends/zarr.py Outdated Show resolved Hide resolved
jhamman and others added 2 commits January 31, 2023 14:00
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@shoyer
Copy link
Member

shoyer commented Feb 1, 2023

I like open_zarr(...) because it's less typing than open_dataset(..., engine='zarr'). The automatic backend detection logic doesn't currently work for Zarr, and in every case it adds overhead, which could be significant in the case of remote storage backends like Zarr.

So personally I would rather go the other direction and add open_netcdf().

The inconsistency in the chunks argument is non-ideal, but that could be handled by a separate deprecation process.

@rabernat
Copy link
Contributor

rabernat commented Feb 1, 2023

It is true that Xarray is now becoming very different from pandas in how it opens data.

@weiji14
Copy link
Contributor

weiji14 commented Feb 19, 2023

The inconsistency in the chunks argument is non-ideal, but that could be handled by a separate deprecation process.

There was some discussion on whether xr.open_dataset should default to chunks=None, or switch to open_zarr's chunks="auto" at #4187 (comment) (the last attempt at deprecating open_zarr 🙂). Just bringing it up for debate.

Also, quite a few people were in favour of keeping open_zarr back in 2020 (see e.g. #4187 (comment)), but maybe times have changed, and a consistent API is more desirable than convenience?

@krokosik
Copy link
Contributor

As a frequent Zarr user I would vote in favor of deprecating open_zarr as it is misleading. It always opens the store as a Dataset, even when it is a DataArray saved using to_zarr (new feature, I know). Looking at the issues, there are inconsistencies between open_zarr and open_dataset, so deprecating seems favorable as it removes the need for extra maintenance to keep both methods behaving in a similar way.

@max-sixty
Copy link
Collaborator

max-sixty commented Sep 21, 2023

Personal view — I find .open_zarr convenient! But I would not fight for it to stay...

It always opens the store as a Dataset, even when it is a DataArray saved using to_zarr

I agree — though we could fix that...

@krokosik
Copy link
Contributor

krokosik commented Sep 21, 2023

Yeah, we could fix it, but my point is that having two for opening zarr files raises more issues and requires extra work, while the only benefit I can see is a bit shorter syntax. Some other issue: #8095.

@max-sixty
Copy link
Collaborator

max-sixty commented Sep 22, 2023

having two for opening zarr files raises more issues and requires extra work, while the only benefit I can see is a bit shorter syntax.

Yes, definitely agree...

Not sure how to turn that into a decision — maybe "soft-deprecate" .open_zarr? i.e. de-emphasize, remove from main docs — but don't issue warnings — and then reassess whether .open_data{set;array} has taken over in the future.

That worked well for .drop ("wait we have a .drop method?", ideally you would ask).

@krokosik
Copy link
Contributor

krokosik commented Sep 22, 2023

I don't have much experience on handling such things, as I'm fairly new to collaborating and maintaining an open source project. Sounds good to me, altough I don't see why we wouldn't want to add the deprecation warning?

@max-sixty
Copy link
Collaborator

although I don't see why we wouldn't want to add the deprecation warning?

The logic is that if we're not confident / don't have consensus on the best final state, we can take a small step without committing to anything, and then make another assessment later with more experience.

By de-emphasizing the function from the docs, we reduce the number of new uses, without causing noisy warnings to show up for folks already it. Then we can assess in the future if we properly depreciate & eventually remove it. Or if the feedback is that the function is useful, we can revert.

Otherwise I worry we're not going to get consensus, and so not going to move forward with making any changes.

(But this is just my idea, I'm one of many, we should do whatever the aggregate view is!)

@krokosik
Copy link
Contributor

I see, thank you for explaining. It does sound better, considering that open_zarr is probably used more than the other methods.

@max-sixty
Copy link
Collaborator

Adding a "needs discussion" label — a few things are meandering because of this — #8095, #2660 — and so picking a route would let us focus our energy on a path forward for these

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate open_zarr in favor of open_dataset(..., engine='zarr')
7 participants