Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Proposed refactor of read API for backends #4477

Merged
merged 36 commits into from
Oct 22, 2020
Merged

WIP: Proposed refactor of read API for backends #4477

merged 36 commits into from
Oct 22, 2020

Conversation

aurghs
Copy link
Collaborator

@aurghs aurghs commented Sep 30, 2020

The first draft of the new backend API:

  • Move decoding inside the bakends.
  • Backends return Dataset with BackendArray.
  • Xarray manages chunking and caching.
  • Some code is duplicated, it will be simplified later.
  • Zarr chunking is still inside the backend for now.

cc @jhamman @shoyer

aurghs and others added 27 commits September 25, 2020 19:07
# Conflicts:
#	xarray/backends/api.py
…ad-refactor

# Conflicts:
#	xarray/backends/apiv2.py
- to be used in apiv2 without instantiate the object
- modify signature
- move default setting inside backends
@alexamici
Copy link
Collaborator

Note that this PR doesn't change xarray behaviour unless the user sets the environment variable XARRAY_BACKEND_API=v2.

To test the new code paths you can use: XARRAY_BACKEND_API=v2 pytest -v xarray/tests/test_backend*py

backend_ds = open_backend_dataset(
filename_or_obj,
**backend_kwargs,
**{k: v for k, v in kwargs.items() if v is not None},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing everything directly as supplied by the user feels a little non-ideal to me here. I wonder if we could group together decoding options into a single argument first? See #4490 for a proposal of what that could look like.

Comment on lines 98 to 102
if kwargs:
warnings.warn(
"The following keywords are not supported by the engine "
"and will be ignored: %r" % kwargs
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issuing warnings for unrecognized keyword arguments seems dangerously permissive. Could we raise errors here?

(This might be easier once we switch to putting all the CF decoding conventions into a single argument, which could then be extended without breaking every backend.)


ds = Dataset(vars, attrs=attrs)
ds = ds.set_coords(coord_names.intersection(vars))
ds._file_obj = file_obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to make this a public API. I guess users still won't want to see this in most cases, so perhaps just keep this in mind for something to document in the new "how to write a backend" documentation?

Comment on lines 365 to 366
@classmethod
def get_chunk(cls, name, var, chunks):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classmethod is fine, though I guess staticmethod would also work.

Comment on lines 26 to 27
backend_kwargs,
**kwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these be grouped into a single argument, something like extra_tokens?

@shoyer
Copy link
Member

shoyer commented Oct 6, 2020

I really like the direction this is going! See #4490 for a related design proposal, that could build on top of this (and/or potentially make this easier).

Comment on lines 98 to 102
if kwargs:
raise ValueError(
"The following keywords are not supported by the engine "
"and will be ignored: %r" % kwargs
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler option is to just omit these lines and **kwargs entirely. Python will supply an error message like TypeError: open_backend_dataset_cfgrib() got an unexpected keyword argument 'something_unexpected'

Copy link
Collaborator Author

@aurghs aurghs Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right! It doesn't make sense to keep these lines!
The warning was needed in the first version of the code because I didn't filter the input keys. But the idea was to remove it later.

Comment on lines +81 to +82
backend_kwargs=None,
**kwargs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that isn't quite clear to me: why have both **kwargs and backend_kwargs, when they are merged together?

Is the intention to only keep one of them in the long term?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intention is to keep only one in long term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which one? :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep kwargs, but reading #4490 I suspect that you prefer to keep backend_kwargs.
If we keep backend_kwargs, maybe it is more clear for the user the separation between xarray inputs and backend inputs. I also like the way you suggest to use deprecate_kwargs.
The main pro for keeping kwargs is that the user can save 21 chars - backend_kwargs=dict() (and that is important!).
I also prefer flat structures, when possible. But that's a matter of taste.

@shoyer shoyer marked this pull request as ready for review October 22, 2020 15:06
@shoyer shoyer merged commit cc271e6 into pydata:master Oct 22, 2020
@shoyer
Copy link
Member

shoyer commented Oct 22, 2020

Merging per offline discussion. This is all hidden behind an environment variable, so this will allow @aurghs and @alexamici to continue working on these new features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants