Update signature open_dataset for API v2 #4547

aurghs · 2020-10-28T08:35:54Z

Proposal for the new API of open_dataset(). It is implemented in apiv2.py and it doesn't modify the current behavior of api.open_dataset().
It is something in between the first and second alternative suggested at #4490 (comment), see the related quoted text:

Describe alternatives you've considered

For the overall approach:

We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.

We could keep the current design only for the user facing open_dataset() interface, and then internally convert into the DecodingOptions() struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?

Instead of a class for the decoders, I have added a function: resolve_decoders_kwargs.
resolve_decoders_kwargs performs two tasks:

- If decode_cf is False, it sets all the decoders supported by the backend to False (using inspect).
- It filters out the decoder keywords that are None.

So xarray manages the decode_cf keyword and passes only the non-default decoders on to the backend. If the user sets a decoder that the backend does not support to a non-None value, the backend will raise an error.
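The two tasks described above could be sketched roughly as follows. This is a minimal illustration, not the code in apiv2.py: `resolve_decoders_kwargs_sketch` and `open_backend_dataset_demo` are hypothetical names, and passing the backend function directly (instead of looking it up by engine name) is an assumption made to keep the example self-contained.

```python
import inspect


def open_backend_dataset_demo(
    filename_or_obj, *, mask_and_scale=True, decode_times=True, drop_variables=None
):
    """Hypothetical backend entry point; its explicit keywords are the decoders it supports."""
    return {"mask_and_scale": mask_and_scale, "decode_times": decode_times}


def resolve_decoders_kwargs_sketch(decode_cf, open_backend_dataset, **decoders):
    """Sketch of the two tasks: apply decode_cf=False, then drop None decoders."""
    supported = inspect.signature(open_backend_dataset).parameters
    if decode_cf is False:
        # Task 1: switch off every decoder that the backend declares in its signature.
        for name in decoders:
            if name in supported:
                decoders[name] = False
    # Task 2: keep only the decoders that were given a non-None value.
    return {name: value for name, value in decoders.items() if value is not None}


# With decode_cf=False the supported decoders are forced to False; use_cftime is
# not in the demo backend's signature and stays None, so it is filtered out.
decoders = resolve_decoders_kwargs_sketch(
    False, open_backend_dataset_demo, mask_and_scale=None, decode_times=None, use_cftime=None
)
```

Because only non-None decoders are forwarded, a backend that does not declare a given decoder only sees it (and fails) when the user explicitly asked for it.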

With this implementation, drop_variables must always be supported by the backend, but I think all the backends could implement it easily. I wouldn't group it with the decoders: to me, it seems more a filter than a decoder.

The behavior of decode_cf is unchanged.

PRO:

the user doesn't need to import and instantiate a class.
users get the argument completion on open_dataset.
the backend defines the accepted decoders directly in its open_backend_dataset_${engine} API.
xarray manages decode_cf, not the backends.

Missing points:

decode_cf should be renamed decode. The behavior of decode should probably be modified as well, for two reasons:
- currently, if decode_cf is False, it sets the decoders to False, but there is no check on the other values. The accepted values should be: None (it keeps the decoders' default values), True (it sets all the decoders to True), False (it sets all the decoders to False).
- currently we can set both a decoder and decode_cf without any warning.
Deprecate backend_kwargs (or kwargs).
Separate mask_and_scale?

I think that we need a different PR for the three of them.
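The three accepted values proposed above for decode could be checked along these lines. This is only a sketch of the proposal, not something implemented in this PR: `resolve_decode_flag` is a hypothetical helper, and how a conflict between decode and an explicitly set decoder should be handled is deliberately left out, since that point is still open.

```python
def resolve_decode_flag(decode, supported_decoders):
    """Return the decoder overrides implied by the proposed tri-state decode flag."""
    if decode is None:
        # None: leave every decoder at the backend's default value.
        return {}
    if decode is True or decode is False:
        # True / False: set all the supported decoders to the same value.
        return {name: decode for name in supported_decoders}
    # Anything else is rejected instead of being silently ignored.
    raise ValueError(f"decode must be None, True or False, got {decode!r}")


overrides = resolve_decode_flag(False, ["mask_and_scale", "decode_times"])
```

The identity checks (`is True`, `is False`) are used so that values such as 1 or 0 are rejected rather than treated as booleans.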

related to Group together decoding options into a single argument #4490
Passes isort . && black . && mypy . && flake8

shoyer · 2020-10-29T15:20:17Z

xarray/backends/apiv2.py

+    signature = inspect.signature(ENGINES[engine]).parameters
+    if decode_cf is False:
+        for d in decoders:
+            if d in signature and d != "use_cftime":


Do we need this special case d != "use_cftime"? Does it break any tests if we simply remove it?

(My guess is that the existing code may not bother to set use_cftime = False, but only because the value of use_cftime is ignored if decode_times = False.)

I have forgotten to check it. You are right, we can remove it.

shoyer · 2020-11-05T08:29:17Z

xarray/backends/apiv2.py

+    decoders = resolve_decoders_kwargs(
+        decode_cf,
+        engine=engine,
+        mask_and_scale=mask_and_scale,
+        decode_times=decode_times,
+        decode_timedelta=decode_timedelta,
+        concat_characters=concat_characters,
+        use_cftime=use_cftime,
+        decode_coords=decode_coords,
+    )
+
    backend_kwargs = backend_kwargs.copy()
    overwrite_encoded_chunks = backend_kwargs.pop("overwrite_encoded_chunks", None)

-    open_backend_dataset = _get_backend_cls(engine, engines=ENGINES)
+    open_backend_dataset = _get_backend_cls(engine, engines=plugins.ENGINES)[
+        "open_dataset"
+    ]
    backend_ds = open_backend_dataset(
        filename_or_obj,
+        drop_variables=drop_variables,
+        **decoders,
        **backend_kwargs,
        **{k: v for k, v in kwargs.items() if v is not None},
    )


For completeness, let me mention an alternative design that would not need inspect.signature. Instead of using a single function open_backend_dataset, we could use a "Loader" class with two separate methods, one for decoding a dataset and another for returning a raw dataset, e.g.,

backend_cls = _get_backend_cls(engine)
loader = backend_loader(filename_or_obj, **backend_kwargs)
if decode:
    ds = loader.load_decoded(**decoders)
else:
    ds = loader.load_raw()

Is this better than the approach in this PR (using inspect), or the current approach on master of calling the single open_backend_dataset() function with decode_cf=False? Honestly, I'm not sure. It is the most explicit option, but also probably a little more annoying to write.
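To make the "Loader" alternative more concrete, here is a toy sketch of what a backend might look like under that design. Everything in it is hypothetical: `DemoLoader`, `open_with_loader`, the `scale_factor` option and the dict-of-lists "file" are stand-ins chosen only so the example runs on its own, not proposed xarray names.

```python
class DemoLoader:
    """Hypothetical loader; the 'file' is a dict mapping variable names to raw values."""

    def __init__(self, filename_or_obj, scale_factor=1.0):
        self.raw = filename_or_obj
        self.scale_factor = scale_factor

    def load_raw(self):
        # No decoding at all: return the stored values unchanged.
        return dict(self.raw)

    def load_decoded(self, mask_and_scale=True):
        # The decoders this backend supports are the keywords of this method.
        if not mask_and_scale:
            return self.load_raw()
        return {
            name: [value * self.scale_factor for value in values]
            for name, values in self.raw.items()
        }


def open_with_loader(filename_or_obj, decode=True, backend_kwargs=None, **decoders):
    """Dispatch to one of the two loader methods instead of inspecting a signature."""
    loader = DemoLoader(filename_or_obj, **(backend_kwargs or {}))
    if decode:
        return loader.load_decoded(**decoders)
    return loader.load_raw()


raw = open_with_loader({"t": [1, 2]}, decode=False)
decoded = open_with_loader({"t": [1, 2]}, backend_kwargs={"scale_factor": 0.5})
```

In this sketch an unsupported decoder such as use_cftime=True fails with Python's generic "unexpected keyword argument" TypeError raised by load_decoded, which is less specific than a dedicated error message.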

It seems to me that in this way we are going to complicate the interface without a real advantage. But I'm not sure about it. @alexamici, @jhamman what do you think about it?

@shoyer the main reason we proposed using inspect is to raise an appropriate error message when a backend doesn't support a specific decoding option.

Your proposal simplifies the mangling of the decode options, but I'd still use inspect in the same way as we do now before calling load_decoded.

Personally I'd favour keeping the current implementation.

shoyer · 2020-11-05T08:39:35Z

xarray/backends/plugins.py

+                raise TypeError(
+                    f'All the parameters in {engine["open_dataset"]!r} signature should be explicit. '
+                    "*args and **kwargs is not supported"
+                )


This looks like it correctly implements my suggestion from last week 👍 .

I don't know how annoying backend authors would find this restriction to be.

With the current implementation a backend developer can add the "signature" key explicitly to the engine dictionary to skip running the inspection.

This also allows providing "open_dataset" as a C function.

I like the current implementation.

shoyer · 2020-11-05T16:32:46Z

As discussed, let's stick with the current PR

@aurghs @alexamici you should have "merge" permissions now!

alexamici · 2020-11-06T14:41:32Z

I fixed the type hints, and the two test failures are unrelated to the changes.

I'm going to test my shiny new merge rights :)

keewis · 2020-11-06T14:55:27Z

xarray/backends/plugins.py

+    if "signature" not in engine:
+        parameters = inspect.signature(engine["open_dataset"]).parameters
+        for name, param in parameters.items():
+            if param.kind in (
+                inspect.Parameter.VAR_KEYWORD,
+                inspect.Parameter.VAR_POSITIONAL,
+            ):
+                raise TypeError(
+                    f'All the parameters in {engine["open_dataset"]!r} signature should be explicit. '
+                    "*args and **kwargs is not supported"
+                )


that's a lot of indentation! I think it might be a bit easier to read if we get rid of one level of indentation using

if "signature" in engine:
    continue

nothing urgent, though.
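For illustration, the check with the suggested early `continue` might look like the sketch below. The loop over an engines dictionary, the function name `detect_parameters`, and storing the parameter names as a set under the "signature" key are assumptions made for this example; only the inner check and the error message are taken from the diff above.

```python
import inspect


def detect_parameters(engines):
    """Fill in the "signature" key for every engine that does not provide one."""
    for engine in engines.values():
        if "signature" in engine:
            # An explicit signature skips the inspection entirely.
            continue
        parameters = inspect.signature(engine["open_dataset"]).parameters
        for param in parameters.values():
            if param.kind in (
                inspect.Parameter.VAR_KEYWORD,
                inspect.Parameter.VAR_POSITIONAL,
            ):
                raise TypeError(
                    f'All the parameters in {engine["open_dataset"]!r} signature should be explicit. '
                    "*args and **kwargs is not supported"
                )
        # Assumption: the detected parameter names are stored for later lookups.
        engine["signature"] = set(parameters)
    return engines


def explicit_backend(filename_or_obj, *, decode_times=None):
    """Hypothetical backend with a fully explicit signature."""


def variadic_backend(filename_or_obj, **kwargs):
    """Hypothetical backend that is only accepted with an explicit "signature" key."""


engines = detect_parameters(
    {
        "explicit": {"open_dataset": explicit_backend},
        "declared": {"open_dataset": variadic_backend, "signature": {"filename_or_obj"}},
    }
)
```

The second engine shows the escape hatch mentioned earlier in the review: a backend that declares its "signature" explicitly is never inspected, so the restriction on *args and **kwargs does not apply to it.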

You are completely right!

There are some style fixes that I really need to make. I will open a new pull request for them.

aurghs and others added 30 commits September 25, 2020 19:07

add in api.open_dataset dispatching to stub apiv2

f961606

remove in apiv2 check for input AbstractDataStore

fb166fa

bugfix typo

0221eec

add kwarg engines in _get_backend_cls needed by apiv2

36a02c7

add alpha support for h5netcdf

cfb8cb8

style: clean not used code, modify some variable/function name

4256bc8

Add ENGINES entry for cfgrib.

1bc7391

Define function open_backend_dataset_cfgrib() to be used in apiv2.py.

748fe5a

Add necessary imports for this function.

Apply black to check formatting.

fb368fe

Apply black to check formatting.

80e111c

add dummy zarr apiv2 backend

e15ca6b

Merge branch 'master' into backend-read-refactor

025cc87

# Conflicts: # xarray/backends/api.py

align apiv2.open_dataset to api.open_dataset

4b19399

remove unused extra_coords in open_backend_dataset_*

572595f

Merge remote-tracking branch 'origin/cfgrib_refactor' into backend-re…

d6e632e

…ad-refactor # Conflicts: # xarray/backends/apiv2.py

remove extra_coords in open_backend_dataset_cfgrib

74aba14

transform zarr maybe_chunk and get_chunks in classmethod

d6280ec

- to be used in apiv2 without instantiate the object

make alpha zarr apiv2 working

c0e0f34

refactor apiv2.open_dataset:

6431101

- modify signature - move default setting inside backends

move dataset_from_backend_dataset out of apiv2.open_dataset

50d1ebe

remove blank lines

383d323

remove blank lines

457a09c

style

2803fe3

Re-write error messages

08db0bd

Fix code style

1f11845

Fix code style

93303b1

remove unused import

bc2fe00

replace warning with ValueError for not supported kwargs in backends

d694146

change zarr.ZarStore.get_chunks into a static method

56f4d3f

group backend_kwargs and kwargs in extra_tokes argument in apiv…

df23b18

…2.dataset_from_backend_dataset`

aurghs and others added 13 commits October 21, 2020 09:21

reverse changes in chunks management

c9088d3

move check on decoders from backends to open_dataset (apiv2)

fe8099c

update documentation

fed8b3e

Change signature of open_dataset function in apiv2 to include explici…

6fec3ea

…t decodings.

Set an alias for chunks='auto'.

231895e

Allign empty rows with previous version.

b88b567

reverse changes in chunks management

be51bc7

move check on decoders from backends to open_dataset (apiv2)

5aa533d

update documentation

7e75f1c

Merge branch 'change-signature-open_dataset' of github.com:bopen/xarr…

2047d46

…ay into change-signature-open_dataset # Conflicts: # xarray/backends/apiv2.py

change defaut value for decode_cf in open_dataset. The function bahav…

3057abb

…iour is unchanged.

Review docstring of open_dataset function.

842fc29

bugfix typo

ff1181c

aurghs mentioned this pull request Oct 29, 2020

Group together decoding options into a single argument #4490

Open

aurghs added 3 commits November 2, 2020 09:56

- add check on backends signatures

bdcf0fe

- add plugins.py cotaining backneds info

- black isort

61be8a8

- add type declaration in plugins.py

c0b290a

shoyer reviewed Nov 5, 2020

shoyer approved these changes Nov 5, 2020

alexamici added 3 commits November 6, 2020 13:40

Fix the type hint for ENGINES

c217031

Drop special case and simplify resolve_decoders_kwargs

8530ff0

isort

73328ac

alexamici changed the title ~~Update signature open dataset~~ Update signature open_dataset for API v2 Nov 6, 2020

alexamici merged commit ba989f6 into pydata:master Nov 6, 2020

keewis reviewed Nov 6, 2020

alexamici added grant-czi topic-backends labels Dec 10, 2020

aurghs deleted the change-signature-open_dataset branch February 11, 2021 01:50

aurghs commented Oct 28, 2020

shoyer Oct 29, 2020

aurghs Nov 5, 2020

shoyer Nov 5, 2020

aurghs Nov 5, 2020

alexamici Nov 5, 2020

shoyer Nov 5, 2020

alexamici Nov 5, 2020 (edited)

shoyer commented Nov 5, 2020

alexamici commented Nov 6, 2020

keewis Nov 6, 2020

aurghs Nov 9, 2020
