Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new backend api documentation #4810

Merged
merged 58 commits into from
Mar 8, 2021
Merged

add new backend api documentation #4810

merged 58 commits into from
Mar 8, 2021

Conversation

aurghs
Copy link
Collaborator

@aurghs aurghs commented Jan 14, 2021

doc/internals.rst Outdated Show resolved Hide resolved
@jhamman jhamman changed the title WIP: add e new backend documentation WIP: add new backend api documentation Jan 28, 2021
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aurghs

I have mostly minor suggestions.

One high-level comment is that it would be nice to have an example implementation with pseudo-code. That would make it easier to understand.

class YourBackendArray(BackendEntrypoint):
    def open_dataset(...):
        ds = ...
 	    # example built-in decoder use
        return ds

    open_dataset_parameters = ...

    def guess_can_open(filename_obj):
        # do checks
        return True


- Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint`
- Implement the method ``open_dataset`` that returns an instance of :py:class:`~xarray.Dataset`
- Declare such a class as an external plugin in your setup.py.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to the entrypoints documentation here.

Suggested change
- Declare such a class as an external plugin in your setup.py.
- Declare such a class as an external plugin in your ``setup.py``.

EDIT: I see it's linked below. Won't hurt to add a link here also.

doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that an example would go a long way. It could also be a good idea to point to the simplest of the implemented backends.

I tried to look at some of the backends that are now available, e.g. pydap and one thing that's missing from the description here is the DataStore.

Not for here but I did also not find much on what a LazilyOuterIndexedArray is etc...

doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the need for examples, but overall I think this is pretty well structured. In case it is of use, I find this pretty helpful: https://documentation.divio.com/how-to-guides/

I think it would be useful to have a comparison between the old and new API, but I'm not sure if the old API ever was part of the public API (if not this is not necessary).

BackendArray, CachingFileManager, LazilyOuterIndexedArray, and LazilyVectorizedIndexedArray are not yet public API (I think?), so if we want to change that I think we should make sure their purpose is properly documented (it looks like that's what you were going to do, though).

This makes internals a fairly long document, so we should think about splitting this into different pages (not in this PR). I think @andersy005 started something like that in #4835.

to integrate any code in Xarray; all you need to do is approaching the
following steps:

- Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint`
Copy link
Collaborator

@keewis keewis Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Create a class that inherits from Xarray py:class:`~xarray.backend.commonBackendEntrypoint`
- Create a class that inherits from :py:class:`~xarray.backends.common.BackendEntrypoint`

also, I think I read somewhere that the official spelling of the package is xarray (I can't remember where, though). If that is still valid, I think we should make sure the documentation follows that.

doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Show resolved Hide resolved
aurghs and others added 14 commits February 4, 2021 04:27
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: keewis <keewis@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: keewis <keewis@users.noreply.github.com>
@jhamman
Copy link
Member

jhamman commented Mar 3, 2021

@jp-dark - Given your recent experience developing a backend using this new API, it would be great if you could provide some comments on this PR.

@kmuehlbauer
Copy link
Contributor

@jhamman As I'm currently implementing xarray backends for reading weather radar data, I've already closely followed the adventures of @aurghs and @alexamici.

To the current state of the backend documentation I can say that it is very well written and understandable (also for non native speakers) and I could easily follow the train of thought.

From my perspective: I would not try to show extensive examples in the beginning, but would extend/rearrange documentation after first users issued problems/asked questions. So that only things get special documentation which are not immediately clear from users perspective.

This is a really remarkable extension which will boost usage of xarray. Thanks!

If you support more complex indexing as `explicit indexing` or
`numpy indexing`, you can have a look to the implemetation of Zarr backend and Scipy backend,
currently available in :py:mod:`~xarray.backends` module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to add a discussion here on how a raw key is transformed to an ExplicitIndexer when querying an xarray DataArray with a third-party backend. This would be especially useful for picking good test cases for querying DataArrays and debugging problems in edge-cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to add something about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at what to add here. I started writing some more details, but I think It would become a little bit too long.
Today we should merge. So I suggest to merge it as it is, and I'll add later a dedicated section.


- Implement a function that returns an instance :py:class:``Dataset``

- Create a `BackendEntrypoint`` instance with your function as input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Create a `BackendEntrypoint`` instance with your function as input.
- Create a ``BackendEntrypoint`` instance with your function as input.


**Output**

```BackendEntrypoint`.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```BackendEntrypoint`.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset`
``BackendEntrypoint.open_dataset`` output shall be an instance of Xarray :py:class:`Dataset`

While ``open_dataset`` is mandatory, ``open_dataset_parameters`` and ``guess_can_open`` are optional.


BackendEntrypoint.open_dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider documenting the interface as a class? That might make it easier to document (e.g., in the form of docstrings).

doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
aurghs and others added 4 commits March 3, 2021 21:56
Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
doc/internals.rst Show resolved Hide resolved
doc/internals.rst Outdated Show resolved Hide resolved
aurghs and others added 3 commits March 5, 2021 22:23
Co-authored-by: Julia Dark <github@email.jpdark.com>
Co-authored-by: Julia Dark <github@email.jpdark.com>
@alexamici alexamici merged commit d2582c2 into pydata:master Mar 8, 2021
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 18, 2021
…indow

* upstream/master:
  add polyval to polyfit see also (pydata#5020)
  mention map_blocks in the docstring of apply_ufunc (pydata#5011)
  Switch backend API to v2 (pydata#4989)
  WIP: add new backend api documentation (pydata#4810)
  pin netCDF4=1.5.3 in min-all-deps (pydata#4982)
  fix matplotlib errors for single level discrete colormaps (pydata#4256)
  Adapt exception handling in CFTimeIndex.__sub__ and __rsub__ (pydata#5006)
  Update options.py (pydata#5000)
  Adjust tests to use updated pandas syntax for offsets (pydata#4537)
  add a combine_attrs parameter to Dataset.merge (pydata#4895)
  Support for dask.graph_manipulation (pydata#4965)
  raise on passing axis to Dataset.reduce methods (pydata#4940)
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 23, 2021
…-tasks

* upstream/master:
  Fix regression in decoding large standard calendar times (pydata#5050)
  Fix sticky sidebar responsiveness on small screens (pydata#5039)
  Flexible indexes refactoring notes (pydata#4979)
  add a install xarray step to the upstream-dev CI (pydata#5044)
  Adds Dataset.query() method, analogous to pandas DataFrame.query() (pydata#4984)
  run tests on python 3.9 (pydata#5040)
  Add date attribute to datetime accessor (pydata#4994)
  📚 New theme & rearrangement of the docs (pydata#4835)
  upgrade ci-trigger to the most recent version (pydata#5037)
  GH5005 fix documentation on open_rasterio (pydata#5021)
  GHA for automatically canceling previous CI runs (pydata#5025)
  Implement GroupBy.__getitem__ (pydata#3691)
  conventions: decode unsigned integers to signed if _Unsigned=false (pydata#4966)
  Added support for numpy.bool_ (pydata#4986)
  Add additional str accessor methods for DataArray (pydata#4622)
  add polyval to polyfit see also (pydata#5020)
  mention map_blocks in the docstring of apply_ufunc (pydata#5011)
  Switch backend API to v2 (pydata#4989)
  WIP: add new backend api documentation (pydata#4810)
  pin netCDF4=1.5.3 in min-all-deps (pydata#4982)
@aurghs aurghs changed the title WIP: add new backend api documentation add new backend api documentation Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants