Defining Pairwise Interactions #3

Open
jthielen opened this issue Sep 24, 2021 · 4 comments

@jthielen

jthielen commented Sep 24, 2021

As summarized in #1, the interactions between duck array libraries cannot be sufficiently described by a (linked) list of priorities (as can arise from __array_priority__), but are instead best described as a directed graph. For dispatch between types to work out consistently and unambiguously, this graph needs to be acyclic, which in turn requires agreement/coordination between duck array libraries.
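To make the acyclicity requirement concrete, here is a minimal sketch of how such a graph could be checked. The edges below are illustrative assumptions (type names as plain strings), not an agreed-upon hierarchy, and `is_acyclic` is a hypothetical helper built on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical "can wrap" relation: each key may wrap the types in its value set.
wraps = {
    "xarray.DataArray": {"pint.Quantity", "dask.array.Array", "numpy.ndarray"},
    "pint.Quantity": {"dask.array.Array", "numpy.ndarray"},
    "dask.array.Array": {"numpy.ndarray"},
    "numpy.ndarray": set(),
}


def is_acyclic(graph):
    """Return True if the directed graph contains no cycle."""
    try:
        # Consuming static_order() raises CycleError when no ordering exists.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


# If Dask also claimed to wrap Pint, the two libraries would disagree
# about who defers to whom, and the graph would contain a cycle.
conflicting = {**wraps, "dask.array.Array": {"numpy.ndarray", "pint.Quantity"}}
```

With these definitions, `is_acyclic(wraps)` holds while `is_acyclic(conflicting)` does not, which is the kind of disagreement that coordination between libraries is meant to rule out.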

See also: dask/dask#6635

Current State

Presently, this coordination has been informally done through independent/ad-hoc implementations in each duck array library. Two main approaches have arisen:

  • Have an "allow list" of types that this array type can handle/wrap, and defer to any other (e.g., Dask)
  • Have a "deny list" of types to which this array type defers, and assume any other "sufficiently array-like" type can be handled/wrapped (e.g., Pint, but also xarray if the "deny list" is effectively empty)
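The two approaches can be sketched with plain Python binary operators. All three classes below are hypothetical stand-ins rather than the actual Dask, Pint, or xarray implementations; they only illustrate how returning `NotImplemented` (deferring) differs from wrapping whatever is passed in.

```python
class AllowListArray:
    """Handles only types it explicitly knows; defers to everything else."""

    _handled = (int, float)

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        if isinstance(other, AllowListArray):
            return AllowListArray(self.data + other.data)
        if isinstance(other, self._handled):
            return AllowListArray(self.data + other)
        return NotImplemented  # unknown type: let the other operand decide

    __radd__ = __add__


class DenyListArray:
    """Defers only to listed types; tries to wrap anything else."""

    _defer_to = ()  # an effectively empty deny list

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        if isinstance(other, self._defer_to):
            return NotImplemented
        if isinstance(other, DenyListArray):
            other = other.data
        return DenyListArray(self.data + other)

    __radd__ = __add__


class Newcomer:
    """A second wrap-everything type that neither class above knows about."""

    def __init__(self, data):
        self.data = data

    def __add__(self, other):
        if isinstance(other, Newcomer):
            other = other.data
        return Newcomer(self.data + other)

    __radd__ = __add__


a = DenyListArray(1) + Newcomer(2)   # DenyListArray wrapping a Newcomer
b = Newcomer(2) + DenyListArray(1)   # Newcomer wrapping a DenyListArray
c = AllowListArray(1) + Newcomer(2)  # AllowListArray defers to Newcomer
d = Newcomer(2) + AllowListArray(1)  # same nesting as c
```

In this sketch, two wrap-everything types produce a nesting that depends on operand order (`a` versus `b`), while the allow-list type ends up wrapped the same way in both orders (`c` and `d`).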

For a limited set of commonly-used array types in the pydata stack, this has often worked out in practice so far. However, as the number of duck array libraries increases, maintaining agreement between libraries through the existing independent approaches becomes difficult.

As an example of what this type casting hierarchy looks like in practice, Pint has summarized the consensus DAG between several common array types (as of 2020) as follows:

Furthermore, these interactions often play out implicitly via protocols like __array_ufunc__ and __array_function__. In contrast, an explicit strategy like NEP 37 may be a preferable way to define these pairwise interactions.

Specific Goals

  • The directed graph of array type interactions is agreed upon across the community and remains acyclic
  • Introduction of new types to the accepted DAG is easy and can be done safely without introducing cycles.
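One way the second goal could be approached is to reject any new edge that would close a cycle. The sketch below is an assumption about how that might look; `reachable` and `add_edge` are hypothetical helpers, and type names are plain strings.

```python
def reachable(graph, start, goal):
    """Return True if `goal` can be reached from `start` along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def add_edge(graph, upper, lower):
    """Return a copy of `graph` with the edge upper -> lower added.

    Raises ValueError if the new edge would introduce a cycle, which is the
    case exactly when `upper` is already reachable from `lower`.
    """
    if reachable(graph, lower, upper):
        raise ValueError(f"{upper} -> {lower} would introduce a cycle")
    new_graph = {node: set(edges) for node, edges in graph.items()}
    new_graph.setdefault(upper, set()).add(lower)
    new_graph.setdefault(lower, set())
    return new_graph


graph = {"pint": {"dask"}, "dask": {"numpy"}, "numpy": set()}
graph_with_xarray = add_edge(graph, "xarray", "pint")
```

Returning a copy keeps the accepted graph untouched when a proposed edge is rejected, so a failed insertion cannot leave it in a half-updated state.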

Key Points Raised at Coordination Meeting

  1. While an xarray -> pint -> dask -> others casting order for the "top" of the type DAG has been used in practice to this point, several issues/PRs left a full agreement on this order unresolved. There was consensus that this ordering can be formalized.
    • A noted clarification is that xarray is not the only top of the full DAG: this consensus does not prohibit another type, one that xarray cannot wrap, from wrapping or otherwise handling types lower in the DAG.
  2. Duck arrays shouldn't be expected to define all interactions with everything; instead, they should only define operations on similar types and raise otherwise.
    • This favors the "allow list" over the "deny list" approach.
  3. A new library defining (or at least verifying/providing utilities for) a shared type resolution DAG among participating duck array libraries has been suggested and received general support. However, several implementation decisions need to be discussed.
    • Particularly, where should these definitions of interactions lie?
      • Interpreted from NEP 37 __array_module__?
      • A new slot for "handled types"?
      • Some kind of registry in this new library?
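As a rough illustration of the "handled types" slot option, the sketch below assumes a hypothetical `__handled_types__` class attribute. No such attribute exists in NumPy or any of the libraries mentioned; the classes and helpers are placeholders for discussion only.

```python
class Low:
    __handled_types__ = ()  # hypothetical slot: handles nothing else


class Mid:
    __handled_types__ = (Low,)


class Top:
    __handled_types__ = (Mid, Low)


def build_graph(types):
    """Map each participating type to the set of types it declares it can handle."""
    return {t: set(getattr(t, "__handled_types__", ())) for t in types}


def undeclared(graph):
    """Return types that are listed as handled but never registered themselves."""
    handled = {h for targets in graph.values() for h in targets}
    return handled - set(graph)


graph = build_graph([Top, Mid, Low])
```

A shared library could collect such declarations from participating types and then verify the resulting graph, whereas a central registry would keep the same information in one place instead of on each class.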

Suggested Paths Forward

Duck Array DAG Library

Discussion (in this issue hopefully) on working out the details of a shared type resolution DAG library is needed! To get the conversation started, here are the points I'm aware of that need resolution:

  • Are enough of the key duck array libraries (e.g., xarray, pint, Dask, sparse, CuPy) willing to participate in and use this shared library to make the effort worthwhile?
  • pydata is the most likely home for this library, but what to name it and who should lead its maintenance?
  • (Mentioned above) Where should the definitions of pairwise interactions lie?
    • Interpreted from NEP 37 __array_module__?
    • A new slot for "handled types"?
    • Some kind of registry in this new library?
  • What role should this library have?
    • Optional checking/verification that the DAG works out
    • Enforcement of acyclicity of the directed graph of interactions (which is basically the previous option but with utils to raise errors where relevant)
    • Provide full utilities that participating libraries can (or must?) use in their implementations of wrapping/binop/__array_ufunc__/__array_function__/array function modules
    • Something else?
  • How to consistently handle otherwise-unknown array types and be welcoming to any new array-like libraries that try to enter the ecosystem?

Once these are resolved, then more detailed discussions (such as API creation) can presumably take place on this new library's repo.

Changes to Participating Libraries

Libraries currently using a "deny list"/"accept all" approach (namely, xarray and pint) may need to change to an "allow list" approach to meet the community consensus, which brings with it backwards compatibility concerns. However, it makes the most sense (to me at least) to make any such changes only at the point when the aforementioned DAG library is also adopted, and until then to at most issue warnings for types that are unknown but still handled.

@SimonHeybrock

Could you provide a couple of concrete examples where special knowledge/handling of wrapped duck arrays is required?

@jthielen

> Could you provide a couple of concrete examples where special knowledge/handling of wrapped duck arrays is required?

I'm not sure I understand your question in relation to this issue on formalizing agreed-upon interaction priorities. What do you mean by "special," and by whom is this knowledge/handling required (array wrapping libraries, array utilizing libraries, library users, etc.)?

@SimonHeybrock

> > Could you provide a couple of concrete examples where special knowledge/handling of wrapped duck arrays is required?
>
> I'm not sure I understand your question in relation to this issue on formalizing agreed-upon interaction priorities. What do you mean by "special," and by who is this knowledge/handling required (array wrapping libraries, array utilizing libraries, library users, etc.)?

I think what I am asking is: In many simple cases, having an array implementation implement, e.g., __array_ufunc__ and delegate to lower-level libraries (e.g., Pint delegating array ops on its magnitude to NumPy, or whichever array library is wrapped) is sufficient. Do we have an understanding or rough list of cases where it is not, i.e., where do we need more than delegating "down the stack"?

@jthielen

jthielen commented Sep 22, 2022

> I think what I am asking is: In many simple cases, having an array implementation implement, e.g., __array_ufunc__ and delegate to lower-level libraries (e.g., Pint delegating array ops on its magnitude to NumPy, or whichever array library is wrapped) is sufficient. Do we have an understanding or rough list of cases where it is not, i.e., where do we need more than delegating "down the stack"?

Ah, I think then there is a misunderstanding here. This issue is indeed about such simple cases of implementing and delegating to lower-level libraries! The problem is resolving what "lower-level" means in a way that is unambiguous (the directed graph must be acyclic) and fully generalized (any wrapping array library can be inserted into the graph; we don't want it to be just xarray, dask, and pint). I'm not aware of any use cases where this kind of simple pairwise delegation, through the standard set of __array_ufunc__, __array_function__, and binary operations, is insufficient.
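For illustration, resolving what "lower-level" means for a given operation could look like the sketch below, assuming an agreed-upon acyclic graph. The graph contents, the string type names, and the `descendants` and `resolve` helpers are all hypothetical.

```python
def descendants(graph, node):
    """Return every type reachable from `node`, i.e. everything below it."""
    seen, stack = set(), list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


def resolve(graph, operand_types):
    """Return the operand type that sits above all others, or None if ambiguous."""
    for candidate in operand_types:
        below = descendants(graph, candidate)
        if all(t == candidate or t in below for t in operand_types):
            return candidate
    return None


# Hypothetical agreed-upon graph; "sparse" is deliberately left unconnected.
graph = {
    "xarray": {"pint"},
    "pint": {"dask"},
    "dask": {"numpy"},
    "numpy": set(),
    "sparse": set(),
}

handler = resolve(graph, ["dask", "pint", "numpy"])
ambiguous = resolve(graph, ["sparse", "dask"])
```

Here `handler` is `"pint"`, so Pint would wrap the result and delegate down the stack, while `ambiguous` is `None` because the graph says nothing about how `sparse` and `dask` relate.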
