Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type annotations for Dataset.where and DataArray.where are wrong when subclassed #8374

Closed
aaronsarna opened this issue Oct 25, 2023 · 9 comments
Labels
duplicate plan to close May be closeable, needs more eyeballs

Comments

@aaronsarna
Copy link

What is your issue?

I'm aware of the guidance of https://docs.xarray.dev/en/stable/internals/extending-xarray.html to avoid subclassing xarray datastructures, but it doesn't seem to outlaw it altogether, so I figured I'd file this issue anyway.

The type annotations for both Dataset.where and DataArray.where claim that these methods will return the type of Self. In practice, though, if they are called on a subclass of either of these classes, they will actually return instances of either DataArray or Dataset, rather than the subclass.

Here's a trivial demonstration: https://colab.research.google.com/drive/1BiGptVUjyKifFI8PSye4v3ADAMXH6KSZ?usp=sharing

I'm not totally sure what the correct type annotations to add here would be, though. My preferred solution would actually be to make the current type hints correct and return instances of the subclass.

@aaronsarna aaronsarna added the needs triage Issue that has not been reviewed by xarray team member label Oct 25, 2023
@welcome
Copy link

welcome bot commented Oct 25, 2023

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Copy link
Collaborator

This is a case of #3980 IIUC. So we're very open to returning subclasses; someone needs to do the work on each method though. (and we're very happy to have marginal contributions which gradually support this, adding tests so we ratchet up the support)

Taking a quick look at .where, this runs through .apply_ufunc, which is one of the more difficult places to adjust — though would have widespread benefits if we could change it. We could start by trying something like this:

diff --git a/xarray/core/computation.py b/xarray/core/computation.py
index 9cb60e0c..633723c7 100644
--- a/xarray/core/computation.py
+++ b/xarray/core/computation.py
@@ -316,11 +316,11 @@ def apply_dataarray_vfunc(
     out: tuple[DataArray, ...] | DataArray
     if signature.num_outputs > 1:
         out = tuple(
-            DataArray(
+            arg.__class__(
                 variable, coords=coords, indexes=indexes, name=name, fastpath=True
             )
-            for variable, coords, indexes in zip(
-                result_var, result_coords, result_indexes
+            for variable, coords, indexes, arg in zip(
+                result_var, result_coords, result_indexes, args
             )
         )
     else:

We can leave this open for a bit, though I'd probably suggest folding into #3980 in time.

@max-sixty max-sixty added duplicate and removed needs triage Issue that has not been reviewed by xarray team member labels Oct 25, 2023
@aaronsarna
Copy link
Author

Thanks! Glad to know that there's interest in supporting this use case.

Is there a reasonable way to fix the typehints in the meantime, since it sounds like returning the subclass might be tricky?

@max-sixty
Copy link
Collaborator

max-sixty commented Oct 26, 2023

Very open to thoughts, but I worry that selecting & typing the methods that do & do not respect subclasses will be almost as much work as making subclassing work.

...and almost intractable to oversee and test — is there a test that catches the .where type definitions being wrong? And a test that would also catch more restrictive type definitions being wrong we do get that method to work with subclasses?

@headtr1ck
Copy link
Collaborator

headtr1ck commented Dec 2, 2023

I don't think we test subclasses anywhere since it is still discouraged to do so and support is still considered experimental.
In this case the main culprit is apply_ufunc which still cannot deal with subclasses...

Also, where is defined on DataWithCoords, so it must return Self to be correct for both DataArray and Dataset. Unfortunately this leads to the issue here. Not sure how to solve this without explicity redefining it in the actual classes.

@max-sixty max-sixty added the plan to close May be closeable, needs more eyeballs label Dec 2, 2023
@aaronsarna
Copy link
Author

It sounds to me like there are 2 different statements on this issue that are at odds with each other:

  1. "we're very open to returning subclasses; someone needs to do the work on each method though."
  2. "I don't think we test subclasses anywhere since it is still discouraged to do so and support is still considered experimental."

I don't think 1 can ever be accomplished as long as 2 is the policy, since you're basically requiring that subclassing be enabled all at once without any incremental regression testing.

Also, where is defined on DataWithCoords, so it must return Self to be correct for both DataArray and Dataset.

That sounds like code smell to me. If DataWithCoords knows enough to return instances of DataArray or Dataset, as opposed to actually returning an instance of Self, maybe the method shouldn't actually live in this class.

Regardless, I think this could be resolved by DataArray and Dataset having trivial overrides of the method that just changes the type signature. Maybe there could be something more creative with typing.overload inside DataWithCoords that depends on the type of self, but that might just make the code smell worse.

@max-sixty
Copy link
Collaborator

It sounds to me like there are 2 different statements on this issue that are at odds with each other:

1. "we're very open to returning subclasses; someone needs to do the work on each method though."

2. "I don't think we test subclasses anywhere since it is still discouraged to do so and support is still considered experimental."

Yes, and I think it's helpful to surface this disagreement.

I agree with (1)

I would change (2) to "We don't yet have tests, but we could add them onto each method as we do the work from (1)". @headtr1ck how do you think about that?

@headtr1ck
Copy link
Collaborator

I would change (2) to "We don't yet have tests, but we could add them onto each method as we do the work from (1)". @headtr1ck how do you think about that?

Yes ideally we should add tests for subclassing, not only static typing but also if the runtime behavior works as expected.

@max-sixty
Copy link
Collaborator

OK, sounds like we have consensus:

  • We'd like subclassing to work. It's a tractable problem
  • We'd accept contributions getting us closer there
  • We'd accept tests ensuring methods work well with subclasses, that way we can ratchet support through the library

Let's fold into #3980

@max-sixty max-sixty closed this as not planned Won't fix, can't repro, duplicate, stale Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate plan to close May be closeable, needs more eyeballs
Projects
None yet
Development

No branches or pull requests

3 participants