Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Type Hinting for dims Parameter in xarray Methods #8210

Closed
andersy005 opened this issue Sep 19, 2023 · 8 comments
Closed

Inconsistent Type Hinting for dims Parameter in xarray Methods #8210

andersy005 opened this issue Sep 19, 2023 · 8 comments
Labels
plan to close May be closeable, needs more eyeballs topic-NamedArray Lightweight version of Variable topic-typing

Comments

@andersy005
Copy link
Member

andersy005 commented Sep 19, 2023

None is not really practical in current xarray so not allowing it as a dimension is probably the easiest path, but type hinting will not be correct.

I want dims to have a type hint that is consistent, easy to read and understand. In a dream world it would look something like this:

InputDim = Hashable # Single dimension
InputDims = Iterable[InputDim , ...] # multiple dimensions
InputDimOrDims = Union[InputDim, InputDims]  # Single or multiple dimensions

Then we can easily go through our xarray methods and easily replace dim and dims arguments.

Hashable could be fine in NamedArray, we haven't introduced None as a typical default value there yet.

But it wouldn't be easy in xarray because we use None as default value a lot, which will (I suspect) lead to a bunch of refactoring and deprecations. I haven't tried it maybe it's doable?

Another idea is to try and make a HashableExcludingNone:

HashableExcludingNone = Union[int, str, tuple, ...] # How many more Hashables are there?
InputDim = HashableExcludingNone # Single dimension
InputDims = Iterable[InputDim , ...] # multiple dimensions
InputDimOrDims = Union[InputDim, InputDims]  # Single or multiple dimensions

I suspect this is harder than it seems.

Another idea is drop the idea of Hashable and just allow a few common ones that are used:

InputDim = str # Single dimension
InputDims = tuple[InputDim , ...] # multiple dimensions
InputDimOrDims = Union[InputDim, InputDims]  # Single or multiple dimensions

Very clear! I think a few users (and maintainers) will be sad because of the lack of flexibility though.

No easy paths, and trying to be backwards compatible is very demotivating.

Originally posted by @Illviljan in #8075 (comment)

@github-actions github-actions bot added the needs triage Issue that has not been reviewed by xarray team member label Sep 19, 2023
@andersy005 andersy005 added topic-NamedArray Lightweight version of Variable topic-typing and removed needs triage Issue that has not been reviewed by xarray team member labels Sep 19, 2023
@headtr1ck
Copy link
Collaborator

See also #6142 and #8199

@headtr1ck
Copy link
Collaborator

HashableExcludingNone is not really possible because Hashable is a protocol and therefore this can be any custom class.

I think you can tell Mypy that None is not allowed by overloading the function and use the NoReturn return value.

But we do not want to do that for every method that accepts dimension arguments...

I assume that the most common non-string types are enums and composite types (e.g. tuple[str, str,].

@max-sixty
Copy link
Collaborator

Can I ask — is the extent of the problem that None will be accepted when it shouldn't be? That doesn't seem so bad?

Or that we can't discriminate between supplying dims and not supplying dims, because None is seen as dim, and this makes type inference worse?

@headtr1ck
Copy link
Collaborator

I think currently it is totally valid and possible to use None as a dimension or variable name (since None is hashable).

However, you will have a hard time using this Variable/DataArray/Dataset since every method with a "dim" argument uses None as default with a special meaning, so you can never do your operations along the None-dimension.

@andersy005
Copy link
Member Author

Based on @headtr1ck's comment, here is one example:

In [9]: c = xr.DataArray([[1,2], [0, 0]]).assign_coords(coords={"dim_0": [3,4]}).rename({"dim_0": None})

In [10]: c
Out[10]: 
<xarray.DataArray (None: 2, dim_1: 2)>
array([[1, 2],
       [0, 0]])
Coordinates:
  * None     (None) int64 3 4
Dimensions without coordinates: dim_1

In [11]: c.mean(dim=None) # incorrect: i don't want to collapse all dimensions just `None`
Out[11]: 
<xarray.DataArray ()>
array(0.75)

@max-sixty
Copy link
Collaborator

Rephrasing my question slightly — does this have any impact for those who don't use None as a dimension name? Does it restrict mypy for resolving branches where a default None is supplied for example?

Or is the issue "mypy says you can use None as a dimension name, but actually you can't."

...which arguably isn't too bad — it does seem like a corner case, and there's lots & lots of things mypy doesn't pick up on?

@headtr1ck
Copy link
Collaborator

headtr1ck commented Sep 20, 2023

Rephrasing my question slightly — does this have any impact for those who don't use None as a dimension name?

No. At least I cannot think of any realistic problem (maybe accidentally renaming something to None and not getting an error).

Does it restrict mypy for resolving branches where a default None is supplied for example?

When the return type is different one can always overload it and Mypy will know what to do with the overlapping (we do that already occasionally).

Or is the issue "mypy says you can use None as a dimension name, but actually you can't."

With the current definition this is somewhat true.

I think this is a minor problem. Probably very few people ever tried this. Theoretically you can use None as e.g. a dimension name and then even use some methods by supplying dim=[None], but all methods that expect a single dimension basically use None as their default.

...which arguably isn't too bad — it does seem like a corner case, and there's lots & lots of things mypy doesn't pick up on?

I also don't find it too bad, especially since we always specify Hashable | None so there should be little confusion when using the types as extra documentation.


I think this issue came up when thinking about the following change:

Allow None explicitly as valid dimension/variable name and replace all defaults with a special default value (the usual enum member trick, some methods do this already if I remember correctly).

While definitely doable, I am not in favor of such an approach because None as default is a widely established python standard. Also this would be a huge breaking change, people might actively supply None and not rely on the default value.

@max-sixty
Copy link
Collaborator

max-sixty commented Sep 10, 2024

Is this close-able?

My takeaway is "if you use None as a dimension name, things won't work for you. mypy won't catch the mistake."

(but there's a very large space of things that won't work, and almost as large a space of things mypy won't catch)

@max-sixty max-sixty added the plan to close May be closeable, needs more eyeballs label Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to close May be closeable, needs more eyeballs topic-NamedArray Lightweight version of Variable topic-typing
Projects
None yet
Development

No branches or pull requests

3 participants