-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace SortedKeysDict with dict #4753
Conversation
As discussed in dev meeting, this makes sense to pursue. We should try and remove the |
Hello @max-sixty! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-05-16 03:24:39 UTC |
# Conflicts: # xarray/core/utils.py
Data variables: | ||
temperature (y, x) float64 10.98 14.3 12.06 nan ... nan 18.89 10.44 8.293 | ||
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176 | ||
|
||
# FIXME | ||
>>> xr.combine_by_coords([x3, x1], join="override") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the existing behavior here a bug?
In [16]: x1.dims
Out[16]: Frozen({'y': 2, 'x': 3})
In [17]: x3.dims
Out[17]: Frozen({'y': 2, 'x': 3})
In [18]: xr.combine_by_coords([x3, x1], join="override")
Out[18]:
<xarray.Dataset>
Dimensions: (y: 4, x: 3)
Coordinates:
* x (x) int64 10 20 30
* y (y) int64 0 1 2 3
Data variables:
temperature (y, x) float64 10.98 14.3 12.06 10.9 ... 18.89 10.44 8.293
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176
and override
is documented as:
"override": if indexes are of same size, rewrite indexes to be
those of the first object with that dimension. Indexes for the same
dimension must have the same size in all objects.
So should the result y dim have length 4?
This is changing behavior because the order of dims is changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the repr of x1
and x2
reveals that both x
and y
are valid concat dimensions (given join="override"
) and it just chooses the first one (arbitrarily). I think this PR did not break it but just revealed a problem (also #4824). This should be fixed in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that reference. I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a dfferent bug from #4824?
xarray/core/dataset.py
Outdated
... lat=(["x", "y"], lat), | ||
... time=pd.date_range("2014-09-06", periods=3), | ||
... reference_time=pd.Timestamp("2014-09-05"), | ||
... ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pydata/xarray do we have a preference on how we show constructors? I find the dict
representation much easier to read, especially where we're passing tuples of lists etc.
I updated this with vim, could probably do the rest of them with some regex if people agree (different PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have a preference, but keep in mind that with the dict
constructor you're restricted to python identifiers while the dictionary literal takes any hashable (not sure how much of a restricting that would be, though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree. As you suggest though — these examples almost exclusively use strings.
This ended up being more work than expected, but I think is ready for review. There's one issue that may be an existing bug (but may be my misunderstanding). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion on the dicts. Looks good overall.
I think combine_by_coords
is brittle in case of ambiguous concat dims as in the given example. Before this change the concat dim depended on the name of the dimensions (i.e. their alphabetical ordering), now it depends on the insertion order. I am not sure what's worse. We should absolutely fix that (also #4824).
Data variables: | ||
temperature (y, x) float64 10.98 14.3 12.06 nan ... nan 18.89 10.44 8.293 | ||
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176 | ||
|
||
# FIXME | ||
>>> xr.combine_by_coords([x3, x1], join="override") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the repr of x1
and x2
reveals that both x
and y
are valid concat dimensions (given join="override"
) and it just chooses the first one (arbitrarily). I think this PR did not break it but just revealed a problem (also #4824). This should be fixed in another PR.
Ready to go! |
@mathause if this is a new bug then can you make an issue for it please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good way to work around not being able to sort keys of different types.
I worry about the dimension sizes which are part of the first line of the Dataset
repr, though: we've had lots of questions about this because people thought that it would mean the same as in a DataArray
's repr (the order there has a meaning). Arguably, that's because the difference between them is not big enough. This seems to have become less frequent since the introduction of the HTML reprs, so maybe it is also less important?
I agree the current situation isn't ideal (though I'm not sure there's a better one) but also that this doesn't make it much worse — the existing approach doesn't represent an useful ordering, and nor does this. And if we choose to make it mean something in the future, then this is a necessary step towards that... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks!
I wonder if we should hold off on merging until after the 0.18.1 release -- or maybe just move to a date based versioning system / release cycle like dask.
isort . && black . && mypy . && flake8
whats-new.rst
Inspired by @mathause research in #4571 (comment)
Note that one test is removed: https://github.com/pydata/xarray/compare/master...max-sixty:sorted-keys?expand=1#diff-2db7f6624707083db4aaab1b62eb11352200ec9d3ac1055de84877912226d7b5L560
While I'm keen on simplifying the data structures, this may have some unforeseen consequences, and at least deserves some thought re how we handle differently ordered dims.
Currently this PR retains the sorting in reprs.