Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace SortedKeysDict with dict #4753

Merged
merged 19 commits into from
May 19, 2021
Merged

Conversation

max-sixty
Copy link
Collaborator

@max-sixty max-sixty commented Jan 3, 2021

  • Tests added
  • Passes isort . && black . && mypy . && flake8
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Inspired by @mathause research in #4571 (comment)

Note that one test is removed: https://github.com/pydata/xarray/compare/master...max-sixty:sorted-keys?expand=1#diff-2db7f6624707083db4aaab1b62eb11352200ec9d3ac1055de84877912226d7b5L560

While I'm keen on simplifying the data structures, this may have some unforeseen consequences, and at least deserves some thought re how we handle differently ordered dims.

Currently this PR retains the sorting in reprs.

@max-sixty
Copy link
Collaborator Author

As discussed in dev meeting, this makes sense to pursue. We should try and remove the repr sorting if we can manage the test breaks.

@pep8speaks
Copy link

pep8speaks commented Jan 23, 2021

Hello @max-sixty! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-16 03:24:39 UTC

Data variables:
temperature (y, x) float64 10.98 14.3 12.06 nan ... nan 18.89 10.44 8.293
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176

# FIXME
>>> xr.combine_by_coords([x3, x1], join="override")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the existing behavior here a bug?

In [16]: x1.dims
Out[16]: Frozen({'y': 2, 'x': 3})

In [17]: x3.dims
Out[17]: Frozen({'y': 2, 'x': 3})

In [18]: xr.combine_by_coords([x3, x1], join="override")
Out[18]:
    <xarray.Dataset>
    Dimensions:        (y: 4, x: 3)
    Coordinates:
      * x              (x) int64 10 20 30
      * y              (y) int64 0 1 2 3
    Data variables:
        temperature    (y, x) float64 10.98 14.3 12.06 10.9 ... 18.89 10.44 8.293
        precipitation  (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176

and override is documented as:

"override": if indexes are of same size, rewrite indexes to be
those of the first object with that dimension. Indexes for the same
dimension must have the same size in all objects.

So should the result y dim have length 4?

This is changing behavior because the order of dims is changing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the repr of x1 and x2 reveals that both x and y are valid concat dimensions (given join="override") and it just chooses the first one (arbitrarily). I think this PR did not break it but just revealed a problem (also #4824). This should be fixed in another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that reference. I agree.

Copy link
Member

@TomNicholas TomNicholas May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a dfferent bug from #4824?

... lat=(["x", "y"], lat),
... time=pd.date_range("2014-09-06", periods=3),
... reference_time=pd.Timestamp("2014-09-05"),
... ),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pydata/xarray do we have a preference on how we show constructors? I find the dict representation much easier to read, especially where we're passing tuples of lists etc.

I updated this with vim, could probably do the rest of them with some regex if people agree (different PR)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have a preference, but keep in mind that with the dict constructor you're restricted to python identifiers while the dictionary literal takes any hashable (not sure how much of a restricting that would be, though).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree. As you suggest though — these examples almost exclusively use strings.

@max-sixty
Copy link
Collaborator Author

This ended up being more work than expected, but I think is ready for review. There's one issue that may be an existing bug (but may be my misunderstanding).

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on the dicts. Looks good overall.

I think combine_by_coords is brittle in case of ambiguous concat dims as in the given example. Before this change the concat dim depended on the name of the dimensions (i.e. their alphabetical ordering), now it depends on the insertion order. I am not sure what's worse. We should absolutely fix that (also #4824).

Data variables:
temperature (y, x) float64 10.98 14.3 12.06 nan ... nan 18.89 10.44 8.293
precipitation (y, x) float64 0.4376 0.8918 0.9637 ... 0.5684 0.01879 0.6176

# FIXME
>>> xr.combine_by_coords([x3, x1], join="override")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the repr of x1 and x2 reveals that both x and y are valid concat dimensions (given join="override") and it just chooses the first one (arbitrarily). I think this PR did not break it but just revealed a problem (also #4824). This should be fixed in another PR.

xarray/core/combine.py Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

Ready to go!

@TomNicholas
Copy link
Member

I think combine_by_coords is brittle in case of ambiguous concat dims as in the given example

@mathause if this is a new bug then can you make an issue for it please?

@max-sixty
Copy link
Collaborator Author

I think combine_by_coords is brittle in case of ambiguous concat dims as in the given example

@mathause if this is a new bug then can you make an issue for it please?

Isn't this #4824?

@max-sixty max-sixty added the plan to merge Final call for comments label May 15, 2021
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good way to work around not being able to sort keys of different types.

I worry about the dimension sizes which are part of the first line of the Dataset repr, though: we've had lots of questions about this because people thought that it would mean the same as in a DataArray's repr (the order there has a meaning). Arguably, that's because the difference between them is not big enough. This seems to have become less frequent since the introduction of the HTML reprs, so maybe it is also less important?

doc/whats-new.rst Outdated Show resolved Hide resolved
@max-sixty
Copy link
Collaborator Author

I worry about the dimension sizes which are part of the first line of the Dataset repr, though: we've had lots of questions about this because people thought that it would mean the same as in a DataArray's repr (the order there has a meaning). Arguably, that's because the difference between them is not big enough. This seems to have become less frequent since the introduction of the HTML reprs, so maybe it is also less important?

I agree the current situation isn't ideal (though I'm not sure there's a better one) but also that this doesn't make it much worse — the existing approach doesn't represent an useful ordering, and nor does this. And if we choose to make it mean something in the future, then this is a necessary step towards that...

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, thanks!

I wonder if we should hold off on merging until after the 0.18.1 release -- or maybe just move to a date based versioning system / release cycle like dask.

@max-sixty max-sixty merged commit 5a602cd into pydata:master May 19, 2021
@max-sixty max-sixty deleted the sorted-keys branch May 19, 2021 19:30
@max-sixty max-sixty mentioned this pull request May 19, 2021
9 tasks
tomwhite added a commit to tomwhite/sgkit that referenced this pull request May 31, 2021
tomwhite added a commit to sgkit-dev/sgkit that referenced this pull request Jun 4, 2021
mergify bot pushed a commit to sgkit-dev/sgkit that referenced this pull request Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants