Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Factor out a function for checking dimension-related errors #8089

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mgunyho
Copy link
Contributor

@mgunyho mgunyho commented Aug 19, 2023

This is a WIP follow-up for #8079 and I think also for #7051. The pattern

missing_dims = set(dims) - set(self.dims)
if missing_dims:
    raise ValueError(f"Dimensions {missing_dims} not found in data dimensions {tuple(self.dims)}")

occurs in many methods, with small variations in the way missing_dims is calculated, the error message, and also if it's ValueError or KeyError. So it would make sense to factor it out. But I'm not familiar enough with the context around #7051 to know how to deal with sets vs tuples, so this is just a sketch for now.

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@mgunyho mgunyho changed the title Factor out a function for checking dimension-related errors WIP: Factor out a function for checking dimension-related errors Aug 19, 2023
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks! Forgive the delay on the PR review.

Am I correct in thinking that check_dims isn't used that much? Is the idea to move many of these checks to use that function?

@@ -5440,10 +5435,10 @@ def unstack(
else:
dims = list(dim)

missing_dims = [d for d in dims if d not in self.dims]
missing_dims = set(dims) - set(self.dims)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick test, and this is indeed faster. It's equivalent for very small lists — 3 in each — and then much faster as the list grows.

xarray/core/utils.py Outdated Show resolved Hide resolved
@mgunyho
Copy link
Contributor Author

mgunyho commented Sep 11, 2023

No worries! It might take me a while before I can finish this up, but I've rebased this now that #8079 is merged.

@mgunyho
Copy link
Contributor Author

mgunyho commented Sep 11, 2023

Is the idea to move many of these checks to use the check_dims function?

Yes. There is a possibility that it will turn out messy or unreadably generic because there are quite a few different variants of the pattern, even though they are similar.

@max-sixty
Copy link
Collaborator

Ah, I see — I think the diff I reviewed two days ago was almost exclusively the prior PR, because there was a conflict and so GH showed too big a diff.

That was a big benefit. My guess is that delegating the message construction is a smaller benefit, and maybe a bit too generic, as you suggest. But very open to it if we can find enough commonality...

@mgunyho
Copy link
Contributor Author

mgunyho commented Sep 12, 2023

Yes, this PR was just one commit on top of the previous one. I'll see if this turns out to be worthwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants