feat!: raise error to eager `len` calls #416
Codecov Report: All modified and coverable lines are covered by tests ✅
Coverage diff (main vs. #416):
Coverage: 93.94% → 93.94% (-0.01%)
Files: 23 → 23
Lines: 3123 → 3122 (-1)
Hits: 2934 → 2933 (-1)
Misses: 189 → 189 |
I'd be tempted to make this an error. Implicit computation is something we generally want to avoid, and whilst we have a config option for this, I wonder if the default should change. |
@agoose77 It is definitely an option. I'm wondering however whether it will ruin pytests or something where |
I think so, too. Warnings in Python are too easy to lose, so they don't have much benefit beyond deprecations (because pytest can be configured to always show them and treat them as errors, not warnings after all). You don't want to be getting the |
That is, it should be an error. |
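As a rough illustration of the point about warnings being easy to lose, the sketch below uses only the standard-library `warnings` module. The function `eager_len_with_warning` is a hypothetical stand-in, not code from this PR: with a permissive filter the call still returns a value, while the `"error"` filter turns the same call into an exception.

```python
import warnings


def eager_len_with_warning(n_items: int) -> int:
    """Hypothetical stand-in for a len() that computes eagerly and only warns."""
    warnings.warn("len() triggered an eager computation", UserWarning, stacklevel=2)
    return n_items


# With a permissive filter the warning is merely recorded and the
# (potentially expensive) result is still returned to the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = eager_len_with_warning(3)

# With the "error" filter the very same call raises instead of returning.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        eager_len_with_warning(3)
        escalated = False
    except UserWarning:
        escalated = True
```

Whether the user ever sees the problem therefore depends on filter configuration, which is the argument made here for raising unconditionally.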
If someone wants an eager collection, there should be only one way to do it: by calling |
Alrighty. If @douglasdavis agrees, I'm gonna change this into an error and then go through the |
I agree with Jim and Angus that this should raise for unknown partition sizes, something along the lines of:

    if not self.known_divisions:
        raise ValueError("cannot determine length of collection with unknown partitions sizes")

I was scratching my head trying to figure out why this didn't raise to begin with, and I honestly can't figure that out. I even added the patch to dask/dask 2 years ago to raise on calling |
Having the exception raised for unknown division sizes may surface some places where we should be able to define divisions at collection instantiation time |
"... without executing the graph" (or something like that) |
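A minimal, self-contained sketch of the behaviour being proposed, assuming a toy `LazyCollection` class (hypothetical, not the dask-awkward implementation) whose `divisions` tuple uses `None` for unknown boundaries. The message combines the wording suggested in the comments above.

```python
class LazyCollection:
    """Toy stand-in for a partitioned lazy collection (an assumption for illustration)."""

    def __init__(self, divisions):
        # Partition boundaries; None marks a boundary that is not known yet.
        self.divisions = tuple(divisions)

    @property
    def known_divisions(self) -> bool:
        # Divisions count as known only if every boundary has a value.
        return len(self.divisions) > 0 and None not in self.divisions

    def __len__(self) -> int:
        if not self.known_divisions:
            # Refuse to compute implicitly; the caller has to opt in explicitly.
            raise ValueError(
                "cannot determine length of collection with unknown "
                "partitions sizes without executing the graph"
            )
        # In this toy model the length is the span covered by the boundaries.
        return self.divisions[-1] - self.divisions[0]


known = LazyCollection((0, 5, 10))
unknown = LazyCollection((None, None, None))
```

Here `len(known)` is answered from metadata alone, while `len(unknown)` raises `ValueError` rather than silently running anything.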
Alrighty, I'll make it happen and also add another sentence that directs the user to use |
Yea you may surface a test or two in dask-awkward that will need updating; just changing the expectation to the raise is probably all that needs doing on those. If it pops up, something along the lines of:

    assert not array.known_divisions
    with pytest.raises(ValueError, match="cannot determine length of collection"):
        len(array)

I'm not sure about coffea :) |
Yup. There shouldn't be any such |
I've updated the code to raise an error and also the 3 tests that were failing. I don't understand the mypy pre-commit.ci error though. I'll check the source code and also coffea tomorrow in case I'm missing something. Let's not merge yet. |
@lgray There are a few eager |
Not that I recall. I can do a search later (or you can, since you're on this particular thread). |
I've just gone through the source code with Ctrl-F in my IDE and all the |
Great- thanks, @iasonkrom! |
I've seen new users not immediately understand that `len` calls are eager when divisions are unknown, and wonder why their analysis code takes so long before finding out it was just a couple of `len` calls that shouldn't be there. I was a victim of this myself. Therefore, I think a good idea would be to add a warning.
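To make the hidden cost concrete, here is a small sketch with a toy `CountingLazyArray` class (hypothetical; it only mimics the eager-`len` behaviour described above and is not dask-awkward code). Each innocent-looking `len` check reruns the whole pretend graph.

```python
class CountingLazyArray:
    """Toy lazy array that counts how often its pretend task graph is executed."""

    def __init__(self, partitions):
        # One zero-argument callable per partition; calling it "computes" the partition.
        self._partitions = list(partitions)
        self.n_computes = 0

    def compute(self):
        # Stands in for executing the full task graph.
        self.n_computes += 1
        return [item for part in self._partitions for item in part()]

    def __len__(self):
        # Eager behaviour: the length is unknown, so the graph runs to find it.
        return len(self.compute())


array = CountingLazyArray([lambda: [1, 2], lambda: [3, 4, 5]])

for _ in range(3):
    if len(array) > 0:  # looks cheap, but triggers a full computation every time
        pass

print(array.n_computes)
```

In a real analysis each of those executions could read and process every partition, which is why a couple of stray `len` calls can dominate the runtime.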