Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for Delayed in partitionwise layer creation. #449

Conversation

douglasdavis
Copy link
Collaborator

Broadcasting will occur equivalently at each partition because the Delayed object is not partitioned (as shown in the test added in this PR).

Broadcasting will equivalently at each partition because the Delayed
object is *not* partitioned.
@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (df0ef57) 93.09% compared to head (5c0b00e) 93.10%.
Report is 1 commits behind head on main.

Files Patch % Lines
src/dask_awkward/lib/core.py 87.50% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #449      +/-   ##
==========================================
+ Coverage   93.09%   93.10%   +0.01%     
==========================================
  Files          23       23              
  Lines        3272     3279       +7     
==========================================
+ Hits         3046     3053       +7     
  Misses        226      226              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lgray
Copy link
Collaborator

lgray commented Jan 12, 2024

Cool, I'll give it a try in the morning!

@lgray
Copy link
Collaborator

lgray commented Jan 12, 2024

I'm getting an odd error when I try to print out the dask awkward array that's made from something using delayed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lgray/coffea-dev/coffea/tests/test_lookup_tools.py", line 177, in test_correctionlib
    print(test_out_dak)
  File "/Users/lgray/coffea-dev/dask-awkward/src/dask_awkward/lib/core.py", line 998, in __str__
    f"npartitions={self.npartitions}"
  File "/Users/lgray/coffea-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1083, in npartitions
    return len(self.divisions) - 1
  File "/Users/lgray/miniforge3/envs/ewkcoffea-debug/lib/python3.10/site-packages/dask/delayed.py", line 635, in __len__
    raise TypeError("Delayed objects of unspecified length have no len()")
TypeError: Delayed objects of unspecified length have no len()

I haven't traced it to understand why the output dak array things it's a dask.delayed?

@agoose77
Copy link
Collaborator

It's probably that something's constructed Array with the divisions argument taken from delayed.divisions, i.e. in a previous frame.

@douglasdavis
Copy link
Collaborator Author

Is it possible to get a repro for that TypeError?

divisions should always be a tuple (specifically tuple[int, ...] | tuple[None, ...]) on a dak.Array

@lgray
Copy link
Collaborator

lgray commented Jan 12, 2024

I'll see if I can get one - I'm trying to make sure first I'm not bamboozling myself.

@lgray
Copy link
Collaborator

lgray commented Jan 12, 2024

somehow array.divisions has, itself, become a delayed object!?

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Jan 12, 2024

Just to jot this information down in a more permanent place: we discovered during slack conversation that if a Delayed object was the first thing passed to map_partitions, we broke an assumption in the code that the first argument would be a dak.Array. The map_partitions implementation is now invariant w.r.t. order of function arguments, but at least one dak.Array is required.

@martindurant
Copy link
Collaborator

we broke an assumption in the code that the first argument would be a dak.Array

easy to fix, right?

@douglasdavis
Copy link
Collaborator Author

Yes! fixed in 9c6a65b

@douglasdavis douglasdavis merged commit 01777b6 into dask-contrib:main Jan 12, 2024
24 checks passed
@douglasdavis douglasdavis deleted the support-delayed-use-in-partitionwise branch January 12, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants