Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draft: feat: use shape touched information in necessary columns optimization #368

Merged

Conversation

douglasdavis
Copy link
Collaborator

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Sep 18, 2023

Codecov Report

Merging #368 (8ad6643) into main (e8403c5) will increase coverage by 0.07%.
The diff coverage is 91.66%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #368      +/-   ##
==========================================
+ Coverage   93.81%   93.89%   +0.07%     
==========================================
  Files          23       23              
  Lines        2925     2930       +5     
==========================================
+ Hits         2744     2751       +7     
+ Misses        181      179       -2     
Files Changed Coverage Δ
src/dask_awkward/lib/io/json.py 87.93% <ø> (ø)
src/dask_awkward/lib/io/parquet.py 92.39% <ø> (ø)
src/dask_awkward/lib/optimize.py 95.88% <90.90%> (+0.08%) ⬆️
src/dask_awkward/layers/layers.py 94.79% <100.00%> (+2.08%) ⬆️
src/dask_awkward/lib/inspect.py 93.54% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Sep 18, 2023

@agoose77 so far this is just the start of some code to be able to pass necessary_shape_columns around the dask-awkward code base inline with how we pass around necessary_columns. I think the major part of getting this to work will be in the __call__ methods; Once we've created an array with the necessary data columns, we can then (perhaps in unproject_layout, because we call unproject_layout in the __call__ methods, or in something preceeding/following unproject_layout) create PlaceholderArrays with known length for the columns that are necessary shape columns; we'll know which one's they are by tracking it as an attribute of the callable class, which is possible because AwkwardInputLayer.project_columns will get that information from the optimization pass

Does this sound reasonable?

layers[name] = layers[name].project_columns(neccols)
layers[name] = layers[name].project_columns(
neccols,
necessary_shape_columns=None,
Copy link
Collaborator Author

@douglasdavis douglasdavis Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this None can be replaced by TypeTracerReport.shape_touched

@douglasdavis
Copy link
Collaborator Author

douglasdavis commented Sep 18, 2023

cc @lgray; if my comment above is the route we take then there will be some required changes in uproot (I'd be happy to help/do it- it should be simple- the function classes like upoot.dask.Uproot{OpenAnd}Read (if I'm remember the naming correctly) will need their project_columns implementations updated)

@lgray
Copy link
Collaborator

lgray commented Sep 18, 2023

that shouldn't be a big issue!

@douglasdavis douglasdavis force-pushed the optimization-track-shape-touched branch from 9e7b806 to 8ad6643 Compare September 19, 2023 16:41
@agoose77 agoose77 merged commit 8ad6643 into dask-contrib:main Oct 6, 2023
18 checks passed
@douglasdavis douglasdavis deleted the optimization-track-shape-touched branch January 26, 2024 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants