reduce cost of large variant matrix #5392
When the variant matrix is large and mostly unused (as in conda-forge), the length of input_variants may be several thousand even though only a few variants are actually used. This makes `get_loop_vars` and `metadata.copy()` very expensive.
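To make the size mismatch concrete, here is a minimal sketch of how a variant matrix grows multiplicatively while a recipe only depends on a few keys. The keys and values in `variant_config` and the choice of `used_keys` are made up for illustration and are not taken from conda-forge or from this PR.

```python
from itertools import product

# Hypothetical variant config: every key and value here is illustrative.
variant_config = {
    "python": ["3.10", "3.11", "3.12"],
    "numpy": ["1.26", "2.0"],
    "mpi": ["mpich", "openmpi"],
    "cuda": ["none", "11.8", "12.0"],
    "blas": ["openblas", "mkl"],
    "openssl": ["1.1", "3"],
}

# The full matrix is the cartesian product of all values.
keys = list(variant_config)
input_variants = [
    dict(zip(keys, combo)) for combo in product(*variant_config.values())
]

# Assumption: the recipe only references these two keys.
used_keys = {"python", "mpi"}
distinct_used = {
    tuple(sorted((k, v[k]) for k in used_keys)) for v in input_variants
}

print(len(input_variants))  # size of the full matrix
print(len(distinct_used))   # combinations that actually matter to the recipe
```

Any operation that loops over or copies all of `input_variants` pays for the full product, even though only the combinations in `distinct_used` affect the result.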
CodSpeed Performance Report: merging #5392 will improve performance by ×2.6.
Need to investigate why the conda-build tests produce different results from actual runs of conda-build, all of which seem to produce the right variants.
seems to be the exclusion of
should reduce less
pre-commit.ci autofix
vastly reduces the number of copies computed for large variant matrices
Rather than computing all loop vars and then intersecting, only consider relevant keys when computing loop vars. This reduces get_used_loop_vars from O(n_vars * n_variants) to O(n_used_vars * n_variants).
config.copy already copies this, no need to do it twice in metadata.copy
@@ -2394,7 +2394,6 @@ def validate_features(self):
     def copy(self: Self) -> MetaData:
         new = copy.copy(self)
         new.config = self.config.copy()
-        new.config.variant = copy.deepcopy(self.config.variant)
config.copy on the line before already does exactly this, no need to do it twice
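The redundancy can be sketched with toy classes. `Config` and `MetaData` below are simplified stand-ins written for this example, not conda-build's actual classes, and the sketch assumes (as the comment above states) that the config's own `copy()` already deep-copies `variant`.

```python
import copy


class Config:
    """Toy stand-in for a config object that owns a variant dict."""

    def __init__(self, variant):
        self.variant = variant

    def copy(self):
        new = copy.copy(self)
        # Assumption: the config's copy() already deep-copies the variant.
        new.variant = copy.deepcopy(self.variant)
        return new


class MetaData:
    """Toy stand-in for the metadata object that holds a config."""

    def __init__(self, config):
        self.config = config

    def copy(self):
        new = copy.copy(self)
        new.config = self.config.copy()
        # A second copy.deepcopy(self.config.variant) here would only
        # repeat work that Config.copy() has just done.
        return new


m = MetaData(Config({"python": ["3.11", "3.12"]}))
m2 = m.copy()
m2.config.variant["python"].append("3.13")

# The original is unaffected, so one deep copy is enough for isolation.
print(m.config.variant)
print(m2.config.variant)
```

With a variant structure that is expensive to deep-copy, dropping the second copy halves that part of the cost without changing behavior.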
         used_vars = self.get_used_vars(
             force_top_level=force_top_level, force_global=force_global
         )
-        return set(loop_vars).intersection(used_vars)
+        return self.get_loop_vars(subset=used_vars)
get_loop_vars is far cheaper if we pass a subset to consider instead of computing the (usually quite small) intersection after looping over all variables across all variants.
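A rough sketch of the idea follows. This standalone `get_loop_vars` is written for illustration only and is not conda-build's implementation; it assumes a loop variable is a key whose value differs between variants, and that `subset` restricts which keys are examined.

```python
def get_loop_vars(variants, subset=None):
    """Return the keys whose value differs across variants.

    Illustrative sketch only. If `subset` is given, only those keys
    are examined, so the work scales with len(subset) rather than
    with the total number of keys.
    """
    if not variants:
        return set()
    first = variants[0]
    keys = set(first) if subset is None else set(subset).intersection(first)
    return {
        k for k in keys
        if any(v.get(k) != first[k] for v in variants[1:])
    }


variants = [
    {"python": "3.11", "mpi": "mpich", "c_compiler": "gcc"},
    {"python": "3.12", "mpi": "mpich", "c_compiler": "gcc"},
    {"python": "3.11", "mpi": "openmpi", "c_compiler": "gcc"},
]
used_vars = {"python", "c_compiler"}

# Old approach: scan every key, then intersect with the used ones.
old = get_loop_vars(variants).intersection(used_vars)
# New approach: scan only the used keys.
new = get_loop_vars(variants, subset=used_vars)

print(old, new)
```

Both calls return the same set; the second simply never looks at keys the recipe does not use.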
I've taken a different approach that doesn't modify the variants list at all, so shouldn't have any consequences besides performance. Instead of reducing the actual variants list, I've reduced the cost of the two dominant operations on the large variant list:
The first changes the variants copy from O(top_level_variants * input_variants) to O(top_level_variants * per_top_level_variants), which is the same as
The second changes the
The savings aren't quite what they are for actually reducing the variants list, because there are still some operations on the full list, but it still cuts render time in half.
I'm afraid I don't understand the mac failures or how they could be related to this PR. The same tests pass just fine on my mac. Hopefully something transient?
Those mac failures do look unrelated. Let's try to rerun before we dig into them.
@mbargull Can you take a look at this one? I don't see any obvious problems but this work gets a bit into the guts of the code.
Anyone have a chance to look at this?
to avoid calling pickle in too many places
@isuruf @beeankha @kenodegard Can we merge this one for the 24.9 release?
8a1f9ca to 644baaf
Description
When the variant matrix is large and mostly unused (as in conda-forge), the length of input_variants may be several thousand (13,824 in the case of petsc4py) when only a few are actually used. This causes `get_loop_vars` and `metadata.copy()` to become very expensive and dominate render time.
This reduction cuts time spent in `render_recipe` for petsc4py from over 2 minutes to 40 seconds to produce 72 actual variants:
before:
after:
(result is unchanged)
Checklist - did you ...
`news` directory (using the template) for the next release's release notes?