[Optimization] Cache contract generation and pre-compile some match expression #2013
Conversation
After the introduction of a proper node for types in the AST, we switched to a lazy way of generating contracts: we keep the type as it is, and convert it to a contract only once it's actually applied. While simpler, this approach has the drawback of potentially wasting computation by running the contract generation code many times for the same contract, as some metrics showed (up to ~1600 times on large codebases). This commit adds a `contract` field to `Term::Type` and statically generates, in the parser, the contract corresponding to a type, which is what we used to do before the introduction of a type node. Doing so, we generate at most one contract per user annotation, regardless of how many times the contract is applied at runtime.
This commit also avoids some recompilation of the match expressions generated by contracts converted from enum types, by generating the compiled version directly instead of a match expression.
What are the estimated performance gains from this PR?
As a disclaimer, the improvement is hard to measure properly with the micro benchmarks (i.e. it doesn't show there), in particular because we don't really benchmark applying the same contract again and again (which does happen in real life, as measurements show, probably because of merging, array contracts, etc.), and a micro benchmark targeting that would feel rather artificial. The only example where measurable effects show up is a 120kLoC real codebase, which takes around 20 seconds to evaluate, long enough to see some differences. I see a reduction of around 2% in user time over several runs (but system time, which is the predominant cost, varies quite a bit, so the final gain is potentially smaller, at least on my very volatile laptop). It could be interesting to ask the original users, who are moreover on macOS, if they see something different in their setup. What motivated this PR originally is:
I intend to do a larger rework of the AST after the next (imminent) release of 1.8 anyway, so this change might not be very impactful in the long term. It's more of a "back to 1.7 perf" band-aid. However, it's not hugely impactful even now, so I'm fine with ditching it if others think it's not worth it.
I'm going to merge this for 1.8, if that's fine by everyone. It's not fundamentally very difficult to revert, and I plan to rework the AST quite a bit and introduce multiple IRs for 1.9 anyway, so this change might become obsolete.
This PR is guided by the metrics gathered from some runs of Nickel on large codebases, which showed that some contracts were generated again and again despite coming from one initial annotation. While it's expected that an annotation might give rise to many runtime evaluations (typically, a contract in the body of a function), it's wasteful to generate the code many times at runtime (contract generation is a form of compilation from a static type to a Nickel expression, and it shouldn't happen over and over during execution).
The issue is that we now store types directly in the AST and consider them to be in weak head normal form. So, each time we apply such a contract, `contract/apply` gets a `typ::Type` and needs to make a contract out of it: generation happens again and again. This PR adds a new field to `Term::Type`, `contract: RichTerm`, which is populated at parsing time with the contract generated from the type (which is close to the behavior before the introduction of `Term::Type`). Doing so, we should only generate one contract for each user annotation in the code.

Metrics indeed show that in some big examples, the worst case was generating the same contract ~1700 times, which is brought down to one. We also tried to closurize the generated contract, so that it doesn't have to be evaluated several times, but this showed no improvement whatsoever while increasing the number of allocated thunks, so it's been reverted.
Additionally, in the same spirit of generating more statically and not redoing it at runtime, we precompile the match expression generated when converting an enum contract, so that it's not recompiled each time the contract is applied. This showed a reduction of around 25 to 30% in the total number of compilations of match expressions on the private benchmark.
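The same "compile once, apply many times" pattern applies to the enum case. The sketch below is a loose analogy with hypothetical names, not the actual match compiler: an enum contract essentially checks that a tag is one of the declared variants, so the compiled dispatch structure (modeled here as a plain hash set of variant names) can be built once when the contract term is created and reused on every application:

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical model of a precompiled enum contract. Instead of
// rebuilding and recompiling a `match` expression on every
// application, the compiled form is constructed a single time.
struct EnumContract {
    // Shared, built once from the enum type's variants.
    compiled: Rc<HashSet<String>>,
}

impl EnumContract {
    // "Compile" the enum type once, e.g. when the contract term is built.
    fn compile(variants: &[&str]) -> Self {
        let compiled = variants.iter().map(|v| v.to_string()).collect();
        EnumContract {
            compiled: Rc::new(compiled),
        }
    }

    // Each application reuses the precompiled structure; no per-application
    // recompilation happens.
    fn check(&self, tag: &str) -> bool {
        self.compiled.contains(tag)
    }
}
```

This only illustrates the cost model: the per-application work drops from "compile, then check" to "check", which is where the 25 to 30% reduction in match compilations comes from.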
The concrete time gains aren't spectacular, as this mostly saves Rust function calls rather than Nickel evaluation, the latter being of course the dominant cost, but there is still a small improvement.