-
Notifications
You must be signed in to change notification settings - Fork 478
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Strict in Plutus Tx #5371
Use Strict in Plutus Tx #5371
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for review. This led to cost reduction in general, and in some cases the savings are extraordinary (e.g., from 7... to 2...). There are also some cost increases but they are much milder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a while looking through the PIR and spotted some oddities, mostly things we want to be sure that optimization is clearing up later.
There are a few things that seem actually wrong, but mostly this looks fine.
(c | ||
True | ||
let | ||
!ls : List Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now this is strict and not obviously pure, so we can't inline it, alas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not obviously pure
How come? Looks obviously pure to me: a lambda behind a type let.
And even if it wasn't pure, it's evaluated right after the let
process, so we could still inline it. I guess teaching that the inliner might be non-trivial though. Particuarlly because the "right after" part may get changed in the course of optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little hard to read, but it's actually an application of the lambda inside the let. Now, if we inlined the type binding, we could then reduce the application and we would be in a better place...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my biggest issue with reviewing this PR is that I don't know what to expect when I'm reviewing changes to a golden file. It would be great to have dump a comment in each golden file specifying which optimizations were performed on the code in the file, because otherwise I have no idea if something isn't, say, inlined because of an issue or because the inliner wasn't run on the program in the first place.
120 1402 (8.6%) 8201088120 (82.0%) 510702 (3.6%) | ||
130 1512 (9.2%) 8884555240 (88.8%) 553062 (4.0%) | ||
140 1622 (9.9%) 9568022360 (95.7%) 595422 (4.3%) | ||
150 1732 (10.6%) 10251489480 (102.5%) 637782 (4.6%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything's slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything's slower?
Not everything, but things are a bit less slower now thanks to -fno-full-laziness
!
120 1402 (8.6%) 24946946570 (249.5%) 515004 (3.7%) | ||
130 1512 (9.2%) 27026011310 (270.3%) 557724 (4.0%) | ||
140 1622 (9.9%) 29105076050 (291.1%) 600444 (4.3%) | ||
150 1732 (10.6%) 31184140790 (311.8%) 643164 (4.6%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
120 12434 (75.9%) 4374611330 (43.7%) 490704 (3.5%) | ||
130 13464 (82.2%) 4739320550 (47.4%) 531424 (3.8%) | ||
140 14494 (88.5%) 5104029770 (51.0%) 572144 (4.1%) | ||
150 15524 (94.8%) 5468738990 (54.7%) 612864 (4.4%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And everywhere so far.
120 21265 (129.8%) 8951320962 (89.5%) 6204546 (44.3%) | ||
130 22992 (140.3%) 9696801862 (97.0%) 6719976 (48.0%) | ||
140 24719 (150.9%) 10442282762 (104.4%) 7235406 (51.7%) | ||
150 26446 (161.4%) 11187763662 (111.9%) 7750836 (55.4%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still.
({cpu: 181259654 | ||
| mem: 594532}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still. And this is quite worse! 7.6% worse to be precise, that's quite a lot, imagine an "optimization" of the CEK machine making things so much worse.
@@ -6,6 +6,14 @@ in | |||
let | |||
data (AdditiveMonoid :: * -> *) a | AdditiveMonoid_match where | |||
CConsAdditiveMonoid : (\a -> a -> a -> a) a -> a -> AdditiveMonoid a | |||
!ls : List integer | |||
= (let | |||
a = List integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could make the optimizer look look through type lets. I do agree it's worth inlining this type though.
force | ||
(force | ||
(foldr | ||
(delay | ||
(\x -> | ||
x)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was puzzled seeing foldr id z fs
, but apparently that is the same as foldr (.) id fs z
. All those times I wrote the latter and not the former...
plutus-tx-plugin/test/Plugin/Data/families/associatedParam.pir.golden
Outdated
Show resolved
Hide resolved
Many tests cases became more expensive with b267848, which stopped exposing the unfolding of Overall, there are both increased and decreased costs, but the increase is always relatively mild, while the decrease is sometimes extraordinary. This makes sense: by making things more strict, you sometimes evaluate something once, which does not need to be evaluated. On the other hand, by making things less strict, you sometimes evaluate something many times, which only needs to be evaluated once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that another reason for the higher cost when using Strict
comes from the fact that strict bindings are less likely to be inlined, because many of them are not considered "effect safe".
When something is not inlined, it incurs the additional machine costs of VDelay, VLamAbs and VApply, even if it doesn't change the number of times it is evaluated. But again, this is relatively mild compared to what Strict
can potentially save us.
Let me elaborate on my argument for why lazy patterns on function arguments shouldn't be able to do anything (and hence why we need to figure out WTF is actually happening). AIUI, a bang pattern on a function argument means we get something like this (using some name shadowing because that's what we end up with also):
Presumably with
i.e. the same thing. It cannot matter whether we inline So it seems to me that lazy function arguments cannot do anything... except possibly make GHC do something different. This seems pretty important to me - we need to understand what the effect of these changes are in general. |
Ok, now I remember that in Plutus Tx, |
I don't think that matters here? The only let bindings were in the PIR-psuedocode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased and responded to some comments.
@michaelpj @effectfully One of the main mysteries solved. Previously unless I did all of these three things: link1, link2, link3, I got the strange "no unfolding" error for The fix is simple - adding any one of the following to Next up: figure out why some things don't work in GHC 9.6 🤷♂️ |
Added
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've addressed all the concerns now (except doing profiling).
By the way, for some of the cost increases, their costs should have been higher before. Consider Nothing <*> _ = Nothing
. The _
should always have been evaluated strictly, but, just like the ||
and &&
situation, when we didn't have Strict
, GHC may unconditionally inline <*>
, thereby avoiding evaluating _
. This may very well be responsible for most of the cost increases, since it leads to lots of strict, dead bindings, e.g., #5371 (comment).
We can potentially do the same thing to <*>
as what we did to ||
and &&
, but since there is no dramatic cost increases (and there shouldn't be), I'd rather leave this PR as it is and do that separately.
(nonrec) | ||
(termbind | ||
(strict) | ||
(vardecl ds [ Maybe b ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is from Nothing <*> _ = Nothing
, and it is the same situation as ||
and &&
. The second argument should always be strictly evaluated, but when we didn't have Strict
, GHC may unconditionally inline <*>
which avoided evaluating the second argument.
This can be quite bad, and may be a main contributor of some of the increased budgets 😐
[y, z, zz, x, t] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added:
-- TODO: the Cek log of this test case is currently incorrect as it doesn't preserve
-- the order of logging, which it should do given theconservative-optimisation
flag.
-- Both GHC and ourselves are culprits in this instance, see
-- #5371 (comment)
, all | ||
, mapThese | ||
) where | ||
Map, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not a fan of the ad-hoc reformatting. It makes it much harder to review diffs. If we're going to switch to using a formatter globally, then let's agree that and apply it one go. (and if we haven't agreed and "reformatting in PRs" is acceptable, then there is nothing stopping someone just blindly formatting it back later!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to trigger CI. I should have used a different module.
We've agreed on using 2-space indentation, and our style guide has been updated, so the reformatting I did makes it conform to the style guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot that we agreed to that :| can we revisit just agreeing to fourmolu everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to agree to fourmolu. Fourmolu is one way to convert 4-space to 2-space. It's not a problem for me if someone prefers hand formatting the code.
I'm not aware of any other tool that people use to auto-format code, so it's unlikely that someone will blindly re-format it in a very different format.
plutus-tx/src/PlutusTx/Bool.hs
Outdated
This is because function applications in Plutus Tx are strict, so when passing an argument | ||
to a function, the argument will be evaluated immediately, regardless of whether the | ||
corresponding function parameter is lazy or strict. Specifically, both `f !x = body` and | ||
`f ~x = body` in Plutus Tx are compiled into `let f = \x -> body` in PIR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is wrong. They are compiled differently.
f ~x = body
-->
f = \x -> body
f !x = body
-->
f = \x -> let !x = x in body
That's why there is a difference if GHC inlines them, otherwise there would be literally no difference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But aren't these two equivalent? And the second term becomes the first term after inlining. That's the point of this paragraph - it should (although it sometimes doesn't because of GHC inlining) produce the same PIR eventually, whether function parameters are strict or lazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they do become the same thing eventually, but the relevant thing is what they become initially, since that's what lets us observe the difference!
Okay, maybe let me phrase this differently.
f ~x = body
---> (GHC Core)
f x = body
---> (PIR)
f = \x -> body
f !x = body
---> (GHC Core)
f x = case x@x of { _ -> body }
---> (PIR)
f = \x -> let x = x in body
The two PIR bindings have the same semantics. But if GHC inlines the function at the core stage, then we get the body of the second PIR binding at the call site, which does behave differently.
(I think we both agree with what's going on, I just think the explanation isn't quite clear for future people)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added the steps you laid out above in the Note.
Consider the Plutus Tx code `let f !x = t in f arg`. Here `arg` will be immediately when | ||
being passed to `f` (as it should). But now consider `let f ~x = t in f arg`. Here GHC | ||
may unconditionally inline `f`, leading to `t [x := arg]` (recognize that the GHC inliner | ||
also performs beta reduction, in addition to replacing `f` with its definition), where `arg` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary for GHC to perform beta-reduction, although perhaps it does, because of the difference in how lazy/strict patterns get compiled above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is! I keep forgetting this! Maybe needs a more worked out example since apparently it's not obvious lol
plutus-tx/src/PlutusTx/Bool.hs
Outdated
Therefore, lazy patterns should generally be avoided, since its semantics depend on | ||
what the GHC inliner does. That said, we can use it in some cases to our advantage, | ||
such as in `(&&)` and `(||)`. These logical operators would be much less useful if | ||
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve | |
they can't short-circuit. By using a lazy pattern on the second parameter and forcing GHC to inline, we can achieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this. The INLINE pragma does not force GHC to inline. We run the GHC simplifier with sm_inline
turned off. It does force the PIR inliner to inline, but the PIR inliner only replaces the variable with the definition, which isn't sufficient to make it non-strict. We should make it so that the INLINE pragma forces the PIR simplifier to beta-reduce. We had a length discussion in the ADR PR and IIRC this was the conclusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right yes, we need to both force inlining and beta reduction. I keep forgetting that!
We had a length discussion in the ADR PR and IIRC this was the conclusion.
I do not think we agreed to that :p #5270 (comment)
I'm overall then confused by this line: since we don't have such behaviour at the moment, we can't achieve what's written here? So the note is talking about some hypothetical future behaviour. I think we should be clear that it's not something we can do now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I rephrase it in an attempt to make it more clear:
Using a lazy pattern on the second parameter helps achieve
the desirable effect, namely, the second parameter is only evaluated if needed, although
this is currently not guaranteed, since(&&)
and(||)
are not necessarily
unconditionally inlined by GHC. To guarantee that the second parameter is only evaluated
when needed, we can potentially use a particular source annotation to tell the PIR
inliner to inline and beta-reduce a function, which would achieve the same effect as if
it is unconditionally inlined by GHC. This is yet to be implemented.
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion re: -fno-full-laziness
. I added a Note [-fno-full-laziness in Plutus Tx].
This should now be mergeable.
120 1402 (8.6%) 8201088120 (82.0%) 510702 (3.6%) | ||
130 1512 (9.2%) 8884555240 (88.8%) 553062 (4.0%) | ||
140 1622 (9.9%) 9568022360 (95.7%) 595422 (4.3%) | ||
150 1732 (10.6%) 10251489480 (102.5%) 637782 (4.6%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything's slower?
Not everything, but things are a bit less slower now thanks to -fno-full-laziness
!
, all | ||
, mapThese | ||
) where | ||
Map, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to agree to fourmolu. Fourmolu is one way to convert 4-space to 2-space. It's not a problem for me if someone prefers hand formatting the code.
I'm not aware of any other tool that people use to auto-format code, so it's unlikely that someone will blindly re-format it in a very different format.
plutus-tx/src/PlutusTx/Bool.hs
Outdated
Therefore, lazy patterns should generally be avoided, since its semantics depend on | ||
what the GHC inliner does. That said, we can use it in some cases to our advantage, | ||
such as in `(&&)` and `(||)`. These logical operators would be much less useful if | ||
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I rephrase it in an attempt to make it more clear:
Using a lazy pattern on the second parameter helps achieve
the desirable effect, namely, the second parameter is only evaluated if needed, although
this is currently not guaranteed, since(&&)
and(||)
are not necessarily
unconditionally inlined by GHC. To guarantee that the second parameter is only evaluated
when needed, we can potentially use a particular source annotation to tell the PIR
inliner to inline and beta-reduce a function, which would achieve the same effect as if
it is unconditionally inlined by GHC. This is yet to be implemented.
plutus-tx/src/PlutusTx/Bool.hs
Outdated
This is because function applications in Plutus Tx are strict, so when passing an argument | ||
to a function, the argument will be evaluated immediately, regardless of whether the | ||
corresponding function parameter is lazy or strict. Specifically, both `f !x = body` and | ||
`f ~x = body` in Plutus Tx are compiled into `let f = \x -> body` in PIR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added the steps you laid out above in the Note.
No description provided.