Strength reduction for add operations performed power of 2 times #34938
Comments
I guess it's not arm-specific (could generate a single |
Yep, I agree. I have seen x86 too emitting |
@kunalspathak I'm interested in working on this issue and could use a little design advice if you don't mind. My strategy for this peephole optimization is to reduce chains of add operations into a single mul operation, and then let the optimizer transform the mul operation into a more efficient one, as is done at: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L12426 So essentially, I can do something like
at which point the subsequent (https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/morph.cpp#L12426) optimization will transform I wanted to fold this in during the post-order processing, but the multiplication-to-shift optimization conflicts with the strength reduction, i.e., I see three solutions; I'm curious how they might fit into the design and am looking for some feedback:
|
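The strategy described in this comment (collapse a chain of identical adds into one multiply, then turn a multiply by a power of 2 into a shift) can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Node`, `Op`, `makeVar`, `makeAdd`, `reduceAddChain`, and `eval` are hypothetical names invented here, not the JIT's `GenTree` API, and the sketch does both steps in one function, whereas the comment proposes leaving the multiply-to-shift step to the existing morph code.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature expression tree for illustration; not the JIT's GenTree.
enum class Op { Var, Const, Add, Mul, Shl };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    Op op;
    std::string name;  // meaningful only for Var
    int64_t value;     // meaningful only for Const
    NodePtr lhs, rhs;  // meaningful only for Add, Mul, Shl
};

NodePtr makeNode(Op op, const std::string& name, int64_t value, NodePtr lhs, NodePtr rhs) {
    return std::make_shared<Node>(Node{op, name, value, std::move(lhs), std::move(rhs)});
}
NodePtr makeVar(const std::string& n) { return makeNode(Op::Var, n, 0, nullptr, nullptr); }
NodePtr makeConst(int64_t v)          { return makeNode(Op::Const, "", v, nullptr, nullptr); }
NodePtr makeAdd(NodePtr a, NodePtr b) { return makeNode(Op::Add, "", 0, std::move(a), std::move(b)); }

// Returns true if `n` is built only from Add nodes whose leaves are all the
// variable `name`; on success `count` has been increased by the number of leaves.
bool countSameVarLeaves(const NodePtr& n, const std::string& name, int64_t& count) {
    if (n->op == Op::Var) {
        if (n->name != name) return false;
        ++count;
        return true;
    }
    if (n->op != Op::Add) return false;
    return countSameVarLeaves(n->lhs, name, count) && countSameVarLeaves(n->rhs, name, count);
}

// Rewrites i + i + ... + i (count times) as i * count, or as i << log2(count)
// when count is a power of 2. Any other tree is returned unchanged.
NodePtr reduceAddChain(const NodePtr& n) {
    if (n->op != Op::Add) return n;
    const Node* leaf = n.get();
    while (leaf->lhs) leaf = leaf->lhs.get();  // walk to the leftmost leaf
    if (leaf->op != Op::Var) return n;

    int64_t count = 0;
    if (!countSameVarLeaves(n, leaf->name, count)) return n;
    assert(count >= 2);  // an Add node always has at least two leaves

    NodePtr operand = makeVar(leaf->name);
    if ((count & (count - 1)) == 0) {  // power of 2: use a shift
        int64_t shift = 0;
        while ((int64_t{1} << shift) < count) ++shift;
        return makeNode(Op::Shl, "", 0, operand, makeConst(shift));
    }
    return makeNode(Op::Mul, "", 0, operand, makeConst(count));
}

// Evaluates a tree, binding every variable to `x` (enough for a one-variable chain).
int64_t eval(const NodePtr& n, int64_t x) {
    switch (n->op) {
        case Op::Var:   return x;
        case Op::Const: return n->value;
        case Op::Add:   return eval(n->lhs, x) + eval(n->rhs, x);
        case Op::Mul:   return eval(n->lhs, x) * eval(n->rhs, x);
        case Op::Shl:   return eval(n->lhs, x) << eval(n->rhs, x);
    }
    return 0;
}
```

In this sketch, a chain of four `i` operands becomes a shift by 2, a chain of three becomes a multiply by 3, and a chain that mixes different variables is left alone.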
"@kunalspathak I'm interested in working on this issue"
Thank you for volunteering.
You mean, doing it using |
Yes, that's what I meant --- use One thing though, is don't we want to handle cases like |
We need to see what we generate for those code patterns. This issue is for cases like |
Why before each operand? I would have expected it to live in the pre-order processing of This is a tradeoff in complexity vs completeness. Pre-order will miss some things, but the code will be simpler. The way to evaluate this is to make both and check the diffs (which will also tell us how valuable this optimization is in general in real-world(ish) code). |
Understood. Will work on it as such.
I think we are thinking the same thing, I meant to say in the pre-order processing of Thanks for the feedback folks, will work on it. |
I do see that we generate expected output.
ARM64:
G_M44653_IG02: ;; offset=0008H
531E7400 lsl w0, w0, #2
x64:
G_M44653_IG02: ;; offset=0000H
8D048D00000000 lea eax, [4*rcx]
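Both instructions in the listing above compute 4 * i, which is the value a chain of four adds of `i` produces. A minimal sketch of that equivalence follows; the helper names are hypothetical and exist only for this example. It assumes wrapping 32-bit arithmetic, so it uses `uint32_t`, because signed overflow is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, named for this sketch only.
constexpr uint32_t addFourTimes(uint32_t i) { return i + i + i + i; }
constexpr uint32_t mulByFour(uint32_t i)    { return 4u * i; }
constexpr uint32_t shiftByTwo(uint32_t i)   { return i << 2; }

// True when all three forms produce the same value for `i`.
constexpr bool formsAgree(uint32_t i) {
    return addFourTimes(i) == mulByFour(i) && mulByFour(i) == shiftByTwo(i);
}

// Compile-time checks, including an input that wraps around.
static_assert(formsAgree(0u), "forms agree for 0");
static_assert(formsAgree(7u), "forms agree for 7");
static_assert(formsAgree(0xFFFFFFFFu), "forms agree under wraparound");

// Runtime spot check over a small range plus the wraparound edge.
void checkSamples() {
    for (uint32_t i = 0; i < 1000; ++i) {
        assert(formsAgree(i));
    }
    assert(formsAgree(0xFFFFFFFFu));
}
```

The same reasoning covers 8 or 16 adds, with shifts of 3 and 4 respectively.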
The code above should obviously be return 4 * i, but today, if we see something like this, we don't optimize it that way and instead perform add operations. We could optimize it to lsl if the adds are performed a power of 2 times.
Today we generate:
We should generate:
We never optimize this, even for 8 or 16 operations, and thus emit a series of add operations.
category:cq
theme:basic-cq
skill-level:intermediate
cost:medium