Stronger chain detection in LoopCarry pass #8016
Also fixed a bug that occurred when indices of different types were compared. BTW, as far as I know, this pass is only used in the Hexagon and Xtensa backends.
All tests are green now.
See the comment up at line 250. It's not safe to use `can_prove` on a boolean Expr after doing `substitute_in_all_lets`. To make it safe to call, you have to call `common_subexpression_elimination` on the Expr first.

Note that this gets called on every pair of indices, so it has quadratic complexity in the IR size. I worry that this will stall for very large unrolled stencils. It's worth writing a test of a very large case. If it does indeed stall, we might need a better algorithm. One could, for example, hash the expressions and look for hash collisions, where by "hash" I mean substitute in some arbitrary values for the variables and constant-fold, and then only do `can_prove` on exprs that have the same hash.
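To make the hashing suggestion concrete, here is a minimal sketch of it. It deliberately does not use Halide's IR: `Node`, `fingerprint`, and `candidate_pairs` are hypothetical names for a toy expression tree, and the variable values are arbitrary. Each index is evaluated once at fixed values, and only indices whose results collide become candidates for the expensive proof.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression tree standing in for a real IR (hypothetical, not Halide's Expr).
struct Node;
using ExprPtr = std::shared_ptr<const Node>;
struct Node {
    enum Kind { Const, Var, Add, Mul } kind;
    int64_t value;     // used when kind == Const
    std::string name;  // used when kind == Var
    ExprPtr a, b;      // used when kind == Add or Mul
};

ExprPtr constant(int64_t v) { return std::make_shared<Node>(Node{Node::Const, v, "", nullptr, nullptr}); }
ExprPtr var(const std::string &n) { return std::make_shared<Node>(Node{Node::Var, 0, n, nullptr, nullptr}); }
ExprPtr add(ExprPtr x, ExprPtr y) { return std::make_shared<Node>(Node{Node::Add, 0, "", std::move(x), std::move(y)}); }
ExprPtr mul(ExprPtr x, ExprPtr y) { return std::make_shared<Node>(Node{Node::Mul, 0, "", std::move(x), std::move(y)}); }

// "Hash" an expression by substituting fixed values for its variables and
// folding to a constant. Unsigned arithmetic wraps instead of overflowing.
uint64_t fingerprint(const ExprPtr &e, const std::map<std::string, uint64_t> &vals) {
    assert(e && "null expression");
    switch (e->kind) {
    case Node::Const: return static_cast<uint64_t>(e->value);
    case Node::Var:   return vals.at(e->name);
    case Node::Add:   return fingerprint(e->a, vals) + fingerprint(e->b, vals);
    case Node::Mul:   return fingerprint(e->a, vals) * fingerprint(e->b, vals);
    }
    return 0;
}

// Group indices by fingerprint and return only the pairs that share one.
// Equal expressions always collide, so no true match is lost; unequal
// expressions may collide by chance and still need the real proof.
std::vector<std::pair<std::size_t, std::size_t>>
candidate_pairs(const std::vector<ExprPtr> &indices,
                const std::map<std::string, uint64_t> &vals) {
    std::map<uint64_t, std::vector<std::size_t>> buckets;
    for (std::size_t i = 0; i < indices.size(); i++) {
        buckets[fingerprint(indices[i], vals)].push_back(i);
    }
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (const auto &kv : buckets) {
        const std::vector<std::size_t> &m = kv.second;
        for (std::size_t i = 0; i < m.size(); i++) {
            for (std::size_t j = i + 1; j < m.size(); j++) {
                pairs.push_back({m[i], m[j]});
            }
        }
    }
    return pairs;
}
```

Building the buckets takes one evaluation per index, so the quadratic work is confined to indices that already agree at the sample point. A collision is only a hint, so each candidate pair would still have to go through the real equality proof.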
Thanks a lot, this is very helpful! I changed it to apply CSE first and only then run `can_prove`. I also added a test that triggers loop_carry on a loop with a large number of indices, and the compilation time seems to be fine.
I don't think the test failures are related.
* Stronger chain detection in LoopCarry
* Make sure that types are the same
* Add a comment
* Run CSE before calling can_prove
* Test for loop carry
* clang-tidy
* Add missing override
* Update comments
`can_prove` is stronger than `graph_equal` because it doesn't require index expressions to be exactly the same, only to evaluate to the same value. I kept the `graph_equal` check because it's faster and should be executed before the more expensive check.

In one of the internal workloads, I see that with this change, what was previously split into three different chains of 4, 2, and 3 values is correctly combined into one long chain of length 9.
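The cheap-check-first ordering described here can be illustrated with a small self-contained sketch. It is not the code from this PR and does not use Halide's API: `structurally_equal` stands in for a `graph_equal`-style comparison, `provably_equal` stands in for a `can_prove`-style check (here it only handles sums of variables and constants), and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (hypothetical, not Halide's Expr): constants, variables, sums.
struct Node;
using ExprPtr = std::shared_ptr<const Node>;
struct Node {
    enum Kind { Const, Var, Add } kind;
    int64_t value;     // used when kind == Const
    std::string name;  // used when kind == Var
    ExprPtr a, b;      // used when kind == Add
};

ExprPtr constant(int64_t v) { return std::make_shared<Node>(Node{Node::Const, v, "", nullptr, nullptr}); }
ExprPtr var(const std::string &n) { return std::make_shared<Node>(Node{Node::Var, 0, n, nullptr, nullptr}); }
ExprPtr add(ExprPtr x, ExprPtr y) { return std::make_shared<Node>(Node{Node::Add, 0, "", std::move(x), std::move(y)}); }

// Cheap check: the two trees have exactly the same shape and leaves.
bool structurally_equal(const ExprPtr &a, const ExprPtr &b) {
    if (a == b) return true;
    if (a->kind != b->kind) return false;
    switch (a->kind) {
    case Node::Const: return a->value == b->value;
    case Node::Var:   return a->name == b->name;
    case Node::Add:   return structurally_equal(a->a, b->a) && structurally_equal(a->b, b->b);
    }
    return false;
}

// Normal form for a sum: how often each variable occurs, plus a constant term.
struct Linear {
    std::map<std::string, int64_t> coeff;
    int64_t constant = 0;
};

void accumulate(const ExprPtr &e, Linear &out) {
    switch (e->kind) {
    case Node::Const: out.constant += e->value; break;
    case Node::Var:   out.coeff[e->name] += 1; break;
    case Node::Add:   accumulate(e->a, out); accumulate(e->b, out); break;
    }
}

// More expensive check: the expressions may differ in shape but must
// normalise to the same sum, i.e. evaluate to the same value.
bool provably_equal(const ExprPtr &a, const ExprPtr &b) {
    Linear la, lb;
    accumulate(a, la);
    accumulate(b, lb);
    return la.coeff == lb.coeff && la.constant == lb.constant;
}

// Try the cheap check first and fall back to the expensive one only when it
// fails. expensive_calls counts how often the fallback was needed.
bool same_index(const ExprPtr &a, const ExprPtr &b, int &expensive_calls) {
    assert(a && b);
    if (structurally_equal(a, b)) return true;
    ++expensive_calls;
    return provably_equal(a, b);
}
```

In this sketch, `x + 1` and `1 + x` fail the structural check but pass the fallback, which is the kind of pair that a purely structural comparison would leave in separate chains.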