local laplacian performance regression #7917

Closed
abadams opened this issue Oct 25, 2023 · 3 comments · Fixed by #7918 or #7945

abadams commented Oct 25, 2023

It was ~25ms about a year ago on my machine, and now it's 40ms. Will investigate now, but opening an issue to track in case I need to context switch.

abadams self-assigned this on Oct 25, 2023

abadams commented Oct 25, 2023

I have found a Halide commit from about a year ago (102c059) where it's fast with LLVM 14 and slow with LLVM 16. So it's an LLVM change, but an old one. The salient difference seems to be the use of vgatherdps instructions, which are causing major stalls.

If I target avx2 instead of avx-512, it's fast again and uses vinsertps instead of vgatherdps. This is on skylake-x.

abadams commented Oct 25, 2023

LLVM's SLP vectorizer is pattern matching the gather op. If I turn it off, the problem goes away.

abadams commented Oct 25, 2023

Bug opened on llvm: llvm/llvm-project#70259

In the meantime, we may need to turn off SLP vectorization for x86. Vector gathers are not uncommon in Halide code, as they show up in boundary conditions and LUTs.
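
For readers unfamiliar with the access pattern, here is a minimal plain C++ sketch (not Halide code, and not taken from the local laplacian app) of a LUT lookup behind a clamping boundary condition. The helper names `clamp_index` and `lut_lookup8` are hypothetical and exist only for illustration. Each lane loads from a data-dependent address, which is the kind of load group a compiler may lower either to scalar loads plus inserts or to a hardware gather; which one it picks depends on the compiler version and target flags.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: clamp an index into [0, size - 1], mimicking a
// repeat-edge style boundary condition. Assumes size > 0.
inline std::size_t clamp_index(std::int32_t i, std::size_t size) {
    const std::int32_t hi = static_cast<std::int32_t>(size) - 1;
    const std::int32_t clamped =
        std::max<std::int32_t>(0, std::min<std::int32_t>(i, hi));
    return static_cast<std::size_t>(clamped);
}

// Hypothetical helper: look up eight LUT entries at data-dependent indices.
// The eight loads have no fixed stride, so they cannot be a plain vector
// load; they are either done one lane at a time or as a gather.
// Assumes lut is non-empty.
std::array<float, 8> lut_lookup8(const std::vector<float>& lut,
                                 const std::array<std::int32_t, 8>& idx) {
    std::array<float, 8> out{};
    for (std::size_t lane = 0; lane < 8; ++lane) {
        out[lane] = lut[clamp_index(idx[lane], lut.size())];
    }
    return out;
}
```

Out-of-range indices are clamped to the nearest valid entry, so with a four-entry LUT an index of -5 reads entry 0 and an index of 100 reads entry 3.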
