local laplacian performance regression #7917

abadams · 2023-10-25T20:29:53Z

It was ~25ms about a year ago on my machine, and now it's 40ms. Will investigate now, but opening an issue to track in case I need to context switch.

abadams · 2023-10-25T21:01:35Z

I have found a Halide commit from about a year ago (102c059) where it's fast with llvm 14 and slow with llvm 16. So it's an llvm change, but an old one. The salient difference seems to be the use of vgatherdps instructions, which are causing major stalls.

If I target avx2 instead of avx512 it's fast again, and uses vinsertps instead of vgatherdps. This is on skylake-x.

abadams · 2023-10-25T21:19:56Z

LLVM's SLP vectorizer is pattern matching the gather op. If I turn it off, the problem goes away.

abadams · 2023-10-25T21:52:07Z

Bug opened on llvm: llvm/llvm-project#70259

In the meantime we may need to turn off SLP vectorization for x86. Vector gathers are not uncommon in Halide code, as they show up in boundary conditions and LUTs.
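For readers unfamiliar with why boundary conditions and LUTs lead to gathers, here is a minimal sketch in plain C++. It is not Halide code and is not taken from the local laplacian app; the helper names `apply_lut` and `shifted_repeat_edge` are made up for illustration. In both loops the load index is computed per element rather than being a simple contiguous ramp, so a vectorized version may need a gather instruction or per-lane scalar loads followed by inserts.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: apply a 256-entry lookup table to 8-bit input.
// The load address depends on the input data, so the loads are not
// contiguous and cannot be done with a single dense vector load.
std::vector<float> apply_lut(const std::vector<uint8_t> &in,
                             const std::array<float, 256> &lut) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++) {
        out[i] = lut[in[i]];
    }
    return out;
}

// Hypothetical helper: read in[x + offset] with a repeat-edge boundary
// condition. The clamped index is only contiguous away from the edges,
// so a compiler that cannot prove that may treat the load as a gather.
std::vector<float> shifted_repeat_edge(const std::vector<float> &in, int offset) {
    const int n = static_cast<int>(in.size());
    std::vector<float> out(in.size());
    for (int x = 0; x < n; x++) {
        const int idx = std::clamp(x + offset, 0, n - 1);
        out[x] = in[idx];
    }
    return out;
}
```

Whether a given compiler actually emits vgatherdps for loops like these depends on the target features and the optimization passes enabled, which is the behavior this issue is about.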


abadams added the performance label Oct 25, 2023

abadams self-assigned this Oct 25, 2023

abadams added a commit that referenced this issue Oct 25, 2023

Turn off SLP vectorization for avx512 only

616286d

Fixes #7917

abadams mentioned this issue Oct 25, 2023

Turn off SLP vectorization for avx512 only #7918

Merged

abadams closed this as completed in #7918 Oct 27, 2023

abadams added a commit that referenced this issue Oct 27, 2023

Turn off SLP vectorization for avx512 only (#7918)

cf01e97

Fixes #7917

abadams mentioned this issue Nov 11, 2023

More targeted fix for gather instructions being slow on intel processors #7945

Merged

ardier pushed a commit to ardier/Halide-mutation that referenced this issue Mar 3, 2024

Turn off SLP vectorization for avx512 only (halide#7918)

6ba7ec3

Fixes halide#7917
