Loop vectorizer not working with LLVM 3.7? #13106

simonster · 2015-09-13T15:49:03Z

This may be a known issue, but I can't get anything to vectorize with LLVM 3.7, e.g. there are no vector instructions in:

function f(x)
    @simd for i = 1:length(x)
        @inbounds x[i] *= 2
    end
end
code_llvm(f, (Vector{Float64},))

Version is:

julia> versioninfo()
Julia Version 0.5.0-dev+63
Commit 4a2298d* (2015-09-12 22:48 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.0

Of course this works properly with LLVM 3.3.
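
For anyone who wants to check this without reading the IR by eye, here is a minimal sketch that scans the `code_llvm` output for LLVM vector types such as `<4 x double>`. It assumes Julia 1.x syntax (`InteractiveUtils`, `occursin`), not the 0.5-dev build reported above. The names `scale!` and `has_vector_ir` are hypothetical, and the regex is only a heuristic.

```julia
using InteractiveUtils  # provides code_llvm on Julia 1.x (assumption: not the 0.5-dev build above)

# Same loop as in the report, renamed and returning x so it can be sanity-checked.
function scale!(x)
    @simd for i = 1:length(x)
        @inbounds x[i] *= 2
    end
    return x
end

# Hypothetical helper: true if the optimized IR mentions any LLVM vector type,
# e.g. <4 x double>. This is a textual heuristic, not a guarantee about the machine code.
function has_vector_ir(f, types)
    io = IOBuffer()
    code_llvm(io, f, types)          # write the IR into the buffer instead of stdout
    ir = String(take!(io))
    return occursin(r"<\d+ x (double|float|i\d+)>", ir)
end

println(has_vector_ir(scale!, (Vector{Float64},)))
```

A `false` result here would match the symptom described in this issue: the loop compiles, but no vector types appear in the IR.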

pao · 2015-09-14T17:52:53Z

cc @ArchRobison for @simd expertise

ArchRobison · 2015-09-14T18:12:32Z

I can replicate the problem. I'll take a look.

ArchRobison · 2015-09-14T19:57:40Z

Something is very wrong with the target machine identification. With JULIA_LLVM_ARGS=-debug-only=loop-vectorize, usr/bin/julia-debug seems to be underrating my "Haswell" box:

LV: The Widest register is: 32 bits.
LV: The target has no vector registers.

Though when I compile other code, I see 64-bit instructions being used. I'll poke around some more.

mdcfrancis · 2015-09-16T14:15:52Z

Possibly related?

#13121

yuyichao · 2015-09-21T12:03:02Z

@ArchRobison Any update? I tried setting JULIA_LLVM_ARGS=-debug-only=loop-vectorize but got:

yuyichao% JULIA_LLVM_ARGS=-debug-only=loop-vectorize julia-debug 
Julia: Unknown command line argument '-debug-only=loop-vectorize'.  Try: 'Julia -help'
Julia: Did you mean '-debug-pass=loop-vectorize'?

Are there any other compile options I need to set for this?

ArchRobison · 2015-09-21T15:09:28Z

I'm likely not going to be able to look at it further until next week, owing to a C++ committee deadline on Friday for proposals. So I encourage you to look into it.

Counter-intuitively, to get the "-debug-only" functionality, LLVM has to be built with assertions enabled. Add LLVM_ASSERTIONS = 1 to your Make.user, rebuild, and then JULIA_LLVM_ARGS=-debug-only=loop-vectorize should get you the extra output from usr/bin/julia-debug.
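
The steps above can also be driven from a Julia session. This is a sketch only: it assumes a `usr/bin/julia-debug` binary built with `LLVM_ASSERTIONS = 1`, run from the top of the source tree, and the variable names are hypothetical. The child process just defines and calls the test loop so that the vectorizer runs on it.

```julia
# Assumption: path is relative to the Julia source tree, and the binary was
# built with LLVM_ASSERTIONS = 1 in Make.user so that -debug-only is accepted.
julia_debug = joinpath("usr", "bin", "julia-debug")

# Code for the child process: define the loop and call it once to force compilation.
script = """
function f(x)
    @simd for i = 1:length(x)
        @inbounds x[i] *= 2
    end
end
f(zeros(8))
"""

cmd = `$julia_debug -e $script`

if isfile(julia_debug)
    # The environment variable is set only for the duration of the do-block.
    withenv("JULIA_LLVM_ARGS" => "-debug-only=loop-vectorize") do
        run(cmd)
    end
else
    println("julia-debug not found at $julia_debug; nothing was run")
end
```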

yuyichao · 2015-09-21T17:26:38Z

Thanks.

The issue does seem to be the register width since UInt8 can be successfully vectorized (using avx2 instructions ..... = = .....).

However, it doesn't seem to be just this. With -debug-only=subtarget, the output clearly shows that this CPU has AVX2:

Features:+64bit,+sse2
CPU:broadwell

Subtarget features: SSELevel 9, 3DNowLevel 0, 64bit 1

I'll try to poke around but I'm not sure if I can find the issue.

ArchRobison · 2015-09-28T20:26:47Z

I'm back on this. Here's what I suspect is the proximate cause in codegen.cpp:

#ifndef LLVM37
    jl_TargetMachine->addAnalysisPasses(*FPM);
#endif

Evidently addAnalysisPasses disappeared to make way for the latest fashion. After studying julia/deps/srccache/llvm-3.7.0/lib/CodeGen/LLVMTargetMachine.cpp, it looks like we need to call createTargetTransformInfoWrapperPass and possibly more, though I'm not sure yet.

Fix issue #13106 by adding TargetTransformInfoWrapperPass to pass list.

simonster added the compiler:codegen Generation of LLVM IR and native code label Sep 13, 2015

vtjnash mentioned this issue Sep 13, 2015

activate LLVM37 #9336

Closed

19 tasks

simonster added the performance Must go faster label Sep 13, 2015

simonster mentioned this issue Sep 14, 2015

Abstraction penalty: Wrapper types lead to much less efficient code #13104

Closed

ArchRobison pushed a commit to ArchRobison/julia that referenced this issue Sep 28, 2015

Fix issue JuliaLang#13106 by adding TargetTransformInfoWrapperPass to…

8580e9d

… pass list.

ArchRobison mentioned this issue Sep 28, 2015

Fix issue #13106 by adding TargetTransformInfoWrapperPass to pass list. #13349

Merged

vtjnash added a commit that referenced this issue Sep 29, 2015

Merge pull request #13349 from ArchRobison/adr/llvm37-vec

bc1a8f5

Fix issue #13106 by adding TargetTransformInfoWrapperPass to pass list.

ArchRobison closed this as completed Sep 29, 2015

eschnett mentioned this issue Oct 6, 2015

POCL NBody speed regression pocl/pocl#251

Closed

skumagai pushed a commit to skumagai/julia that referenced this issue Oct 9, 2015

Fix issue JuliaLang#13106 by adding TargetTransformInfoWrapperPass to…

86a120c

… pass list.
