Loop vectorizer not working with LLVM 3.7? #13106

Closed
simonster opened this issue Sep 13, 2015 · 8 comments
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster

Comments

@simonster
Member

This may be a known issue, but I can't get anything to vectorize with LLVM 3.7, e.g. there are no vector instructions in:

function f(x)
    @simd for i = 1:length(x)
        @inbounds x[i] *= 2
    end
end
code_llvm(f, (Vector{Float64},))

Version is:

julia> versioninfo()
Julia Version 0.5.0-dev+63
Commit 4a2298d* (2015-09-12 22:48 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.0

Of course this works properly with LLVM 3.3.
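A quick way to check for vector instructions without reading the IR by eye is to capture the `code_llvm` output and search it for an LLVM vector type such as `<4 x double>`. This is a minimal sketch, assuming a current Julia 1.x release rather than the 0.5-dev build above. The helper name `has_vector_ir` is hypothetical, and the regex is only a rough heuristic.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL on current Julia

# Same loop as in the report above.
function f(x)
    @simd for i = 1:length(x)
        @inbounds x[i] *= 2
    end
end

# Hypothetical helper: capture the optimized IR as a string and report
# whether it mentions any LLVM vector type, e.g. `<4 x double>`.
function has_vector_ir(fn, argtypes)
    io = IOBuffer()
    code_llvm(io, fn, argtypes)
    ir = String(take!(io))
    return occursin(r"<\d+ x \w+>", ir)
end

println(has_vector_ir(f, (Vector{Float64},)))
```

A `false` result on a machine with SSE or AVX would match the symptom described here. A `true` result only says that some vector type appears in the IR, not that the loop body itself was vectorized.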

@simonster simonster added the compiler:codegen Generation of LLVM IR and native code label Sep 13, 2015
@vtjnash vtjnash mentioned this issue Sep 13, 2015
@simonster simonster added the performance Must go faster label Sep 13, 2015
@pao
Member

pao commented Sep 14, 2015

cc @ArchRobison for @simd expertise

@ArchRobison
Contributor

I can replicate the problem. I'll take a look.

@ArchRobison
Contributor

Something is very wrong with the target machine identification. With JULIA_LLVM_ARGS=-debug-only=loop-vectorize, usr/bin/julia-debug seems to be underrating my "Haswell" box:

LV: The Widest register is: 32 bits.
LV: The target has no vector registers.

Though when I compile other code, I see 64-bit instructions being used. I'll poke around some more.

@mdcfrancis

Possibly related?

#13121

@yuyichao
Contributor

@ArchRobison Any update? I tried setting JULIA_LLVM_ARGS=-debug-only=loop-vectorize but got:

yuyichao% JULIA_LLVM_ARGS=-debug-only=loop-vectorize julia-debug 
Julia: Unknown command line argument '-debug-only=loop-vectorize'.  Try: 'Julia -help'
Julia: Did you mean '-debug-pass=loop-vectorize'?

Are there any other compile options I need to set for this?

@ArchRobison
Contributor

I'm likely not going to be able to look at it further until next week, owing to a C++ committee deadline on Friday for proposals. So I encourage you to look into it.

Counter-intuitively, to get the "-debug-only" functionality, LLVM has to be built with assertions enabled. Add LLVM_ASSERTIONS = 1 to your Make.user, rebuild, and then JULIA_LLVM_ARGS=-debug-only=loop-vectorize should get you the extra output from usr/bin/julia-debug.

@yuyichao
Contributor

Thanks.

The issue does seem to be the register width, since UInt8 can be successfully vectorized (using AVX2 instructions).

However, it doesn't seem to be just this. With -debug-only=subtarget, the output clearly shows that this CPU has AVX2:

Features:+64bit,+sse2
CPU:broadwell

Subtarget features: SSELevel 9, 3DNowLevel 0, 64bit 1

I'll try to poke around but I'm not sure if I can find the issue.
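The UInt8 versus Float64 comparison above can be reproduced by checking several element types in one pass. This is a rough sketch, assuming a current Julia 1.x release. The names `double!` and `vectorized` are hypothetical. Doubling is written as `x[i] += x[i]` so that integer element types keep their type and no conversion check ends up in the loop, which is a deliberate change from the original `x[i] *= 2`.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL on current Julia

# Doubles every element in place. Adding the element to itself keeps the
# element type for integers, so the loop body contains no conversion check.
function double!(x)
    @simd for i = 1:length(x)
        @inbounds x[i] += x[i]
    end
    return x
end

# Hypothetical helper: true if the optimized IR for a Vector{T} argument
# mentions any LLVM vector type such as `<32 x i8>` or `<4 x double>`.
function vectorized(fn, T)
    io = IOBuffer()
    code_llvm(io, fn, (Vector{T},))
    return occursin(r"<\d+ x \w+>", String(take!(io)))
end

println("CPU: ", Sys.CPU_NAME, ", LLVM: ", Base.libllvm_version)
for T in (UInt8, Int32, Float32, Float64)
    println(rpad(string(T), 8), vectorized(double!, T) ? "vectorized" : "scalar")
end
```

If narrow types report "vectorized" while wider ones report "scalar", that would be consistent with the vectorizer assuming a register width that is too small.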

@ArchRobison
Contributor

I'm back on this. Here's what I suspect is the proximate cause, in codegen.cpp:

#ifndef LLVM37
    jl_TargetMachine->addAnalysisPasses(*FPM);
#endif

Evidently addAnalysisPasses disappeared to make way for the latest fashion. After studying julia/deps/srccache/llvm-3.7.0/lib/CodeGen/LLVMTargetMachine.cpp, it looks like we need to call createTargetTransformInfoWrapperPass and possibly more, though I'm not sure yet.

ArchRobison pushed a commit to ArchRobison/julia that referenced this issue Sep 28, 2015
vtjnash added a commit that referenced this issue Sep 29, 2015
Fix issue #13106 by adding TargetTransformInfoWrapperPass to pass list.
skumagai pushed a commit to skumagai/julia that referenced this issue Oct 9, 2015