SLP vectorization not working for tuples #11899
3.6.0 or 3.6.1? (not sure it would make any difference) |
@ScottPJones 3.6.1 I'm currently building with |
OK, that's 3.7, not 3.6 (I've also been building with both 3.6.1 (release) and 3.7 (svn)) |
I've been having problems with LLVM-svn today, and even with the 0.3.10 release 😞 |
Sorry - when I said "currently", I meant "at this very moment I'm rebuilding with 3.7". The version I was running on a couple of minutes ago when I filed this issue was 3.6.1 |
Ah, OK, my misunderstanding! Brain tired after having 4 days of drinking from the proverbial fire hose! |
I can replicate the problem. I'll look into it next week. |
I see the problem. Julia 0.3 represents homogeneous tuples as LLVM vectors; Julia 0.4 represents them as LLVM arrays. Doing so torpedoes the SLP vectorizer, since LLVM has no vector instructions for arrays. What's the motivation for the change? Short homogeneous tuples seem like a nice abstraction for hardware vectors. (My apologies for missing whatever conversation led to the change.) |
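One way to see which representation a given build picks is to dump the LLVM IR for a small function on a homogeneous tuple. The sketch below is illustrative only: `scale4` is a hypothetical function, not the example from the original report, and the `using InteractiveUtils` line assumes a recent Julia (on 0.4, `code_llvm` lived in Base). In LLVM IR, a vector type is written `<4 x float>` and an array type `[4 x float]`, so searching the output for those spellings shows how the tuple is being handled.

```julia
using InteractiveUtils  # assumption: recent Julia; provides code_llvm

# Hypothetical example: the same multiply applied to every element
# of a short homogeneous tuple.
scale4(t::NTuple{4,Float32}, s::Float32) =
    (t[1] * s, t[2] * s, t[3] * s, t[4] * s)

# Print the LLVM IR; look for `<4 x float>` (vector) versus
# `[4 x float]` (array) in the output.
code_llvm(stdout, scale4, Tuple{NTuple{4,Float32}, Float32})
```

What appears in the IR depends on the Julia and LLVM versions in use, so treat this as a diagnostic rather than a statement of what any particular build produces.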
cc @vtjnash, @Keno, @JeffBezanson |
Looks like #11187 was the motivation. In the commit I see:
And the code comments say:
Is there any way I can help with the load/store alignment issues? We should also consider refining |
We probably want to wait for Jameson's codegen rewrite to do that, so that we can make choices on storage on a per-case basis instead of having to decide only based on the type itself. The vector alignment was part of this I think, since if we declare that all homogeneous tuples of size < K have to be llvm vector types, then they also have to be that when inside a struct, which would prevent us from having C compatible layout because of alignment. |
cc @vtjnash to check. But I think with your change we could have tuples be llvm vectors only when we got them out of an array or are passing them as specsig function arguments (and homogeneous, and of a reasonable size, given the buggy handling of weird sized vectors by llvm) |
it was turned off prior to then. i did a bit of work in #11187 trying to make it possible to turn back on, but didn't manage to get it fully working.
if you try turning it on, see which tests fail / segfault due to incorrect alignment assumptions
yes, please! it's almost done.
i would love to hear ideas for handling this sanely. currently the behavior of tuples in llvmcall / ccall is officially undefined (even though we have a test for it), since it isn't entirely clear when tuples/structs should turn into vector types, and when they should be array types. maybe we should special-case a MMX type in codegen, like we have for floats, signed, and complex? additionally, since the new AVX stuff requires > 16 byte alignment, it will be a bit complicated to allocate them since both system malloc and Julia's GC only guarantee 16 byte alignment. as a workaround, it may be best to just force LLVM to use the unaligned vmov instructions. |
I'm looking forward to the codegen rewrite. I concur with using unaligned vector move instructions for the >16 byte alignment cases unless the compiler can prove the vector is aligned. |
So this is blocked by #11973? (And hence won't be fixed before 0.4) |
It would seem so, unless we want to hack an ad-hoc patch for the current codegen, which is probably not worth the effort. |
I've worked out an ABI and patch. All the tests pass. I'll wait for #11973 to be committed so I can restructure it for that. The ABI that I came up with uses an LLVM vector for a tuple under the following conditions:
Rule 4 ensures layouts have the same offsets as before. Rules 4 and 5 sometimes require bitwise conversion between arrays and tuples, though LLVM seems good about removing gratuitous ones. The net effect of these rules is that LLVM vector types are used when vector-like tuples act as return values, arguments to specialized signatures, local variables, or SSA values. |
Good news/bad news:
Changes are in my repository under branch adr/simdtuple. There seem to be three ways to take this forward:
I'm leaning in the direction of 1 since it's the most general solution. E.g., it would enable SIMDization of an immutable type with "red", "green", "blue", and "alpha" fields of type I'll mull this some more. |
The problem will be codes that use homogeneous tuples in a way that is not amenable to SIMD instructions, e.g., elementwise addition of the first three elements and a subtraction for the fourth. My change as it currently stands would slow down those kinds of codes. |
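To make the concern concrete, here is a minimal sketch of the two usage patterns. The names `add4` and `mixed4` are hypothetical and do not come from the thread or the patch. The first applies one operation across all four elements, which is the shape a vector add maps onto; the second is the mixed add/subtract pattern described above, where the elements are not all treated alike.

```julia
# Hypothetical names for illustration only.

# Uniform pattern: the same operation on every element.
add4(a::NTuple{4,Float32}, b::NTuple{4,Float32}) =
    (a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])

# Mixed pattern: addition for the first three elements,
# subtraction for the fourth.
mixed4(a::NTuple{4,Float32}, b::NTuple{4,Float32}) =
    (a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] - b[4])

a = (1f0, 2f0, 3f0, 4f0)
b = (1f0, 1f0, 1f0, 1f0)
println(add4(a, b))
println(mixed4(a, b))
```

Both functions take and return the same tuple type, so a representation choice made purely from the type cannot tell them apart, which is why the second pattern could pay for a choice that helps the first.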
But, 1 & 2 are not mutually exclusive, are they? Both sound like they should be done at some point. |
Update: I submitted an LLVM patch yesterday that makes SLPVectorization work with tuples again. Once it passes code review, it shouldn't be difficult to backport as a patch against LLVM 3.7. The patch not only enables SLP vectorization of tuples; it also enables SLP vectorization of immutable types. This gist shows two examples, one with tuple and one with immutable, that both vectorize with the patch. |
Nice! What sort of results did you get before/after from the examples in your gist? |
This looks awesome! Looking forward to it. |
This vectorizes when using
|
After reading #6271, I wanted to reproduce the vectorization that @ArchRobison displayed in his tuple example. Unfortunately, this doesn't seem to work any longer (just a copy-paste of the example given in the aforementioned PR):
Caveat: I built v0.4 using LLVM 3.6, which wasn't tested in #6271.