
[wip] Creating fmaddsub for these intrinsics #89

Draft · wants to merge 5 commits into master
Conversation

dannys4

@dannys4 dannys4 commented Jul 22, 2021

Ideally, adding support for fmaddsub intrinsics. This is still a work in progress. In connection with #88

@codecov-commenter

codecov-commenter commented Jul 22, 2021

Codecov Report

Merging #89 (5c2560b) into master (ace8eb1) will decrease coverage by 0.53%.
The diff coverage is 0.00%.

❗ Current head 5c2560b differs from pull request most recent head 1e6b689. Consider uploading reports for the commit 1e6b689 to get more accurate results

@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
- Coverage   90.23%   89.70%   -0.54%     
==========================================
  Files           3        3              
  Lines         502      505       +3     
==========================================
  Hits          453      453              
- Misses         49       52       +3     
Impacted Files Coverage Δ
src/LLVM_intrinsics.jl 96.42% <0.00%> (-1.50%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

# ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet
]
@eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T})
ff = "llvm.x86.fma.vfmaddsub.p"*$t
eschnett (Owner):

I would make the letter p part of the suffix t.

function fmaddsub(a::LVec{N, T}, b::LVec{N, T}, c::LVec{N, T}) where {N, T<:FloatingTypes}
Base.llvmcall("llvm.x86.fma.fmaddsub_pd", LVec{N, T}, (LVec{N, T}, LVec{N, T}, LVec{N, T}), a, b, c)

for (t, N, T) in [("d" , 2, Float64), ("s" , 4, Float32),
eschnett (Owner):

These intrinsics only exist on x86 architectures. This function should be wrapped in an if statement that checks the CPU architecture, and possibly also the Julia and LLVM versions, to see whether the intrinsic is supported.

In case it is not, there should be a reasonably efficient generic fallback. The idea is that one can call fmaddsub all the time and expect a reasonable implementation.
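A minimal sketch of what such gating could look like, assuming a plain NTuple representation; the name `fmaddsub_generic` and the exact gating condition are illustrative, not part of the PR:

```julia
# Hypothetical sketch: architecture-gated intrinsic with a generic fallback.
# Odd lanes (1-based) subtract c, even lanes add c.
@inline function fmaddsub_generic(a::NTuple{N,T}, b::NTuple{N,T},
                                  c::NTuple{N,T}) where {N, T<:AbstractFloat}
    ntuple(n -> isodd(n) ? muladd(a[n], b[n], -c[n]) : muladd(a[n], b[n], c[n]), Val(N))
end

if Sys.ARCH === :x86_64 || Sys.ARCH === :i686
    # define the llvmcall-based methods here, falling back to
    # fmaddsub_generic for sizes without a matching intrinsic
else
    const fmaddsub = fmaddsub_generic
end
```

This keeps `fmaddsub` callable everywhere while letting x86 builds pick up the hardware instruction.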

dannys4 (Author):

I genuinely cannot find when these were introduced into LLVM (I've looked at the release notes for pretty much every release since 3.0). However, they start robustly integrating AVX512 into the language at 4.0, so I'm just going to peg it at that and say FMA/AVX instructions were integrated there. I don't really know where to start with Julia versions, since different installations of the same Julia version might use different LLVM versions, so I suspect that might be tricky. I'll probably steal your implementation below for the fallback; I don't really see any significant improvements.

("d.256", 4, Float64), ("s.256", 8, Float32),
# ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet
]
@eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T})
eschnett (Owner):

fmaddsub has a simpler cousin, faddsub. Do you want to add support for it as well?

eschnett (Owner):

The function vfmaddsub is sufficiently obscure that it needs documentation. Documentation in the sense of equivalent Julia code would be good; maybe something along the lines of

fmaddsub(a, b, c) = SIMD((isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n] for n in 1:N)...)

Feel free to improve or modify.
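As a sanity check of that alternating sign pattern, here is a plain-tuple model of the semantics (1-based lanes; purely illustrative, not the PR's code):

```julia
# Scalar reference model of fmaddsub: odd lanes subtract, even lanes add.
a = (1.0, 2.0, 3.0, 4.0)
b = (10.0, 10.0, 10.0, 10.0)
c = (1.0, 1.0, 1.0, 1.0)
r = ntuple(n -> isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n], 4)
# r == (9.0, 21.0, 29.0, 41.0)
```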

dannys4 (Author) commented Jul 25, 2021:

re faddsub: I'd be happy to add it, though I have no need for it. That said, I don't see any documentation or commits for it anywhere, and it's definitely not in the llvm.fma namespace (at least, when I try llvm.fma.faddsub or llvm.fma.vaddsub, the call isn't recognized, which makes sense since it's not an fma call), so I'm not sure whether it's in the scope of this PR.

EDIT: I just saw your comment regarding llvm.sse3.addsub, so I see it now, but my question remains whether it's in the scope of this PR.

re documentation: sounds great.

$(Expr(:meta, :inline));
ccall($ff, llvmcall, LVec{$($N), $($T)}, (LVec{$($N), $($T)}, LVec{$($N), $($T)}, LVec{$($N), $($T)}), a, b, c)
)
end
eschnett (Owner):

In the end we will also need test cases.
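A test could compare the intrinsic against a scalar model. A sketch, assuming this PR's `fmaddsub` is callable on tuple-like inputs (the call itself is hypothetical until the API settles):

```julia
using Test

# Scalar model: odd (1-based) lanes subtract, even lanes add.
reference(a, b, c, n) = isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n]

a = (1.0, 2.0, 3.0, 4.0)
b = (5.0, 6.0, 7.0, 8.0)
c = (0.5, 0.5, 0.5, 0.5)

@testset "fmaddsub matches scalar model" begin
    r = fmaddsub(a, b, c)  # hypothetical call into this PR's function
    for n in 1:4
        @test r[n] ≈ reference(a, b, c, n)
    end
end
```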

@KristofferC
Collaborator

KristofferC commented Jul 26, 2021

These are very different from all the other intrinsics we support (which are based on https://llvm.org/docs/LangRef.html). Those are generic in terms of the vector size, and LLVM knows how to translate them to different targets.

Regarding target-specific intrinsics, there are a huge number of them (https://software.intel.com/sites/landingpage/IntrinsicsGuide/), and they are only available on certain architectures. It isn't clear to me how exactly this should be exposed by SIMD.jl. For example, what happens if someone calls this new function on non-x86? Or when the size of the vector is not exactly such that there is an intrinsic for it?

I believe this requires quite a lot of careful planning and thought before just implementing it in exactly the same way as the previous intrinsics. For example, it should probably live in its own file and have some consistent naming based on how it would be called in e.g. C.

@dannys4
Author

dannys4 commented Jul 26, 2021

Yes, I agree. I'll repeat what I said in the comment on the issue: this seems to be stepping into territory covered by VectorizationBase.jl (as that package already detects compatibilities with your system). I'm aware that there's a huge number of other intrinsics, but it's unclear which ones are in LLVM and which ones aren't (and where they might be).

For example, what happens if someone calls this new function on non-x86? Or when the size of the vector is not exactly such that there is an intrinsic for it?

I'm not exactly sure of the answer to the first question, but I would hope that multiple dispatch would solve the second issue. I know @eschnett was saying only to define these functions for x86, and (theoretically) similar functions could be defined for NEON intrinsics. Then there could be default implementations. But this brings back the question of scope. I'm happy to work on this and build out more functionality, but (as I said previously) I'm not sure what the scope is for this. If you decide that this doesn't belong here, that's fine by me; I'd just recommend putting a disclaimer in the description saying that you're only supporting the intrinsics in the Language Reference (and not necessarily all possible intrinsics for all possible architectures).

Either way, I think it's smart for me to hold off until someone tells me what the larger plan is.
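The dispatch idea above could be sketched like this: a catch-all generic method plus narrower methods only for the (N, T) pairs that have a matching intrinsic (all names and signatures illustrative):

```julia
# Generic method: works for any lane count and float type.
fmaddsub(a::NTuple{N,T}, b::NTuple{N,T}, c::NTuple{N,T}) where {N, T<:AbstractFloat} =
    ntuple(n -> isodd(n) ? a[n]*b[n] - c[n] : a[n]*b[n] + c[n], Val(N))

# Narrower method for a size with hardware support; on x86 this would
# dispatch to the llvmcall-based implementation instead.
# fmaddsub(a::NTuple{4,Float64}, b::NTuple{4,Float64}, c::NTuple{4,Float64}) = ...
```

Julia picks the most specific applicable method, so unsupported sizes silently fall through to the generic version.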

@eschnett
Owner

It would be good to define a function mulsubadd, which is generically useful to handle complex numbers. This function can then have efficient specialized implementations for various architectures (if possible), but it would exist and would be efficient on all architectures. The case is similar to muladd, which exists everywhere, independent of whether the CPU offers a specialized implementation.
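To illustrate why an addsub-style operation helps with complex numbers: with interleaved (re, im) storage, a complex multiply needs a subtract in the real lane and an add in the imaginary lane, which is exactly the alternating pattern. A tiny sketch for one complex number:

```julia
# (3 + 4i) * (1 + 2i), stored as interleaved (re, im) pairs.
a = (3.0, 4.0)
b = (1.0, 2.0)
t1 = (a[1]*b[1], a[2]*b[1])          # (ar*br, ai*br)
t2 = (a[2]*b[2], a[1]*b[2])          # (ai*bi, ar*bi)
r  = (t1[1] - t2[1], t1[2] + t2[2])  # addsub: subtract lane 1, add lane 2
# r == (-5.0, 10.0), i.e. -5 + 10i
```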

@KristofferC KristofferC mentioned this pull request Aug 5, 2021