[wip] Creating fmaddsub for these intrinsics #89

Draft: wants to merge 5 commits into base: master (changes shown from 4 commits)
18 changes: 18 additions & 0 deletions src/LLVM_intrinsics.jl
@@ -418,6 +418,7 @@ end
const MULADD_INTRINSICS = [
:fmuladd,
:fma,

]

for f in MULADD_INTRINSICS
@@ -431,6 +432,23 @@ for f in MULADD_INTRINSICS
end


for (t, N, T) in [("d" , 2, Float64), ("s" , 4, Float32),
Owner

These intrinsics exist only on x86 architectures. This function should be wrapped in an if statement that checks the CPU architecture, and possibly also the Julia and LLVM versions, to see whether the intrinsic is supported.

If it is not, there should be a reasonably efficient generic fallback. The idea is that one can call fmaddsub all the time and expect a reasonable implementation.
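A rough sketch of what such a guard could look like. All names here (`ON_X86`, `fmaddsub_fallback`, `fmaddsub_native`, `fmaddsub_sketch`) are hypothetical, and plain tuples stand in for the package's vector types. Note that being on x86 does not by itself guarantee that the CPU supports FMA, so a real check may need to be stricter than this one.

```julia
# Hypothetical names throughout; this is not the code in the PR.
# Plain tuples stand in for the package's vector types.

# Assumption: an architecture check is a good enough first filter.
const ON_X86 = Sys.ARCH in (:x86_64, :i686)

# Generic fallback: lane 1 subtracts c, lane 2 adds c, alternating (1-based).
function fmaddsub_fallback(a::NTuple{N,T}, b::NTuple{N,T}, c::NTuple{N,T}) where {N,T<:AbstractFloat}
    ntuple(n -> isodd(n) ? muladd(a[n], b[n], -c[n]) : muladd(a[n], b[n], c[n]), Val(N))
end

# Placeholder for the intrinsic-backed method; the real one would use llvmcall.
# Here it simply forwards to the fallback so the sketch runs on any machine.
fmaddsub_native(a, b, c) = fmaddsub_fallback(a, b, c)

# Pick the implementation once, when the definitions are loaded.
if ON_X86
    fmaddsub_sketch(a, b, c) = fmaddsub_native(a, b, c)
else
    fmaddsub_sketch(a, b, c) = fmaddsub_fallback(a, b, c)
end

fmaddsub_sketch((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))  # (-2.0, 14.0)
```

Callers only ever see `fmaddsub_sketch`, which is the point of the request: the same call works everywhere, and only the body differs by platform.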

Author

I genuinely cannot find when these were introduced into LLVM (I've looked at the release notes for pretty much every release since 3.0). However, AVX512 starts being robustly integrated into LLVM at 4.0, so I'm just going to peg it at that and say the FMA/AVX instructions are integrated there. I don't really know where to start with Julia versions, since different installations of the same Julia version might use different LLVM versions, so I suspect that might be tricky. I'll probably steal your implementation below for the fallback; I don't really see any significant improvements to make.
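One possible way around the Julia-versus-LLVM version question: Julia exposes the LLVM version it was built against as `Base.libllvm_version`, so the check could be made against LLVM directly rather than against the Julia version. A minimal sketch, where the 4.0 threshold is only the working guess from this comment and `ASSUMED_MIN_LLVM` is a hypothetical name:

```julia
# Base.libllvm_version is the LLVM version this Julia binary was built against.
# The 4.0 threshold is a guess taken from the discussion, not a verified fact.
const ASSUMED_MIN_LLVM = v"4.0"   # assumption

llvm_ok = Base.libllvm_version >= ASSUMED_MIN_LLVM
x86_ok  = Sys.ARCH in (:x86_64, :i686)

# Only attempt the intrinsic when both conditions hold.
use_intrinsic = llvm_ok && x86_ok

println("LLVM ", Base.libllvm_version, ", arch ", Sys.ARCH, ", use intrinsic: ", use_intrinsic)
```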

("d.256", 4, Float64), ("s.256", 8, Float32),
# ("d.512", 8, Float64), ("s.512", 16, Float32) # These don't seem supported by LLVM yet
]
@eval @generated function fmaddsub(a::LVec{$N, $T}, b::LVec{$N, $T}, c::LVec{$N, $T})
Owner

fmaddsub has a simpler cousin, faddsub. Do you want to add support for it as well?

Owner

The function vfmaddsub is sufficiently obscure that it needs documentation. Documentation in the sense of equivalent Julia code would be good; maybe something along the lines of:

fmaddsub(a,b,c) = SIMD((isodd(n) ? a[n]*b[n]-c[n] : a[n]*b[n]+c[n] for n in 1:N)...)

Feel free to improve or modify.

Author

@dannys4, Jul 25, 2021

re faddsub: I'd be happy to add it, though I have no need for it. That being said, I don't see any documentation or commits for it anywhere, and it's definitely not in the llvm.fma namespace (at least, when I try llvm.fma.faddsub or llvm.fma.vaddsub, the call isn't recognized, which makes sense as it's not an fma call), so I'm not sure whether it's in the scope of this PR.

EDIT: I just saw your comment regarding llvm.sse3.addsub, so that part is answered, but my question remains whether it's in the scope of this PR.

re documentation: sounds great.
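For reference, the faddsub semantics under discussion can be written in the same style as the fmaddsub formula above. This is only a sketch on plain tuples; `addsub_ref` is a hypothetical name, not something in the PR or in LLVM.

```julia
# Reference semantics only, on plain tuples; `addsub_ref` is a hypothetical name.
# Lane 1 subtracts, lane 2 adds, alternating (1-based), as the x86 ADDSUB instructions do.
function addsub_ref(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T<:AbstractFloat}
    ntuple(n -> isodd(n) ? a[n] - b[n] : a[n] + b[n], Val(N))
end

addsub_ref((1.0, 2.0), (10.0, 20.0))  # (-9.0, 22.0)
```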

ff = "llvm.x86.fma.vfmaddsub.p"*$t
Owner

I would make the letter p part of the suffix t.
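A sketch of how the table might read with that change. The constant name `FMADDSUB_VARIANTS` is hypothetical; the entries mirror the ones in the diff, and the generated intrinsic names are unchanged.

```julia
# Same variants as in the diff, with "p" moved from the prefix into the suffix.
# `FMADDSUB_VARIANTS` is a hypothetical name used only for this sketch.
const FMADDSUB_VARIANTS = [
    ("pd",     2, Float64), ("ps",     4, Float32),
    ("pd.256", 4, Float64), ("ps.256", 8, Float32),
]

# Build and print the intrinsic name for each variant.
for (t, N, T) in FMADDSUB_VARIANTS
    ff = "llvm.x86.fma.vfmaddsub." * t
    println(rpad(ff, 32), " => ", N, " x ", T)
end
```

The suffix then matches the usual x86 spelling (`pd`, `ps`), which makes the table easier to compare against instruction references.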

return :(
$(Expr(:meta, :inline));
ccall($ff, llvmcall, LVec{$($N), $($T)}, (LVec{$($N), $($T)}, LVec{$($N), $($T)}, LVec{$($N), $($T)}), a, b, c)
)
end
Owner

In the end we will also need test cases.
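As a starting point, the tests could compare each lane against the scalar formula for every supported width and element type. The sketch below uses the `Test` standard library and a tuple-based stand-in, `fmaddsub_standin` (a hypothetical name), in place of the real method so that it runs on its own.

```julia
using Test

# Stand-in under test (hypothetical); swap in the real fmaddsub on the package's vectors.
fmaddsub_standin(a, b, c) =
    ntuple(n -> isodd(n) ? a[n] * b[n] - c[n] : a[n] * b[n] + c[n], length(a))

@testset "fmaddsub" begin
    # The (N, T) pairs follow the variants listed in the diff.
    for (N, T) in [(2, Float64), (4, Float32), (4, Float64), (8, Float32)]
        a = ntuple(n -> T(n), N)   # 1, 2, 3, ...
        b = ntuple(n -> T(2), N)   # all twos
        c = ntuple(n -> T(1), N)   # all ones
        r = fmaddsub_standin(a, b, c)
        @test r isa NTuple{N,T}
        for n in 1:N
            # Odd lanes: 2n - 1, even lanes: 2n + 1.
            expected = isodd(n) ? T(2n - 1) : T(2n + 1)
            @test r[n] == expected
        end
    end
end
```

The inputs are small integers, so the comparison is exact whether or not the multiply-add is fused.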

end

# function fmaddsub(a::LVec{4, Float64}, b::LVec{4, Float64}, c::LVec{4, Float64})
# ccall("llvm.x86.fma.vfmaddsub.pd.256", llvmcall, LVec{4, Float64}, (LVec{4, Float64}, LVec{4, Float64}, LVec{4, Float64}), a, b, c)
# end

################
# Load / store #
################