Adapt to upstream changes wrt. native support for BFloat16 #51
Conversation
Codecov Report

Attention:

Additional details and impacted files:

```diff
@@            Coverage Diff            @@
##           master      #51       +/-   ##
===========================================
- Coverage   65.41%   22.22%   -43.20%
===========================================
  Files           3        3
  Lines         133      171       +38
===========================================
- Hits           87       38       -49
- Misses         46      133       +87
```
Interestingly, even this trivial IR fails on aarch64:

```llvm
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-linux-none"

define bfloat @julia_BFloat16_2304() {
top:
  ret bfloat 0xR0000
}
```
This is fixed on LLVM 17, but aarch64 still lacks arithmetic-level support there:

```llvm
; ModuleID = 'f'
source_filename = "f"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-none-eabi"

define float @julia_f_572(bfloat %"x::BFloat16") {
top:
  %0 = fpext bfloat %"x::BFloat16" to float
  ret float %0
}
```
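For reference, this IR presumably comes from a plain widening conversion along these lines (the exact test function is an assumption):

```julia
# illustrative only: extend a BFloat16 to Float32 via the fpext intrinsic
f(x::Core.BFloat16) = Core.Intrinsics.fpext(Float32, x)
```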
On x86, both of these work on LLVM 15+, which is the lower bound for this feature (as we only added …).

cc @vchuravy
Well this is weird. I cannot reproduce the CI failure on any system of mine. I thought it was ABI related, but it looks like LLVM somehow materializes a wrong constant here.
Alright, found something that reproduces locally:

```julia
julia> f() = Core.Intrinsics.fptrunc(Core.BFloat16, 1f0)
f (generic function with 1 method)

julia> f()
Core.BFloat16(0x3c00)

julia> fptrunc(x) = Core.Intrinsics.fptrunc(Core.BFloat16, x)
fptrunc (generic function with 1 method)

julia> h() = fptrunc(1f0)
h (generic function with 1 method)

julia> h()
Core.BFloat16(0x3f80)
```
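For reference, 0x3f80 is the BFloat16 encoding of 1.0 (the upper half of the Float32 bit pattern), while 0x3c00 is the Float16 encoding of 1.0, which suggests the constant-folded path truncates to Float16 rather than BFloat16. A quick check of the two encodings:

```julia
julia> reinterpret(UInt16, Float16(1.0))             # Float16 encoding of 1.0
0x3c00

julia> (reinterpret(UInt32, 1.0f0) >> 16) % UInt16   # BFloat16 encoding of 1.0
0x3f80
```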
This PR adapts BFloat16s.jl to JuliaLang/julia#51470, where I'm adding native support for BFloat16s to Julia (using the `bfloat` type in LLVM). I decided to keep as much functionality as possible in this package, so Base only defines `Core.BFloat16` and the necessary codegen support.

The main benefit of this change is that we now emit drastically simpler IR, and rely on LLVM to lower it to something that the hardware supports. For example:
Before this PR:
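Roughly, the old software-emulated path (widen to Float32 through the integer bit pattern, operate, then truncate back with round-to-nearest) produces IR along the lines of this sketch for `x + y`; register names and the exact instruction sequence are illustrative, not the PR's actual output:

```llvm
; sketch only: emulated BFloat16 addition on the raw i16 bit pattern
define i16 @julia_add_bf16(i16 %x, i16 %y) {
top:
  ; widen both operands to float by shifting into the upper 16 bits
  %xw = zext i16 %x to i32
  %xs = shl i32 %xw, 16
  %xf = bitcast i32 %xs to float
  %yw = zext i16 %y to i32
  %ys = shl i32 %yw, 16
  %yf = bitcast i32 %ys to float
  %s = fadd float %xf, %yf
  ; truncate back with round-to-nearest-even (NaN handling omitted)
  %sb = bitcast float %s to i32
  %lo = lshr i32 %sb, 16
  %lsb = and i32 %lo, 1
  %r1 = add i32 %sb, 32767
  %r2 = add i32 %r1, %lsb
  %hi = lshr i32 %r2, 16
  %t = trunc i32 %hi to i16
  ret i16 %t
}
```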
Using this PR, on JuliaLang/julia#51470:
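With native support, the same operation maps directly onto the `bfloat` type (again a sketch, not the exact output):

```llvm
; sketch only: native BFloat16 addition
define bfloat @julia_add_bf16(bfloat %x, bfloat %y) {
top:
  %0 = fadd bfloat %x, %y
  ret bfloat %0
}
```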
So the LLVM IR is much simpler, while the native code is (as expected) similar in complexity.
Performance is hard to compare for such simple operations, but representing BFloat16s natively should make it possible for LLVM to optimize them, and also to select better instructions when available. For example, with a CPU supporting AVX512BF16 and LLVM 17, BFloat16 code can compile down to the dedicated BF16 instructions, as in the sketch below.
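As an illustration (a sketch, not the exact example from the PR): a simple Float32-to-BFloat16 conversion loop, which LLVM should be able to vectorize and lower to the AVX512BF16 conversion instruction (`vcvtneps2bf16`) when that feature is available:

```julia
using BFloat16s  # assumes a build of the package against the native Core.BFloat16

# Convert a buffer of Float32 to BFloat16. On an AVX512BF16-capable CPU with
# LLVM 17, the vectorized inner loop can use hardware bf16 conversions.
function tobf16!(y::AbstractVector{BFloat16}, x::AbstractVector{Float32})
    @inbounds @simd for i in eachindex(x, y)
        y[i] = BFloat16(x[i])  # fptrunc float -> bfloat with native support
    end
    return y
end
```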
So this will make it possible to use BFloat16s.jl with our vectorization packages (by using `NTuple{16,Core.VecElement{BFloat16}}`, which now lowers to `<16 x bfloat>`).

This PR also switches the `significand` implementation, as the old one contained undefined behavior (for `one(BFloat16)`, `isig` is `Int16(0)`). The new implementation is copied from Base.

Closes #51
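For reference, Base computes `significand` by masking and patching the exponent field of the raw bit pattern. Adapted to the BFloat16 layout (1 sign, 8 exponent, 7 fraction bits), it looks roughly like the sketch below; this illustrates the approach and is not necessarily the exact code that was copied:

```julia
const SIGN_MASK     = 0x8000  # BFloat16 sign bit
const EXPONENT_MASK = 0x7f80  # 8 exponent bits
const EXPONENT_ONE  = 0x3f80  # biased exponent of 1.0
const EXPONENT_BITS = 8

# Return the bits of significand(x), given the raw BFloat16 bits of x.
function significand_bits(bits::UInt16)
    xs = bits & ~SIGN_MASK
    xs >= EXPONENT_MASK && return bits               # NaN or Inf: pass through
    if xs <= (~EXPONENT_MASK & ~SIGN_MASK)           # subnormal range
        xs == 0x0000 && return bits                  # +-0.0
        m = leading_zeros(xs) - EXPONENT_BITS        # normalize the fraction
        xs <<= m
        bits = xs | (bits & SIGN_MASK)
    end
    return (bits & ~EXPONENT_MASK) | EXPONENT_ONE    # force the exponent to 2^0
end
```

For example, `significand_bits(0x3f80) == 0x3f80`, i.e. the significand of 1.0 is 1.0, without ever shifting by an amount derived from a zero fraction field.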