[API Proposal]: BFloat16 #96295
Tagging subscribers to this area: @dotnet/area-system-numerics
Issue Details
Background and motivation
The bfloat16 type provides the same number range as the 32-bit IEEE 754 single-precision floating-point type, but with reduced precision (24 bits -> 8 bits). This is useful for machine learning to improve memory utilization, and can be used to accelerate AI workloads via AVX-512 BF16 and ARMv8.6-A instructions.
Adding this type would allow us to implement these new instruction sets, and provide a common base type for various machine learning libraries.
API Proposal
namespace System
{
public readonly struct BFloat16 : IComparable, IFormattable, IComparable<BFloat16>, IEquatable<BFloat16>, IConvertible, ISpanFormattable, IUtf8SpanFormattable
{
public static readonly BFloat16 MinValue;
public static readonly BFloat16 MaxValue;
public static bool IsNegative(BFloat16 h);
public static BFloat16 Parse(string s);
public static BFloat16 Parse(string s, NumberStyles style);
public static BFloat16 Parse(string s, NumberStyles style, IFormatProvider provider);
public static BFloat16 Parse(string s, IFormatProvider provider);
public static BFloat16 Parse(ReadOnlySpan<char> s);
public static BFloat16 Parse(ReadOnlySpan<char> s, NumberStyles style);
public static BFloat16 Parse(ReadOnlySpan<char> s, IFormatProvider provider);
public static BFloat16 Parse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider provider);
public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider provider);
public static bool TryParse(string s, out BFloat16 result);
public static bool TryParse(string s, NumberStyles style, IFormatProvider provider, out BFloat16 result);
public static bool TryParse(ReadOnlySpan<char> s, out BFloat16 result);
public static bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider provider, out BFloat16 result);
public int CompareTo(object value);
public int CompareTo(BFloat16 value);
public bool Equals(BFloat16 obj);
public override bool Equals(object obj);
public override int GetHashCode();
public TypeCode GetTypeCode();
public string ToString(IFormatProvider provider);
public string ToString(string format);
public string ToString(string format, IFormatProvider provider);
public override string ToString();
public static explicit operator BFloat16(float value);
public static explicit operator float(BFloat16 value);
public static bool operator ==(BFloat16 left, BFloat16 right);
public static bool operator !=(BFloat16 left, BFloat16 right);
public static bool operator <(BFloat16 left, BFloat16 right);
public static bool operator >(BFloat16 left, BFloat16 right);
public static bool operator <=(BFloat16 left, BFloat16 right);
public static bool operator >=(BFloat16 left, BFloat16 right);
}
}
API Usage
BFloat16 bf16 = (BFloat16)1.0f;
Alternative Designs
No response
Risks
No response
|
This should probably expose the whole API surface that Half has, including all the operators like addition and such even if they're not accelerated by most hardware. |
@MichalPetryka Updated to implement the IFloatingPoint interface, along with its operators. These can likely also forward to MathF / float, like Half by default. |
You've missed |
The proposal has been updated to include the IMinMaxValue interface. Note: the API is limited to public members. There are various INumber and IFloatingPoint members that are not listed, but will need explicit implementations to participate in the generic math system. @MichalPetryka Let me know if you spot any other missing public members. |
|
Fixed. |
I don't think mathematical functions should be implemented. They are likely not supported by hardware, nor required by any specification. I'd expect it to implement only conversion operators and basic arithmetic operators:
|
I believe there's still value in implementing the Trigonometric & Hyperbolic functions, as this type maintains the full Float32 range. Converting a BFloat16 to a Single can also be done in a few shift operations. This operation is much slower on the Half type.
public unsafe static float BFloat16ToSingle(ushort bfloat16)
{
int f32Value =
(bfloat16 & 0x8000) << 16 | // sign bit
((bfloat16 & 0x7FFF) + 0x1C000) << 13; // exponent and mantissa
return *(float*)&f32Value;
}
ARM also provides the accelerated BFCVT instruction to convert a Single back to a BFloat16. However, I agree they are non-essential. |
I think it's worth noting that the proposed API surface isn't necessarily the one that's initially implemented, as was noted in #81376.
|
This seems needlessly complicated to read, and generates worse codegen than is needed. A bfloat16 is just a truncated binary32:
public static float BFloat16ToBinary32(ushort value)
{
uint temp = (uint)value << 16;
return Unsafe.As<uint, float>(ref temp);
} |
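To make the "truncated binary32" point concrete, here is a minimal sketch of the bit layout, assuming .NET 6 or later for `BitConverter.UInt32BitsToSingle`. The `BFloat16Layout` class and its `Decompose` helper are hypothetical names used only for illustration; they are not part of the proposal.

```csharp
using System;

// Hypothetical illustration only; not part of the proposed API.
public static class BFloat16Layout
{
    // A bfloat16 is the upper 16 bits of a binary32, so widening is a single shift.
    public static float ToSingle(ushort value)
        => BitConverter.UInt32BitsToSingle((uint)value << 16);

    // Split the 16 bits into 1 sign bit, 8 exponent bits (same bias as binary32),
    // and 7 explicitly stored mantissa bits.
    public static (int Sign, int BiasedExponent, int Mantissa) Decompose(ushort value)
        => (value >> 15, (value >> 7) & 0xFF, value & 0x7F);
}
```

For example, the bit pattern `0x3F80` widens to `1.0f` and decomposes to sign 0, biased exponent 127, mantissa 0, which matches the upper half of the binary32 encoding of `1.0f` (`0x3F800000`).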
This isn't important to API review. The potential for operators to be added later is generally not a major consideration in the exposure of a type. We almost never know the "full" surface area, and while it might be relevant to consider whether additional APIs are planned, they really only limit the ability to cleanly implement/expose the initial surface. This type is not really a core/common type and isn't even strictly "well spec'd" in the same way the IEEE 754 types are. It likely should exist in the It should initially only cover itself as a minimal interchange type with the relevant conversion APIs. That is going to be the 99% use case and is the only case that will be hardware accelerated for the near future. I'm fine with separately considering the expansion of this to support the full set of
Notably this is not universally true. It was initially introduced using truncation, but there are a number of different hardware implementations nowadays and some use ties to even (IEEE 754 default, which Google TPU uses) or round to odd (ARM), etc. We should likely default to truncation, but it's possible we need additional APIs to support other rounding modes. |
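The difference between two of the rounding modes mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: `BFloat16Rounding` and its method names are hypothetical, the raw `ushort` stands in for the bfloat16 bits, and `BitConverter.SingleToUInt32Bits` requires .NET 6 or later.

```csharp
using System;

// Hypothetical helpers for illustration; not part of the proposed API.
public static class BFloat16Rounding
{
    // Truncation: keep the upper 16 bits of the binary32 encoding and drop the rest.
    // Note that a NaN whose payload lives only in the low 16 bits becomes an infinity.
    public static ushort FromSingleTruncate(float value)
    {
        uint bits = BitConverter.SingleToUInt32Bits(value);
        return (ushort)(bits >> 16);
    }

    // Round to nearest, ties to even (the IEEE 754 default rounding mode).
    public static ushort FromSingleRoundToNearestEven(float value)
    {
        uint bits = BitConverter.SingleToUInt32Bits(value);
        if (float.IsNaN(value))
        {
            // Set a mantissa bit in the kept half so the result stays a NaN.
            return (ushort)((bits >> 16) | 0x0040);
        }

        // Adding 0x7FFF plus the lowest kept bit rounds halfway cases to the even neighbor.
        uint roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
        return (ushort)((bits + roundingBias) >> 16);
    }
}
```

The two functions agree whenever the low 16 bits are zero (for instance `1.0f` maps to `0x3F80` either way) and can differ by one unit in the last place otherwise, which is why the choice of default matters for interchange.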
@tannergooding Thanks for the comments! I updated the proposal to use the System.Numerics namespace and scaled back the surface area to be used as a minimal interchange type. |
These should notably be properties since it's a trivial constant over a value type and can avoid the static initializer:
We also need the conversion from
|
Does it make sense to require explicit upcasting to float and double as all bfloat16s are perfectly representable as binary32 and binary64? |
Implicit casts can introduce potential versioning concerns and so it depends a bit. It will likely be a discussion point in the API review. |
Looks good as proposed. Also with whatever level of generic math (and public visibility thereof) is appropriate. ( namespace System.Numerics
{
public readonly struct BFloat16
: IComparable,
IComparable<BFloat16>,
IEquatable<BFloat16>
{
public static BFloat16 Epsilon { get; }
public static BFloat16 MinValue { get; }
public static BFloat16 MaxValue { get; }
// Casting
public static explicit operator BFloat16(float value);
public static explicit operator BFloat16(double value);
public static explicit operator float(BFloat16 value);
public static explicit operator double(BFloat16 value);
// Comparison
public int CompareTo(object value);
public int CompareTo(BFloat16 value);
public static bool operator ==(BFloat16 left, BFloat16 right);
public static bool operator !=(BFloat16 left, BFloat16 right);
public static bool operator <(BFloat16 left, BFloat16 right);
public static bool operator >(BFloat16 left, BFloat16 right);
public static bool operator <=(BFloat16 left, BFloat16 right);
public static bool operator >=(BFloat16 left, BFloat16 right);
// Equality
public bool Equals(BFloat16 obj);
public override bool Equals(object? obj);
public override int GetHashCode();
// ToString override
public override string ToString();
}
} |
Which assembly should it belong to? Should it be in S.R.Numerics like Complex? Since there is hardware acceleration for it, it should likely be in CoreLib. |
Shouldn't it be called BHalf, since there's Half, Single & Double as opposed to Float16, Float32 and Float64? |
// Casting
public static explicit operator BFloat16(float value);
public static explicit operator BFloat16(double value);
public static explicit operator float(BFloat16 value);
public static explicit operator double(BFloat16 value); Correct me if I'm wrong, but isn't it the case that every |
Also, for those conversion that are not lossless, shouldn't there be checked and unchecked versions? |
What are those versioning concerns? It's a little unfortunate to have those be explicit not only because you have to add a cast, but because the conversion being explicit makes me (and I assume others as well) think that it cannot be safely converted, when in fact it can be. It's really counterintuitive for them to be explicit. |
No, the industry standard names for the types are
Checked vs unchecked normally only exist where a conversion can throw. Floating-point conversions never throw and have 1 strictly defined behavior, which is round to nearest representable. You theoretically could expose the optional IEEE 754 support for raising an "inexact exception", but that throws for almost every operation you can imagine, even
Language primitive types get special handling and precedence for conversions. There are many cases where this can negatively impact overload resolution either by new ambiguities caused by new implicit conversions or by the wrong overload being silently selected. A simple example is if you have Similar issues exist when introducing new APIs around |
Wait, uh? 😟 I assumed until now that in a checked context, if I cast a numeric type and the value can't fit into the new type, it throws. Now I could have bugs in my code I guess :/ But thanks for letting me know.
This seems like an argument against all implicit conversions altogether. But the language has them and people are used to them. So it seems weird that some numeric types would have them and others would not, for a reason that applies to all of them. If they were really so bad, why would they exist in the language? For one reason or another, they made the call about them existing and about numeric types having them. So I feel like we should follow that to be consistent. I get the argument about being explicit about things, but it's still weird for them to be explicit as it makes me think wait, this is dangerous and I have to have extra scrutiny here as there can be either an exception or a loss of precision due to an explicit cast. When in fact there can't be and it's completely safe. I wish there was a special syntax for conversions that made you be explicit about them, just like explicit conversions, but would only allow conversions that are "implicit"/safe. But there isn't :( For better or worse, we have what we have in the language, but people (including me) have gotten used to what we have so I still feel like there should be consistency instead of banishing certain language features that we don't like for new code, even though they're used all over the place in existing code and will always be, as there'll always be implicit conversions for the builtin types and other existing types that have them, and there'll always be this weird inconsistency that makes people stop and wonder why it's there. I just associate explicit conversions with conversions that aren't safe, because if they were safe, they would be implicit - that's the way it has always been (apart from that one mistake of |
If this is really the decision for all conversions to be explicit going forward regardless of whether they're safe or not, please, at least add doc comments and documentation pages for those conversions saying whether they are actually safe or not. |
Checked has always really pertained to overflow/underflow and not necessarily towards "representable". The simplest example is that Likewise floating-point to integer conversions do throw for
In some ways, yes. There are many languages that explicitly do not provide implicit conversions because of these issues.
Yes, and so our decision on whether to use implicit conversions or not is based around the likelihood people will run into issues/pits of failure. There are many cases where implicit conversions are good and where we would expose them for new types; this just doesn't happen to be one of them due to it being a more esoteric user-defined type that needs to interplay with multiple built-in types (which have special conversion precedence rules) and being used in scenarios where a new overload causing a silent loss of precision could be both easily missed and have a large negative impact were it to make it to production. That is to say, we don't only make the decision to expose implicit conversions based on whether or not something is lossless. We have to also account for how that is likely to be used or impact other existing overloads, especially for more common types, and how likely it is to be exposed as an overload for those other types. This case has both of those as fairly likely, especially in domains where the combination of perf and precision are often competing with each other. We can always expose the implicit conversions later given enough feedback, but we can't take them away once they are exposed. So defaulting to explicit here is the better/safer option and won't be overly negative, particularly given the primary domains are going to involve using vectors and require explicit conversions anyways. |
Right, but I would consider converting an
I guess I wouldn't consider that to be an overflow, I wouldn't expect that to throw as that's what integer division is defined as. But I would consider casting a But thanks for letting me know about the semantics (or lack thereof) of EDIT: |
Of sorts, but that's the intent of It really just falls out that there is no value that can overflow, because it's always representable as infinity, which is unlike integers which can only represent finite values. |
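The checked-context behavior discussed in this exchange can be observed with a short sketch. `CheckedConversionDemo` and its members are hypothetical names for illustration; the sketch uses the built-in `float`, `double`, and `int` types rather than the proposed `BFloat16`, on the assumption that the same floating-point narrowing semantics would apply to it.

```csharp
using System;

// Hypothetical demo class; it uses built-in types only, not the proposed BFloat16.
public static class CheckedConversionDemo
{
    // double -> float never throws, even in a checked context:
    // values too large for float become an infinity.
    public static float NarrowToSingle(double value) => checked((float)value);

    // float -> int does throw OverflowException in a checked context
    // when the value is out of range for int (or is NaN).
    public static bool ThrowsOnInt32Conversion(float value)
    {
        try
        {
            _ = checked((int)value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

With these helpers, `NarrowToSingle(double.MaxValue)` yields positive infinity without throwing, while `ThrowsOnInt32Conversion(float.MaxValue)` reports that the integer conversion threw.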