[API Proposal]: BFloat16 #96295

Open
iamcarbon opened this issue Dec 24, 2023 · 28 comments · May be fixed by #98643
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics in-pr There is an active PR which will close this issue when it is merged

@iamcarbon

iamcarbon commented Dec 24, 2023

Background and motivation

The bfloat16 type provides the same number range as the 32-bit IEEE 754 single-precision floating point type, but with reduced precision (24 bits -> 8 bits). This is useful for machine learning to improve memory utilization, and can be used to accelerate AI workloads via AVX-512 BF16 and ARMv8.6-A instructions.

Adding this type would allow us to implement these new instruction sets, and provide a common base type for various machine learning libraries.
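As an illustration of the layout described above, here is a minimal sketch that splits a raw 16-bit pattern into sign, exponent, and mantissa fields. It assumes the usual bfloat16 layout (1 sign bit, 8 exponent bits, 7 explicit mantissa bits) and .NET 6 or later for `BitConverter.SingleToUInt32Bits`. The class and method names are hypothetical and are not part of the proposal.

```csharp
using System;

// Hypothetical helper, not part of the proposed API.
public static class BFloat16Layout
{
    // Splits a raw bfloat16 bit pattern into its three fields.
    public static (int Sign, int BiasedExponent, int Mantissa) Decompose(ushort bits)
    {
        int sign = bits >> 15;               // 1 bit
        int exponent = (bits >> 7) & 0xFF;   // 8 bits, bias 127 (same as float)
        int mantissa = bits & 0x7F;          // 7 explicit bits (+1 implicit = 8 bits of precision)
        return (sign, exponent, mantissa);
    }

    // Returns the upper 16 bits of a float, which share the bfloat16 field layout.
    public static ushort UpperHalfOfSingle(float value)
        => (ushort)(BitConverter.SingleToUInt32Bits(value) >> 16);
}
```

Because the exponent field has the same width and bias as `float`, the representable range matches; only the mantissa is shorter.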

API Proposal

namespace System.Numerics
{
    public readonly struct BFloat16
      : IComparable,
        IComparable<BFloat16>,
        IEquatable<BFloat16>
    {
        public static BFloat16 Epsilon { get; }
        public static BFloat16 MinValue { get; }
        public static BFloat16 MaxValue { get; }
                
        // Casting
        public static explicit operator BFloat16(float value);
        public static explicit operator BFloat16(double value);
        public static explicit operator float(BFloat16 value);
        public static explicit operator double(BFloat16 value);

        // Comparison
        public int CompareTo(object value);
        public int CompareTo(BFloat16 value);
        public static bool operator ==(BFloat16 left, BFloat16 right);
        public static bool operator !=(BFloat16 left, BFloat16 right);
        public static bool operator <(BFloat16 left, BFloat16 right);
        public static bool operator >(BFloat16 left, BFloat16 right);
        public static bool operator <=(BFloat16 left, BFloat16 right);
        public static bool operator >=(BFloat16 left, BFloat16 right);

        // Equality
        public bool Equals(BFloat16 obj);
        public override bool Equals(object? obj);
        public override int GetHashCode();
        
        // ToString override
        public override string ToString();
    }
}

API Usage

BFloat16 bf16 = (BFloat16)1.0f; 

Alternative Designs

No response

Risks

No response

@iamcarbon iamcarbon added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 24, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 24, 2023
@ghost

ghost commented Dec 24, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The bfloat16 type provides the same number range as the 32-bit IEEE 754 single-precision floating point type, but with reduced precision (24 bits -> 8 bits). This is useful for machine learning to improve memory utilization, and can be used to accelerate AI workloads via AVX-512 BF16 and ARMv8.6-A instructions.

Adding this type would allow us to implement these new instruction sets, and provide a common base type for various machine learning libraries.

API Proposal

namespace System
{
    public readonly struct BFloat16 : IComparable, IFormattable, IComparable<BFloat16>, IEquatable<BFloat16>, IConvertible, ISpanFormattable, IUtf8SpanFormattable
    {
        public static readonly BFloat16 MinValue;
        public static readonly BFloat16 MaxValue;
                
        public static bool IsNegative(BFloat16 h);
        public static BFloat16 Parse(string s);
        public static BFloat16 Parse(string s, NumberStyles style);
        public static BFloat16 Parse(string s, NumberStyles style, IFormatProvider provider);
        public static BFloat16 Parse(string s, IFormatProvider provider);
        public static BFloat16 Parse(ReadOnlySpan<char> s);
        public static BFloat16 Parse(ReadOnlySpan<char> s, NumberStyles style);
        public static BFloat16 Parse(ReadOnlySpan<char> s, IFormatProvider provider);
        public static BFloat16 Parse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider provider);
        public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider provider);
        public static bool TryParse(string s, out BFloat16 result);
        public static bool TryParse(string s, NumberStyles style, IFormatProvider provider, out BFloat16 result);
        public static bool TryParse(ReadOnlySpan<char> s, out BFloat16 result);
        public static bool TryParse(ReadOnlySpan<char> s, NumberStyles style, IFormatProvider provider, out BFloat16 result);
        public int CompareTo(object value);
        public int CompareTo(BFloat16 value);
        public bool Equals(BFloat16 obj);
        public override bool Equals(object obj);
        public override int GetHashCode();
        public TypeCode GetTypeCode();
        public string ToString(IFormatProvider provider);
        public string ToString(string format);
        public string ToString(string format, IFormatProvider provider);
        public override string ToString();
        public static explicit operator BFloat16(float value);
        public static explicit operator float(BFloat16 value);
        public static bool operator ==(BFloat16 left, BFloat16 right);
        public static bool operator !=(BFloat16 left, BFloat16 right);
        public static bool operator <(BFloat16 left, BFloat16 right);
        public static bool operator >(BFloat16 left, BFloat16 right);
        public static bool operator <=(BFloat16 left, BFloat16 right);
        public static bool operator >=(BFloat16 left, BFloat16 right);
    }
}

API Usage

BFloat16 bf16 = (BFloat16)1.0f;

Alternative Designs

No response

Risks

No response

Author: iamcarbon
Assignees: -
Labels:

api-suggestion, area-System.Numerics, untriaged

Milestone: -

@MichalPetryka
Contributor

This should probably expose the whole API surface that Half has, including all the operators like addition and such even if they're not accelerated by most hardware.

@iamcarbon
Author

@MichalPetryka Updated to implement the IFloatingPoint interface, along with its operators. These can likely also forward to MathF / float, like Half by default.

@MichalPetryka
Contributor

Updated to implement the IFloatingPoint interface, along with its operators

You've missed IMinMaxValue<BFloat16>, which Half has.

@iamcarbon
Author

iamcarbon commented Dec 25, 2023

The proposal has been updated to include the IMinMaxValue interface. Note: the API is limited to public members. There are various INumber and IFloatingPoint members that are not listed, but will need explicit implementations to participate in the generic math system. @MichalPetryka Let me know if you spot any other missing public members.

@MichalPetryka
Contributor

The proposal has been updated to include the IMinMaxValue interface. Note: the API is limited to public members. There are various INumber and IFloatingPoint members that are not listed, but will need explicit implementations to participate in the generic math system. @MichalPetryka Let me know if you spot any other missing public members.

Unary Negation Operators seems to have the unary plus.

@iamcarbon
Author

Unary Negation Operators seems to have the unary plus.

Fixed.

@huoyaoyuan
Member

This should probably expose the whole API surface that Half has, including all the operators like addition and such even if they're not accelerated by most hardware.

I don't think mathematical functions should be implemented. They are likely not supported by hardware, nor required by any specification.
The first version of Half in .NET 5 is only a transport type, with no IEEE754 function implemented.

I'd expect it to implement only conversion operators, and basic arithmetic operators only:

// comparable, equatable, parsing and formatting omitted
IMinMaxValue
IBinaryNumber
IFloatingPoint

@iamcarbon
Author

iamcarbon commented Dec 25, 2023

I believe there's still value implementing the Trigonometric & Hyperbolic functions as this type maintains the full Float32 range.

Converting a BFloat16 to a Single can also be done in a few shift operations. This operation is much slower on the Half type.

public unsafe static float BFloat16ToSingle(ushort bfloat16)
{
    int f32Value  = 
        (bfloat16 & 0x8000) << 16 |                      // sign bit
        ((bfloat16 & 0x7FFF) + 0x1C000) << 13; // exponent and mantissa

    return *(float*)&f32Value;
}

ARM also provides the accelerated BFCVT instruction to convert a Single back to a BFloat16.

However, I agree they are non-essential.

@MichalPetryka
Contributor

I don't think mathematical functions should be implemented. They are likely not supported by hardware, nor required by any specification.
The first version of Half in .NET 5 is only a transport type, with no IEEE754 function implemented.

I think it's worth noting that proposed API surface isn't necessarily the one that's initially implemented as it was noted in #81376.
As such, I think that unless the decision would be to never add the full set of operations (which seems unlikely since hardware is already starting to expose them), API review should see the final surface during review, even if its implementation would be partial initially.

Let me know if you spot any other missing public members.

Diffing with Half seems to still show some missing members.

@colejohnson66

public unsafe static float BFloat16ToSingle(ushort bfloat16)
{
    int f32Value  = 
        (bfloat16 & 0x8000) << 16 |                      // sign bit
        ((bfloat16 & 0x7FFF) + 0x1C000) << 13; // exponent and mantissa

    return *(float*)&f32Value;
}

This seems needlessly complicated to read, and generates worse codegen than needed. A bfloat16 is just a truncated binary32:

public static float BFloat16ToBinary32(ushort value)
{
    uint temp = (uint)value << 16;
    return Unsafe.As<uint, float>(ref temp);
}

@tannergooding
Member

API review should see the final surface during review

This isn't important to API review. The potential for operators to be added later is generally not a major consideration in the exposure of a type. We almost never know the "full" surface area, and while it might be relevant to consider whether additional APIs are planned, they really only limit the ability to cleanly implement/expose the initial surface.


This type is not really a core/common type and isn't even strictly "well spec'd" in the same way the IEEE 754 types are. It likely should exist in the System.Numerics namespace (much as the new Decimal32/64/128 types will be).

It should initially only cover itself as a minimal interchange type with the relevant conversion APIs. That is going to be the 99% use case and is the only case that will be hardware accelerated for the near future. I'm fine with separately considering the expansion of this to support the full set of IBinaryFloatingPointIeee754<T> members, but that should be split out and separate from the mainline consideration. Such members would only be convenience APIs for upcast to float, do the operation, downcast to bfloat after all and in many cases would be the less efficient way to operate on the data (typical usage in AI/ML/GPU is to upcast a vector's worth of these values, operate on them as float end to end, and then downcast when storing back to memory/disk).
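The upcast, operate, downcast pattern described above can be sketched roughly as follows. This is a scalar illustration only, assuming bfloat16 values stored as raw `ushort` bit patterns and truncation on the way back; the `ScaleInPlace` helper is hypothetical, and real AI/ML code would typically use vectorized conversions instead.

```csharp
using System;

// Hypothetical helper, not part of the proposed API.
public static class BFloat16Bulk
{
    // Scales a buffer of raw bfloat16 bit patterns by doing all arithmetic in float.
    public static void ScaleInPlace(Span<ushort> bf16Bits, float factor)
    {
        float[] scratch = new float[bf16Bits.Length];

        // 1. Upcast: place each 16-bit pattern in the upper half of a float.
        for (int i = 0; i < bf16Bits.Length; i++)
            scratch[i] = BitConverter.UInt32BitsToSingle((uint)bf16Bits[i] << 16);

        // 2. Operate on the data as float end to end.
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] *= factor;

        // 3. Downcast when storing back (truncation assumed here).
        for (int i = 0; i < scratch.Length; i++)
            bf16Bits[i] = (ushort)(BitConverter.SingleToUInt32Bits(scratch[i]) >> 16);
    }
}
```

Doing the work in one pass per stage keeps the conversions at the edges of the computation rather than around every arithmetic operation.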

A bfloat16 is just a truncated binary32

Notably this is not universally true. It was initially introduced using truncation, but there are a number of different hardware implementations nowadays and some use ties to even (IEEE 754 default, which Google TPU uses) or round to odd (ARM), etc.

We should likely default to truncation, but it's possible we need additional APIs to support other rounding modes.
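To make the difference between rounding modes concrete, here is a sketch of two float-to-bfloat16 narrowing strategies operating on raw bits: plain truncation and round to nearest with ties to even. Both helpers are hypothetical and are not the proposed API; the NaN handling in the second is one possible choice, not something specified in this thread.

```csharp
using System;

// Hypothetical helpers, not part of the proposed API.
public static class BFloat16Rounding
{
    // Truncation: keep the upper 16 bits and drop the rest.
    // Note: a NaN whose payload lives only in the low 16 bits would become infinity.
    public static ushort Truncate(float value)
        => (ushort)(BitConverter.SingleToUInt32Bits(value) >> 16);

    // Round to nearest, ties to even (the IEEE 754 default rounding mode).
    public static ushort RoundToNearestEven(float value)
    {
        uint bits = BitConverter.SingleToUInt32Bits(value);

        // Assumption: keep NaN a NaN by forcing a mantissa bit that survives narrowing.
        if (float.IsNaN(value))
            return (ushort)((bits >> 16) | 0x0040);

        // Add 0x7FFF plus the lowest kept bit, so exact ties round towards an even result.
        uint roundingBias = 0x7FFFu + ((bits >> 16) & 1u);
        return (ushort)((bits + roundingBias) >> 16);
    }
}
```

The two functions agree whenever the discarded 16 bits are below the halfway point and can differ by one unit in the last place otherwise.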

@iamcarbon
Author

@tannergooding Thanks for the comments! I updated the proposal to use the System.Numerics namespace and scaled back the surface area to be used as a minimal interchange type.

@tannergooding
Member

These should notably be properties since it's a trivial constant over a value type and can avoid the static initializer:

public static BFloat16 Epsilon { get; }
public static BFloat16 MinValue { get; }
public static BFloat16 MaxValue { get; }

We also need the conversion from double for parity:

public static explicit operator BFloat16(double value);

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Jan 5, 2024
@colejohnson66

Does it make sense to require explicit upcasting to float and double as all bfloat16s are perfectly representable as binary32 and binary64?

@tannergooding
Member

Implicit casts can introduce potential versioning concerns and so it depends a bit. It will likely be a discussion point in the API review.

@bartonjs
Member

bartonjs commented Feb 13, 2024

Video

Looks good as proposed. Also with whatever level of generic math (and public visibility thereof) is appropriate. (IFloatingPointIeee754<BFloat16>, most probably)

namespace System.Numerics
{
    public readonly struct BFloat16
      : IComparable,
        IComparable<BFloat16>,
        IEquatable<BFloat16>
    {
        public static BFloat16 Epsilon { get; }
        public static BFloat16 MinValue { get; }
        public static BFloat16 MaxValue { get; }
                
        // Casting
        public static explicit operator BFloat16(float value);
        public static explicit operator BFloat16(double value);
        public static explicit operator float(BFloat16 value);
        public static explicit operator double(BFloat16 value);

        // Comparison
        public int CompareTo(object value);
        public int CompareTo(BFloat16 value);
        public static bool operator ==(BFloat16 left, BFloat16 right);
        public static bool operator !=(BFloat16 left, BFloat16 right);
        public static bool operator <(BFloat16 left, BFloat16 right);
        public static bool operator >(BFloat16 left, BFloat16 right);
        public static bool operator <=(BFloat16 left, BFloat16 right);
        public static bool operator >=(BFloat16 left, BFloat16 right);

        // Equality
        public bool Equals(BFloat16 obj);
        public override bool Equals(object? obj);
        public override int GetHashCode();
        
        // ToString override
        public override string ToString();
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 13, 2024
@huoyaoyuan
Member

huoyaoyuan commented Feb 18, 2024

Which assembly should it belong to? Should it be in S.R.Numerics like Complex?

Since there is hardware acceleration for it, it should likely be in CoreLib.

@huoyaoyuan huoyaoyuan linked a pull request Feb 19, 2024 that will close this issue
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 19, 2024
@Neme12

Neme12 commented Mar 13, 2024

Shouldn't it be called BHalf, since there's Half, Single & Double as opposed to Float16, Float32 and Float64?

@Neme12

Neme12 commented Mar 13, 2024

        // Casting
        public static explicit operator BFloat16(float value);
        public static explicit operator BFloat16(double value);
        public static explicit operator float(BFloat16 value);
        public static explicit operator double(BFloat16 value);

Correct me if I'm wrong, but isn't it the case that every BFloat16 can be losslessly converted to a float and double? If so, why aren't those conversions implicit? Also, aren't conversions to and from Half needed as well?

@Neme12

Neme12 commented Mar 13, 2024

Also, for those conversions that are not lossless, shouldn't there be checked and unchecked versions?

@Neme12

Neme12 commented Mar 13, 2024

Implicit casts can introduce potential versioning concerns and so it depends a bit. It will likely be a discussion point in the API review.

What are those versioning concerns? It's a little unfortunate to have those be explicit not only because you have to add a cast, but because the conversion being explicit makes me (and I assume others as well) think that it cannot be safely converted, when in fact it can be. It's really counterintuitive for them to be explicit.

@tannergooding
Member

Shouldn't it be called BHalf, since there's Half, Single & Double as opposed to Float16, Float32 and Float64?

No, the industry standard names for the types are BFloat16, Half, Single, and Double. The "spec" names are brain float16, binary16, binary32, and binary64

Also, for those conversions that are not lossless, shouldn't there be checked and unchecked versions?

Checked vs unchecked normally only exist where a conversion can throw. Floating-point conversions never throw and have 1 strictly defined behavior, which is round to nearest representable.

You theoretically could expose the optional IEEE 754 support for raising an "inexact exception", but that throws for almost every operation you can imagine, even 1 / 10 or 0.1 + 0.2 results in an inexact result (even when accounting for the actual underlying values represented not being 0.1 and 0.2).

What are those versioning concerns?

Language primitive types get special handling and precedence for conversions. There are many cases where this can negatively impact overload resolution either by new ambiguities caused by new implicit conversions or by the wrong overload being silently selected.

A simple example is if you have double M(double x) and call double x = M(5) it will call the only overload. However, if you then expose float M(float x), the call will now call the overload that takes float and silently upcast the result back to double, so not only do you have a change in precision (which for large int is potentially lossy when cast to float), but it is a silent change in precision due to the upcast of the float result back to double.

Similar issues exist when introducing new APIs around Half or BFloat16 where they have implicit conversions to float (or other primitive types) and especially if they have any implicit conversions from other primitive types. For that reason, we deliberately made the casts on Half explicit and made a similar decision for BFloat16, as it avoids an entire class of issues and helps make the operation that much more explicit.
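The overload resolution change described above can be reproduced with a small sketch. The two `Library` classes are hypothetical stand-ins for "before" and "after" versions of the same API; the call site is identical in both cases.

```csharp
// Hypothetical "before" version: only the double overload exists.
public static class LibraryV1
{
    public static double M(double x) => x;
}

// Hypothetical "after" version: a float overload has been added.
public static class LibraryV2
{
    public static double M(double x) => x;
    public static float M(float x) => x;
}

public static class OverloadDemo
{
    // Binds to M(double); 16777217 (2^24 + 1) is exactly representable as double.
    public static double CallV1() => LibraryV1.M(16777217);

    // Binds to M(float) because int -> float is the better conversion;
    // the argument is rounded to 16777216 and the result is silently widened to double.
    public static double CallV2() => LibraryV2.M(16777217);
}
```

The source at the call site does not change, yet the value that comes back does, which is the kind of silent behavior change that makes new implicit conversions risky.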

@Neme12

Neme12 commented Mar 13, 2024

Checked vs unchecked normally only exist where a conversion can throw. Floating-point conversions never throw and have 1 strictly defined behavior, which is round to nearest representable.

Wait, uh? 😟 I assumed until now that in a checked context, if I cast a numeric type and the value can't fit into the new type, it throws. Now I could have bugs in my code I guess :/ But thanks for letting me know.

A simple example is if you have double M(double x) and call double x = M(5) it will call the only overload. However, if you then expose float M(float x), the call will now call the overload that takes float and silently upcast the result back to double, so not only do you have a change in precision (which for large int is potentially lossy when cast to float), but it is a silent change in precision due to the upcast of the float result back to double.

This seems like an argument against all implicit conversions altogether. But the language has them and people are used to them. So it seems weird that some numeric types would have them and others would not, for a reason that applies to all of them.

If they were really so bad, why would they exist in the language? For one reason or another, they made the call about them existing and about numeric types having them. So I feel like we should follow that to be consistent. I get the argument about being explicit about things, but it's still weird for them to be explicit as it makes me think wait, this is dangerous and I have to have extra scrutiny here as there can be either an exception or a loss of precision due to an explicit cast. When in fact there can't be and it's completely safe. I wish there was a special syntax for conversions that made you be explicit about them, just like explicit conversions, but would only allow conversions that are "implicit"/safe. But there isn't :(

For better or worse, we have what we have in the language, but people (including me) have gotten used to what we have so I still feel like there should be consistency instead of banishing certain language features that we don't like for new code, even though they're used all over the place in existing code and always will be, as there will always be implicit conversions for the builtin types and other existing types that have them, and there will always be this weird inconsistency that makes people stop and wonder why it's there. I just associate explicit conversions with conversions that aren't safe, because if they were safe, they would be implicit - that's the way it has always been (apart from that one mistake of int and float).

@Neme12

Neme12 commented Mar 13, 2024

If this is really the decision for all conversions to be explicit going forward regardless of whether they're safe or not, please, at least add doc comments and documentation pages for those conversions saying whether they are actually safe or not.

@tannergooding
Member

Wait, uh? 😟 I assumed until now that in a checked context, if I cast a numeric type and the value can't fit into the new type, it throws. Now I could have bugs in my code I guess :/ But thanks for letting me know.

Checked has always really pertained to overflow/underflow and not necessarily towards "representable". The simplest example is that checked(5 / 2) does not throw, it simply returns 2 even though the actual answer of 2.5 is not representable.

Likewise checked((float)double.MaxValue) does not throw because the specification requires it take the value as given, perform the operation as if to infinite precision and unbounded range, and then round to the nearest representable result. For float, this happens to be PositiveInfinity which is a representable value and therefore it does not throw.

Floating-point to integer conversions do throw for checked if the value can't be represented, as that would overflow. Integer to floating-point conversions do not, even though many inputs will result in a loss of precision.
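The four cases above can be checked with a short sketch. The method names are hypothetical; the inputs are passed as parameters so that the conversions happen at run time rather than being folded as compile-time constants.

```csharp
using System;

// Hypothetical demo class illustrating what checked does and does not catch.
public static class CheckedDemo
{
    // Integer division truncates; it never throws for a non-representable quotient.
    public static int IntegerDivision(int a, int b) => checked(a / b);

    // double -> float rounds to the nearest representable value, which may be infinity.
    public static float NarrowToSingle(double value) => checked((float)value);

    // double -> int throws in a checked context when the value is out of range.
    public static bool FloatToIntThrows(double value)
    {
        try
        {
            _ = checked((int)value);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // int -> float never throws, even when precision is lost.
    public static float IntToSingle(int value) => checked((float)value);
}
```

For example, `CheckedDemo.NarrowToSingle(double.MaxValue)` yields positive infinity, while `CheckedDemo.FloatToIntThrows(1e20)` reports that an `OverflowException` was raised.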

This seems like an argument against all implicit conversions altogether.

In some ways, yes. There are many languages that explicitly do not provide implicit conversions because of these issues.

But the language has them and people are used to them. So it seems weird that some numeric types would have them and others would not, for a reason that applies to all of them.

Yes, and so our decision on whether to use implicit conversions or not is based around the likelihood people will run into issues/pits of failure.

There are many cases where implicit conversions are good and where we would expose them for new types; this just doesn't happen to be one of them due to it being a more esoteric user-defined type that needs to interplay with multiple built-in types (which have special conversion precedence rules) and being used in scenarios where a new overload causing a silent loss of precision could be both easily missed and have a large negative impact were it to make it to production.

That is to say, we don't only make the decision to expose implicit conversions based on whether or not something is lossless. We have to also account for how that is likely to be used or impact other existing overloads, especially for more common types, and how likely it is to be exposed as an overload for those other types. This case has both of those as fairly likely, especially in domains where the combination of perf and precision are often competing with each other.

We can always expose the implicit conversions later given enough feedback, but we can't take them away once they are exposed. So defaulting to explicit here is the better/safer option and won't be overly negative, particularly given the primary domains are going to involve using vectors and require explicit conversions anyways.

@Neme12

Neme12 commented Mar 13, 2024

Checked has always really pertained to overflow/underflow and not necessarily towards "representable". The simplest example is that checked(5 / 2) does not throw, it simply returns 2 even though the actual answer of 2.5 is not representable.

Right, but I would consider converting an int that's outside of the range of short, to short, to be an overflow. Isn't it?

The simplest example is that checked(5 / 2) does not throw

I guess I wouldn't consider that to be an overflow, I wouldn't expect that to throw as that's what integer division is defined as. But I would consider casting a double to a float that's too large for a float to be a kind of overflow.

But thanks for letting me know about the semantics (or lack thereof) of checked and floating point numbers. I guess I have to be careful and write my own utilities for floating point conversions that are really actually checked.

EDIT:
Oh, float.CreateChecked isn't checked either 😲 damn.
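A utility of the kind mentioned above might look roughly like this. It is only a sketch of one possible policy: treat a finite input that becomes infinite after narrowing as an overflow. The `ToSingleChecked` name is hypothetical and no such method exists in the framework.

```csharp
using System;

// Hypothetical user-written utility; not a framework API.
public static class CheckedFloatConversions
{
    // Narrows double to float, throwing if a finite input overflows to infinity.
    public static float ToSingleChecked(double value)
    {
        float result = (float)value;

        // Infinite inputs pass through unchanged; only finite-to-infinite is rejected.
        if (float.IsInfinity(result) && !double.IsInfinity(value))
            throw new OverflowException("Value is outside the finite range of Single.");

        return result;
    }
}
```

This deliberately does not flag loss of precision, only loss of finiteness; a stricter variant could compare the round-tripped value against the input.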

@tannergooding
Member

I guess I wouldn't consider that to be an overflow, I wouldn't expect that to throw as that's what integer division is defined as. But I would consider casting a double to a float that's too large for a float to be a kind of overflow.

Of sorts, but that's the intent of PositiveInfinity and NegativeInfinity. They exist to represent a value that overflowed past the finite range. It's overall more performant and avoids needing to check every single operation, while still propagating the relevant information such that checking once at the end of the algorithm is typically sufficient. And most importantly, it allows float/double to represent values and arithmetic operations that are critical for scientific applications, games, machine learning, and in general higher level mathematics. NaN and Negative Zero exist for much the same reason: to represent values that escape the "real" number domain or which round towards zero but may have actually been less than Epsilon.

It really just falls out that there is no value that can overflow, because it's always representable as infinity, which is unlike integers which can only represent finite values.
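The "check once at the end" idea can be sketched as follows. The `SumOfSquares` helper is hypothetical; it performs no per-operation checks and relies on infinity propagating through the remaining arithmetic.

```csharp
using System;

// Hypothetical example of letting overflow propagate as infinity.
public static class InfinityPropagation
{
    // No per-operation checks: an overflowing square becomes infinity
    // and stays infinite through the rest of the accumulation.
    public static float SumOfSquares(ReadOnlySpan<float> values)
    {
        float sum = 0f;
        foreach (float v in values)
            sum += v * v;
        return sum;
    }
}
```

A caller can then validate the final result a single time, for instance with `float.IsFinite(InfinityPropagation.SumOfSquares(data))`, instead of guarding each multiplication and addition.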
