
Add sub-byte data types: float4_e2m1fn, float6_e2m3fn, float6_e3m2fn #181

Merged into jax-ml:main on Sep 12, 2024 (1 commit)

Conversation

sergey-kozub (Contributor)

This PR adds support for MX (microscaling) floating point types.

The F4e2m1, F6e2m3, and F6e3m2 types are proposed in the OpenCompute MX Specification.

These types have the following notable features:

  • No NaN encoding; only finite values are supported;
  • No inf encoding, similar to the existing 8-bit types with the `fn` suffix;
  • Sub-byte padded bit encoding, similar to the existing `int2` and `int4` types.

float4_e2m1fn
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 - 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6
- Min normal number: S.01.0 = ±2^(0) = ±1
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5
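
A quick sanity check of these constants (a sketch assuming an `ml_dtypes` build that includes this PR; the `finfo` attribute names follow NumPy's convention):

```python
import numpy as np
from ml_dtypes import finfo, float4_e2m1fn

fi = finfo(float4_e2m1fn)
print(float(fi.max), float(fi.smallest_normal), float(fi.smallest_subnormal))
# 6.0 1.0 0.5

# The type uses byte storage with the high 4 bits unused, so viewing the
# 16 low bit patterns as float4_e2m1fn enumerates every encodable value.
bits = np.arange(16, dtype=np.uint8)
print(bits.view(float4_e2m1fn).astype(np.float32))
# [ 0.   0.5  1.   1.5  2.   3.   4.   6.  -0.  -0.5 -1.  -1.5 -2.  -3.  -4.  -6. ]
```
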
float6_e2m3fn
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 - 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125

float6_e3m2fn
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 - 3 = -2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
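
The two FP6 formats can be checked the same way (same assumptions as the sketch above):

```python
from ml_dtypes import finfo, float6_e2m3fn, float6_e3m2fn

for ty in (float6_e2m3fn, float6_e3m2fn):
    fi = finfo(ty)
    print(ty.__name__, float(fi.max), float(fi.smallest_normal),
          float(fi.smallest_subnormal))
# float6_e2m3fn 7.5 1.0 0.125
# float6_e3m2fn 28.0 0.25 0.0625
```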

Related PRs:

  • PR-95392 [APFloat] Add APFloat support for FP4 data type
  • PR-94735 [APFloat] Add APFloat support for FP6 data types

@sergey-kozub (Contributor Author)

Note: a small unrelated change in `_finfo.py` removes unreadable boilerplate and replaces it with (faster) dict lookups for instantiating `finfo` objects.
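
A minimal sketch of that pattern (names and values here are illustrative, not the actual `_finfo.py` contents):

```python
# One table maps each dtype name to a zero-argument factory, so building
# a finfo object becomes a single dict lookup instead of an if/elif chain.
_FINFO_FACTORIES = {
    "float4_e2m1fn": lambda: dict(max=6.0, smallest_normal=1.0, smallest_subnormal=0.5),
    "float6_e2m3fn": lambda: dict(max=7.5, smallest_normal=1.0, smallest_subnormal=0.125),
    "float6_e3m2fn": lambda: dict(max=28.0, smallest_normal=0.25, smallest_subnormal=0.0625),
}

def make_finfo(name):
    # Raises KeyError for unsupported dtype names.
    return _FINFO_FACTORIES[name]()
```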

@hawkinsp (Collaborator)

I'm trying to understand the relationship between these types and the MX types. From my quick read of the MX spec, all of the types it defines are block-scaled formats, which these types are not?

Can you say more about the relationship and the use case for these?

@sergey-kozub (Contributor Author) commented Sep 10, 2024

> I'm trying to understand the relationship between these types and the MX types. From my quick read of the MX spec, all of the types it defines are block-scaled formats, which these types are not?

The MXFP8 type is a pair of tensors (e.g., the first could have the E5M2 type, the second the E8M0 type with 32x fewer elements).

Proper support for such an MX type (where a value has two different primitive types) is way too complicated, but we could instead use two values. This way, a dot op with scaled inputs (which is what we're actually interested in) could be represented as a custom call with four input tensors.

So, in order to implement MXFP8, we need the E8M0 primitive type in XLA (E5M2/E4M3 already exist). For MXFP4, we need both E8M0 and E2M1. Adding the FP6 types (E2M3 and E3M2) is just for completeness; they are very similar and will unblock us in the future. All of these types are described in the MX spec: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
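
A rough sketch of the block-scaling arithmetic involved (plain NumPy, with `float32` standing in for the narrow element and scale types; the block size of 32 comes from the MX spec, everything else is illustrative):

```python
import numpy as np

BLOCK = 32  # MX block size: one shared scale per 32 elements

def mx_dequantize(elements: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # elements: (n,) narrow-type payload; scales: (n // BLOCK,) E8M0-style
    # power-of-two scales, one per contiguous block of 32 elements.
    return elements * np.repeat(scales, BLOCK)

def scaled_dot(a_elems, a_scales, b_elems, b_scales):
    # The "custom call with four input tensors" described above.
    return np.dot(mx_dequantize(a_elems, a_scales),
                  mx_dequantize(b_elems, b_scales))

a = np.ones(64, np.float32); sa = np.array([2.0**3, 2.0**-1], np.float32)
b = np.ones(64, np.float32); sb = np.array([1.0, 4.0], np.float32)
print(scaled_dot(a, sa, b, sb))  # 32*(8*1) + 32*(0.5*4) = 320.0
```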

@sergey-kozub force-pushed the mxfloat branch 3 times, most recently from e51626b to 9bdf962 on September 10, 2024 at 19:21
README.md Outdated
@@ -66,6 +70,39 @@ A `bfloat16` number is a single-precision float truncated at 16 bits.

Exponent: 8, Mantissa: 7, exponent bias: 127. IEEE 754, with NaN and inf.

### `float4_e2m1`
Collaborator

Fix names to include fn suffix.

I'm actually having trouble finding a great definition of the f and n suffixes even in the LLVM discussion that added them: I don't suppose you have a link to the definition?

In particular, I'm not sure if n should appear in the name, given that n also appears in the suffix of FP8 types with a single NaN, but these have no NaN. So I'm a bit unclear what the suffix means.

Contributor Author

f means "finite", n means "special NaN representation" (e.g. non-IEEE)
I saw this somewhere in the comments, will post a link once I find it.

Contributor Author

Fixed the type name.

Contributor Author

Oh, actually, it's in this same file, below:

F is for "finite" (no infinities), N for special NaN encoding, UZ for unsigned zero.

Contributor Author

I guess one could say that "no NaN encoding" is a "special NaN encoding".

Also, LLVM APFloat.cpp has these types with the "FN" suffix:
https://github.com/llvm/llvm-project/blob/5537ae87b3a87b3abeb4e6983cecd9b103648243/llvm/lib/Support/APFloat.cpp#L150

We could probably change the suffix, but we need to be consistent across the repositories.

Collaborator

We should agree with LLVM, so that works for me.

Collaborator

Not sure if you didn't push the fix yet, the headers are still suffix-less.

Contributor Author

Fixed now.

README.md Outdated
Microscaling format, 4 bits (encoding: `0bSEEM`) using byte storage (higher 4
bits are unused). NaN representation is undefined.

Possible values: [0, 0.5, 1, 1.5, 2, 3, 4, 6]
Collaborator

I'd probably stick backticks around the values

Collaborator

What about the negative values?

Contributor Author

Added backticks around the values (here and below).

Changed to "Possible absolute values" to keep the list short.

```python
obj.epsneg = 0.125
obj.machep = -3
obj.negep = -3
obj.max = float6_e2m3fn(7.5)
```
Collaborator

I'd personally be tempted to specify these as bit patterns (float.fromhex("0x1234.1"), IIRC)
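
For reference, Python's hexadecimal float notation can spell these constants bit-exactly (a small illustration using the `float6_e2m3fn` values from this thread):

```python
# "0x1.ep2" is 1.875 * 2**2 = 7.5, the float6_e2m3fn max normal value.
assert float.fromhex("0x1.ep2") == 7.5
# "0x1p-3" is 2**-3 = 0.125, matching epsneg / the min subnormal above.
assert float.fromhex("0x1p-3") == 0.125
```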

Contributor Author

Updated.

```diff
@@ -366,6 +423,14 @@ bool Initialize() {
  success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e4m3fn, float>();
  success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e5m2, float>();
  success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e4m3, float>();
```

Collaborator

This is getting unwieldy. This is just covering all-pairs of extension types, I think? I suspect this can be factored better with some template trickery.

If nothing else, the function you added, `RegisterCustomCastsWithBfloat16AndFloat8Types`, could just be used everywhere here, called once for each type?

Probably possible to do better than that with some template cunning.

Contributor Author

Added some templates to reduce boilerplate in this file.

ml_dtypes/include/mxfloat.h (resolved)
@hawkinsp (Collaborator) left a comment

Looks good, other than the clang-format failure.

@copybara-service bot merged commit 40e66e5 into jax-ml:main on Sep 12, 2024 (12 of 13 checks passed)
GleasonK pushed a commit to openxla/stablehlo that referenced this pull request Oct 23, 2024
Add MX floating point types (f4E2M1FN, f6E2M3FN, f6E3M2FN, f8E8M0FNU) (#2581)

This is a proposal to add MX (microscaling) floating point types to
StableHLO.

Related links:
- StableHLO [PR#2582](#2582)
Add MX floating point types (f4E2M1FN, f6E2M3FN, f6E3M2FN, f8E8M0FNU)
- LLVM [PR#95392](llvm/llvm-project#95392)
[APFloat] Add APFloat support for FP4 data type
- LLVM [PR#94735](llvm/llvm-project#94735)
[APFloat] Add APFloat support for FP6 data types
- LLVM [PR#107127](llvm/llvm-project#107127)
[APFloat] Add APFloat support for E8M0 type
- LLVM [PR#108877](llvm/llvm-project#108877)
[MLIR] Add f4E2M1FN type
- LLVM [PR#107999](llvm/llvm-project#107999)
[MLIR] Add f6E2M3FN type
- LLVM [PR#105573](llvm/llvm-project#105573)
[MLIR] Add f6E3M2FN type
- LLVM [PR#111028](llvm/llvm-project#111028)
[MLIR] Add f8E8M0FNU type
- JAX-ML [PR#181](jax-ml/ml_dtypes#181) Add
sub-byte data types: float4_e2m1fn, float6_e2m3fn, float6_e3m2fn
- JAX-ML [PR#166](jax-ml/ml_dtypes#166) Add
float8_e8m0_fnu (E8M0) OCP MX scale format
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 3, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds the F4E2M1FN primitive type (4-bit float with 2 exponent bits and 1 mantissa bit) and the F8E8M0FNU primitive type (8-bit float with 8 exponent bits, no mantissa, and no sign), and enables loads/stores in the same way the S4/U4 types are implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier; it is possible that some tests could fail if only some (i.e., not all) of these commits are applied.
Copybara import of the project:

--
fa539fbde987ff6421fd2937fade495baf633630 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
2c014035923e0394b2cfcb81eaf090a96621b0aa by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
e919ed54e825f2e905aaf0cc279dd21cd80f1ce9 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
ca16839096feb93e0454ec380c5c707c30199346 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
eedc079ca9a4db9e611d84877a25b3da21386f16 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
8e0305cd47002f0c1f8668a3cbcbce5428f2a4c6 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
aabe9c68d964609f78f29e17ee0680798ad0c6ac by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
87da2ebfab388f113482e852009401a9e416974a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
e0ee48c3a37018ba985c850931592d62eadf7c2e by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
be2e457922e2cddeaf5aca13dd022f3ac2a1393b by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 be2e457922e2cddeaf5aca13dd022f3ac2a1393b
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 3, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
fa539fb by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
2c01403 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
e919ed5 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
ca16839 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
eedc079 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
8e0305c by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
aabe9c6 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
87da2eb by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
e0ee48c by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
be2e457 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 be2e457
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 19, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 19, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 19, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 19, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 19, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 19, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
Imported from GitHub PR #19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier, it is possible that some tests could fail if only some (i.e. not all) of these commits are applied.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/mlir-hlo that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to openxla/xla that referenced this pull request Dec 20, 2024
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024