
Add F4E2M1FN and F8E8M0FNU types #19096

Closed
wants to merge 11 commits into from

Conversation

@sergey-kozub (Contributor) commented Nov 6, 2024

This PR adds the F4E2M1FN primitive type (a 4-bit float with a 2-bit exponent and a 1-bit mantissa) and the F8E8M0FNU primitive type (an 8-bit float with an 8-bit exponent, no mantissa, and no sign bit), and enables loads/stores in the same way the S4/U4 types are implemented.

This will enable using microscaling (MX) formats (RFC), such as MXFP4.

F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 - 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 - 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
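A corresponding sketch for F8E8M0FNU (exponent-only and unsigned, so every non-NaN code is a power of two); again the helper name is illustrative, not from the XLA sources:

```python
import math

# Decode an 8-bit F8E8M0FNU code into a Python float, following the
# parameters listed above: no sign bit, no stored mantissa (implicitly 1),
# bias 127, and binary 1111'1111 reserved for NaN. Illustrative sketch only.
def decode_f8e8m0fnu(bits: int) -> float:
    assert 0 <= bits <= 0xFF
    if bits == 0xFF:
        return math.nan         # the single NaN encoding
    return 2.0 ** (bits - 127)  # every other code is a power of two
```

Note there is no code for zero: the smallest value is 2^-127 and the largest is 2^127.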

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits to make the review easier; some tests may fail if only a subset of these commits is applied.
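The loads/stores mentioned in the description imply packing two 4-bit codes per byte, as is already done for S4/U4. A minimal sketch of that packing; the nibble order and helper names are assumptions for illustration, not taken from the XLA implementation:

```python
# Pack a list of 4-bit codes two per byte (low nibble first -- an assumed
# order, not necessarily XLA's), padding odd-length input with a zero code.
def pack_f4(codes: list[int]) -> bytes:
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((hi << 4) | lo for lo, hi in zip(codes[::2], codes[1::2]))

def unpack_f4(data: bytes, count: int) -> list[int]:
    out = []
    for b in data:
        out.append(b & 0xF)  # low nibble first, matching pack_f4
        out.append(b >> 4)
    return out[:count]       # drop any padding nibble
```

The `count` argument is needed because the byte length alone cannot distinguish an odd-length array from the even-length one it was padded to.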

@sergey-kozub sergey-kozub self-assigned this Nov 6, 2024
@sergey-kozub force-pushed the skozub/e2m1 branch 4 times, most recently from 7a4377a to 29d4c58 on November 6, 2024 09:27
@sergey-kozub (Contributor, Author) commented Nov 6, 2024

Note: the clang-format check fails on issues unrelated to this PR. Also, running with clang-format 18.1.8 locally yields no issues.

@reedwm (Member) commented Nov 20, 2024

Sorry for the delay in reviewing.

You plan on adding the other MX types, right? If so, and if it's convenient for you, could you add them in the same PR? Normally smaller PRs are better, but all these dtypes touch many of the same places in the code, so it's easier to batch-review all of them at once. Granted, FP6 might be a bit weird if packed, since its size doesn't divide the byte size -- those should potentially be in a separate PR.

But if adding the other MX types is inconvenient for you, I'm happy to review this PR as-is.

@sergey-kozub (Contributor, Author) replied:

> You plan on adding the other MX types, right? If so, and if it's convenient for you, could you add them in the same PR?

I'll add E8M0 implementation to this PR, as you suggest.
The FP6 types are not implemented yet, I'll look into that later (not a priority).

@sergey-kozub sergey-kozub changed the title Add F4E2M1FN type Add F4E2M1FN and F8E8M0FNU types Nov 21, 2024
@sergey-kozub
Copy link
Contributor Author

Added the F8E8M0FNU support patch to this PR. Will update the description soon after.

@reedwm (Member) left a comment

Thanks for the PR!

Note I didn't review the MLIR codegen files (those in gpu/fusions/transforms) since I'm not familiar with those. @mooskagh can you find someone to review these files?

Review comments (since resolved) were left on:
- xla/hlo/builder/lib/math_test.cc
- xla/literal.h
- xla/literal_comparison_test.cc
- xla/primitive_util.h
- xla/python/ifrt/dtype.proto
- xla/service/elemental_ir_emitter.cc (three comments)
- third_party/tsl/third_party/py/ml_dtypes/e8m0.patch
- xla/tests/convert_test.cc
@mooskagh mooskagh requested a review from pifon2a November 25, 2024 07:51
A reviewer (Contributor) commented:

Could you add tests to lower_tensors.mlir for the new 4-bit type, similar to the existing tests for int4?

func.func @transfer_write_i4
func.func @transfer_read_i4
func.func @int4_constant

@sergey-kozub (Contributor, Author) replied:

Added "f4_constant", "transfer_read_f4" and "transfer_write_f4".
BTW, there's no "transfer_read_i4" in that file.

@sergey-kozub force-pushed the skozub/e2m1 branch 3 times, most recently from 6972698 to b20cfd2 on November 27, 2024 20:15
@sergey-kozub (Contributor, Author) commented:

I believe I addressed all the prior comments. PTAL when possible.

@pifon2a (Contributor) left a comment

Thank you!

@sergey-kozub force-pushed the skozub/e2m1 branch 2 times, most recently from 3d63bba to be2e457 on December 3, 2024 07:49
@mooskagh (Member) commented Dec 3, 2024

I'm trying to merge this PR. It references float8.h in ml_dtypes.h, but I cannot find it either in this repository or internally (it was deleted a year ago).

UPD: Disregard the comment, found it.

copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 3, 2024
Imported from GitHub PR openxla/xla#19096

Copybara import of the project:

--
fa539fbde987ff6421fd2937fade495baf633630 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
2c014035923e0394b2cfcb81eaf090a96621b0aa by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
e919ed54e825f2e905aaf0cc279dd21cd80f1ce9 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
ca16839096feb93e0454ec380c5c707c30199346 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
eedc079ca9a4db9e611d84877a25b3da21386f16 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
8e0305cd47002f0c1f8668a3cbcbce5428f2a4c6 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
aabe9c68d964609f78f29e17ee0680798ad0c6ac by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
87da2ebfab388f113482e852009401a9e416974a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
e0ee48c3a37018ba985c850931592d62eadf7c2e by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
be2e457922e2cddeaf5aca13dd022f3ac2a1393b by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 be2e457922e2cddeaf5aca13dd022f3ac2a1393b
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit that referenced this pull request Dec 3, 2024
Imported from GitHub PR #19096. Merging this change closes #19096.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 be2e457
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 3, 2024
Imported from GitHub PR openxla/xla#19096. Merging this change closes #19096.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 be2e457922e2cddeaf5aca13dd022f3ac2a1393b
PiperOrigin-RevId: 702273510
copybara-service bot pushed a commit that referenced this pull request Dec 20, 2024
Imported from GitHub PR #19096.
Copybara import of the project:

--
f493e48 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d0056 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca820 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f09 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af7 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e34417 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a3 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096. Merging this change closes #19096.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 707638099
copybara-service bot pushed a commit to jax-ml/ml_dtypes that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096. Merging this change closes #19096.

PiperOrigin-RevId: 708390061
copybara-service bot pushed a commit to tensorflow/mlir-hlo that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096. Merging this change closes #19096.

PiperOrigin-RevId: 708390061
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096. Merging this change closes #19096.

PiperOrigin-RevId: 708390061
copybara-service bot pushed a commit that referenced this pull request Dec 20, 2024
We tried to pretty-print the name of the type but this is not possible if the element_type is not in the enum. Print the underlying integer instead.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#19096 from openxla:skozub/e2m1 d4de0a3
PiperOrigin-RevId: 708391290
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 708391290
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
… in an AsyncOp pair. Also add a sync flag in its output.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 694744312
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
 Do not enable it for now. There are some numerical issues.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 704016458
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
…List.

* We protect is_fully_addressable_, addressable_device_list_, memory_kind_info_ and hash_ with the PyDeviceList object's associated lock.
* DefaultMemoryKind and MemoryKinds are update to be static methods that take a Python object reference, so we have easy access to that lock.
* We change a number of other methods to be private.
* We move the module registration function into a static method so it can access private methods more easily.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 708383435
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#19096 from openxla:skozub/e2m1 d4de0a369d9dc853f34f3cf3bf7dcc5a47502106
PiperOrigin-RevId: 680743400
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 20, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds the F4E2M1FN primitive type (a 4-bit float with 2 exponent bits and 1 mantissa bit) and the F8E8M0FNU primitive type (an 8-bit float with 8 exponent bits, no mantissa, and no sign), and enables loads/stores in the same way the S4/U4 types are implemented.
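As a minimal sketch of the S4/U4-style packed layout referred to here (helper names are hypothetical and not part of this PR; the actual nibble order used by XLA may differ), two 4-bit element codes share one byte:

```python
def pack_nibbles(values):
    """Pack 4-bit codes two per byte, low nibble first.

    Hypothetical illustration of an S4/U4-style packed layout;
    XLA's actual nibble ordering may differ.
    """
    if len(values) % 2:
        values = values + [0]          # pad odd-length input with a zero code
    return bytes((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
                 for i in range(0, len(values), 2))

def unpack_nibbles(data, count):
    """Inverse of pack_nibbles: recover `count` 4-bit codes from bytes."""
    out = []
    for b in data:
        out.append(b & 0xF)
        out.append(b >> 4)
    return out[:count]

codes = [0b0111, 0b1001, 0b0010]
assert unpack_nibbles(pack_nibbles(codes), len(codes)) == codes
```

The round trip above shows why 4-bit types need an element count alongside the byte buffer: an odd-length array leaves a padding nibble in the last byte.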

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c
F4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 - 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5

F8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 - 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111

Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```
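The tables above can be cross-checked with a small decoder. This is a sketch written for illustration only (the helper names are not part of the PR or of any XLA API); it applies the biases and special cases exactly as listed:

```python
import math

def decode_f4e2m1fn(nibble):
    """Decode a 4-bit F4E2M1FN code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if nibble & 0b1000 else 1.0
    exp = (nibble >> 1) & 0b11
    man = nibble & 0b1
    if exp == 0:                       # subnormal: +/- 2^0 * (man * 0.5)
        return sign * man * 0.5
    return sign * (2.0 ** (exp - 1)) * (1.0 + 0.5 * man)  # exponent bias 1

def decode_f8e8m0fnu(byte):
    """Decode an 8-bit F8E8M0FNU code: exponent only, bias 127, 0xFF is NaN."""
    return math.nan if byte == 0xFF else 2.0 ** (byte - 127)

# Non-negative F4E2M1FN values: 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0
print([decode_f4e2m1fn(i) for i in range(8)])
print(decode_f8e8m0fnu(127))  # 1.0
```

Enumerating all 16 F4E2M1FN codes this way reproduces the extremes listed above: max normal ±6.0, min normal ±1.0, min subnormal ±0.5, and signed zeros.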

Related PRs:
- openxla/stablehlo#2582
- jax-ml/ml_dtypes#181
- llvm/llvm-project#95392
- llvm/llvm-project#108877
- jax-ml/ml_dtypes#166
- llvm/llvm-project#107127
- llvm/llvm-project#111028

The PR is split into multiple commits just to make the review easier; some tests could fail if only a subset (i.e. not all) of these commits is applied.
Copybara import of the project:

--
f493e4803eaa5ff3da3ceb130e9348c014b4a2e8 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: import mxfloat.h

--
87d005630b310a355d7c30b22828c35237373f17 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: primitive type

--
70ca82093faeec98f2dc5e8b82f617d99ca96849 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: literal support

--
c479f0940da490e9668e2f48e14a7466f0c4a97f by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: conversion codegen

--
daaa3af3ce3af456f2ef44dbc291ebeb09e86d9b by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: python interface

--
1f0e19ff14733eff790726936b68ef0cf607a766 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: FFI

--
999bf96092e57c7b3039811f2887281f347ff17a by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: HLO evaluator

--
d7d5af74c5f8a94522779a121c0a4a962156fb64 by Sergey Kozub <skozub@nvidia.com>:

Add F4E2M1FN type: add tests

--
9e8c7bc02849f241d0f05941221d99f1d08d9e67 by Sergey Kozub <skozub@nvidia.com>:

Add F8E8M0FNU type

--
1e344174b931cea4978770ab740dfed67186c2f4 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments

--
d4de0a369d9dc853f34f3cf3bf7dcc5a47502106 by Sergey Kozub <skozub@nvidia.com>:

Addressing PR#19096 review comments (round 2)

Merging this change closes #19096

PiperOrigin-RevId: 708390061
copybara-service bot pushed a commit to jax-ml/ml_dtypes that referenced this pull request Dec 23, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c...

PiperOrigin-RevId: 709118438
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 23, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c...

PiperOrigin-RevId: 709118438
copybara-service bot pushed a commit to jax-ml/ml_dtypes that referenced this pull request Dec 23, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c...

PiperOrigin-RevId: 709153611
copybara-service bot pushed a commit to google/tsl that referenced this pull request Dec 23, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c...

PiperOrigin-RevId: 709153611
copybara-service bot pushed a commit to tensorflow/mlir-hlo that referenced this pull request Dec 23, 2024
Imported from GitHub PR openxla/xla#19096

This PR adds F4E2M1FN primitive type (4-bit float with 2 bits exponent and 1 bit mantissa), F8E8M0FNU primitive type (8-bit float with 8 bits exponent, no mantissa and no sign) and enables loads/stores in the same way S4/U4 type is implemented.

This will enable using microscaling (MX) formats ([RFC](openxla/xla#18085)), such as MXFP4.

```c...

PiperOrigin-RevId: 709153611
This PR was rolled back in e19d979!
