Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Steer round to even to Z's hardware operation (for Z), and MLIR/LLVM roundEven for the other platforms. #2970

Merged
merged 30 commits into from
Oct 10, 2024

Conversation

AlexandreEichenberger
Copy link
Collaborator

Round to nearest even was currently implemented in software:

  // Use numpy algorithm for rint as follows.
  // ```
  // double y, r;
  // y = npy_floor(x);
  // r = x - y;
  //
  // if (r > 0.5) {
  //     y += 1.0;
  // }
  //
  // /* Round to nearest even */
  // if (r == 0.5) {
  //     r = y - 2.0*npy_floor(0.5*y);
  //     if (r == 1.0) {
  //         y += 1.0;
  //     }
  // }

This can be replaced by one instruction on many architectures (e.g. on Z by FIEBRA or VFISB). This PR does that for z machines.

For others, MLIR & LLVM actually implement a math::round_even that leads to efficient instructions as well. At this time, this is not enabled for Z, thus the direct use of ASM for the Z platform. If the ASM cannot be used on Z, then the math::round_even_emulation (algorithm above) performs better than the default LLVM function call backup. For other machines, the LLVM round even is used.

Another issue with using ASM driven directive is that we cannot use the LLVM features that let us used SIMD vectors of arbitrary lengths. I baked this support in the ONNX to Krnl lowering using vector::ShapeCastOp and then extract and insert. That same approach can be used elswhere. I checked the HW that the generated shuffle are eventually all gone (as we shuffle entire vectors). Right now, it is handled in the ONNX to KRNL. It could be migrated to Krnl to LLVM, but would require more work that I am not too familiar with.

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Copy link
Collaborator

@tungld tungld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

output2D = create.vec.insertInto2D(subOutput, output2D, i);
}
return create.vec.shapeCast(vecType, output2D);
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else looks redundant since we have return inside the if.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tx, will fix

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
@AlexandreEichenberger AlexandreEichenberger merged commit 90aea66 into onnx:main Oct 10, 2024
7 checks passed
@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #15816 [push] Steer round to even to Z... started at 07:39

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #14846 [push] Steer round to even to Z... started at 08:51

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #15819 [push] Steer round to even to Z... started at 08:39

@jenkins-droid
Copy link
Collaborator

Jenkins Linux amd64 Build #15816 [push] Steer round to even to Z... passed after 1 hr 13 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux s390x Build #15819 [push] Steer round to even to Z... passed after 1 hr 35 min

@jenkins-droid
Copy link
Collaborator

Jenkins Linux ppc64le Build #14846 [push] Steer round to even to Z... passed after 2 hr 15 min

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants