Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage of mul_add #475

Open
mo8it opened this issue Feb 26, 2024 · 1 comment
Open

Usage of mul_add #475

mo8it opened this issue Feb 26, 2024 · 1 comment

Comments

@mo8it
Copy link

mo8it commented Feb 26, 2024

Is there a reason why mul_add is not used more often in glam? An example is

glam-rs/src/f64/dquat.rs

Lines 659 to 662 in 1ea8163

w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1,
w0 * y1 - x0 * z1 + y0 * w1 + z0 * x1,
w0 * z1 + x0 * y1 - y0 * x1 + z0 * w1,
w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1,

Using mul_add would improve the accuracy and performance (performance on most platforms, see docs)

Are you open to PRs that utilize mul_add?

@bitshifter
Copy link
Owner

bitshifter commented Mar 3, 2024

It's unfortunately not as clear a win as it might seem on the surface. From the docs:

Using mul_add may be more performant than an unfused multiply-add if the target architecture has a dedicated fma CPU instruction. However, this is not always true, and will be heavily dependant on designing algorithms with specific target hardware in mind.

FMA is not enabled by default (on x86_64 at least), it needs to be enabled by users (e.g. with -C target-feature=+fma). If FMA is not enabled using mul_add will be considerably slower - you can see the different asm generated here https://rust.godbolt.org/z/1WEd9hqTP. If FMA is not available there will be jump to an IEEE-754 conformant implementation that probably looks something like this https://github.com/rust-lang/libm/blob/master/src/math/fmaf.rs.

Possibly glam could add an internal conditional mul_add which uses FMA if it is enabled but falls back to an unfused (a * b) + c if FMA is not enabled. However this would mean glam would give different results depending if it was compiled with or without FMA. This would be a problem for users that are interested in determinism (for example if some supported architectures had FMA and some didn't), so it would be necessary to add a feature to disable glam's conditional FMA in the name of determinism. It's possible, but it gets a little complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants