Usage of `mul_add` #475

mo8it · 2024-02-26T14:51:48Z

Is there a reason why mul_add is not used more often in glam? An example is

Lines 659 to 662 in 1ea8163

    
           w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1, 
        
           w0 * y1 - x0 * z1 + y0 * w1 + z0 * x1, 
        
           w0 * z1 + x0 * y1 - y0 * x1 + z0 * w1, 
        
           w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1,

Using mul_add would improve the accuracy and performance (performance on most platforms, see docs)

Are you open to PRs that utilize mul_add?

The text was updated successfully, but these errors were encountered:

bitshifter · 2024-03-03T10:30:32Z

It's unfortunately not as clear a win as it might seem on the surface. From the docs:

Using mul_add may be more performant than an unfused multiply-add if the target architecture has a dedicated fma CPU instruction. However, this is not always true, and will be heavily dependant on designing algorithms with specific target hardware in mind.

FMA is not enabled by default (on x86_64 at least), it needs to be enabled by users (e.g. with -C target-feature=+fma). If FMA is not enabled using mul_add will be considerably slower - you can see the different asm generated here https://rust.godbolt.org/z/1WEd9hqTP. If FMA is not available there will be jump to an IEEE-754 conformant implementation that probably looks something like this https://github.com/rust-lang/libm/blob/master/src/math/fmaf.rs.

Possibly glam could add an internal conditional mul_add which uses FMA if it is enabled but falls back to an unfused (a * b) + c if FMA is not enabled. However this would mean glam would give different results depending if it was compiled with or without FMA. This would be a problem for users that are interested in determinism (for example if some supported architectures had FMA and some didn't), so it would be necessary to add a feature to disable glam's conditional FMA in the name of determinism. It's possible, but it gets a little complicated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usage of `mul_add` #475

Usage of `mul_add` #475

mo8it commented Feb 26, 2024

bitshifter commented Mar 3, 2024 •

edited

Loading

Usage of mul_add #475

Usage of mul_add #475

Comments

mo8it commented Feb 26, 2024

bitshifter commented Mar 3, 2024 • edited Loading

Usage of `mul_add` #475

Usage of `mul_add` #475

bitshifter commented Mar 3, 2024 •

edited

Loading