Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fmae which uses fmuladd #116

Merged
merged 1 commit into from
Sep 3, 2018
Merged

Conversation

GabrielMajeri
Copy link
Contributor

@GabrielMajeri GabrielMajeri commented Sep 1, 2018

Fixes #112.

When it comes to performance, the performance in my updated stencil benchmark with avx2 enabled is the same. But when avx2 is disabled, the avx fallback is much faster (before this pull request, I saw some 10x performance reduction on AVX 1).

So there are no downsides to this change, only performance improvements.

Also added some documentation for crate users.

I didn't rename the extern "C" fns, only changed the link_name attribute.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

So there are no downsides to this change,

So this isn't exactly accurate. The reason why, without this change the performance is worse, is IIUC because LLVM has to emulates the infinite precision of the fma instruction on the platforms that do not have it, which is expensive.

First, thank you for doing this, since I mainly test on an AVX1 machine and this is what I was seeing, and now we know that fmuladd is an intrinsic worth adding, since this is not something users can easily implement on their own, and it has a very big impact on the performance on some targets. I think we should do the following:

  • keep ::fma as is, guaranteeing precision - add some tests (e.g. to the ./test directory) that check this (I don't know for which triple of values lower precision is observable, but maybe we can find something).

  • add an fma-estimate method, e.g., ::fmae that uses fmuladd to produce an "estimate" of the infinite precision fma. On platforms that support it, it is exact, but on platforms that do not, it just does an a * b + c instead of trying to emulate the infinite precision.

  • finally: update the aobench and other benchmarks to use ::fmae.

@GabrielMajeri GabrielMajeri changed the title Use fmuladd for fma and document this behavior Add fmae which uses fmuladd Sep 3, 2018
@GabrielMajeri
Copy link
Contributor Author

@gnzlbg I've changed this PR as you said and implemented the fmae function. I haven't added the tests for the existing fma because I don't know which numbers to use when testing either.

src/api/math/float/fmae.rs Outdated Show resolved Hide resolved
/// fused multiply subtract (`self * y - z`).
/// Simply negating the second parameter of this function
/// will make the compiler generate it.
/// While fused-multiply add ([fma]) has infinite precision,
Copy link
Contributor

@gnzlbg gnzlbg Sep 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to put fma in ticks `` for the doc link to work properly.

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

I haven't added the tests for the existing fma because I don't know which numbers to use when testing either.

I've been fuzzing fma and a * b + c for a while and I do not manage to get them to produce different results on my machine :/

@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

It seems that rustdoc trips on trying to generate the links - let's just remove the [ ] so that we can merge this. I've opened a PR to fix the clippy bot.

@GabrielMajeri
Copy link
Contributor Author

GabrielMajeri commented Sep 3, 2018

@gnzlbg Alright, the links shouldn' be a problem since rustdoc will place these functions one next to another in the end.

I have removed the mention of fmsub from fma's documentation, because it didn't tell the full story anyway. x86 has additional intrinsics for advanced FMA stuff like fmaddsub, masked fma, etc.

LLVM will simply do its best to pick the right instruction, based on how fma is used.

Also, some bikeshedding: Rust's f32 primitive type has a mul_add function which (contrary to its name) will always use FMA (like the current fma function). Any reason for not choosing that name?

@gnzlbg gnzlbg merged commit 3f2d3b2 into rust-lang:master Sep 3, 2018
@gnzlbg
Copy link
Contributor

gnzlbg commented Sep 3, 2018

Any reason for not choosing that name?

Oversight probably. Please open an issue about this. The floating-point methods are not in the initial round for stabilization so not much discussion has been going on about these.

@GabrielMajeri GabrielMajeri deleted the fmuladd branch September 3, 2018 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants