aobench: cannot use sleef-sys #118

Closed
GabrielMajeri opened this issue Sep 1, 2018 · 6 comments

Comments

@GabrielMajeri
Contributor

Performance profiling revealed the crate was spending 50% of its time inside libm.so, and I decided to try using sleef to see if it would help. Unfortunately, it seemed it wasn't working, and the crate was still using libm's functions.

The benchmark's README recommends building with target-cpu=native. However, sleef-sys has a build file which checks for the presence of certain target-features, and ignores target-cpu.

So I kept target-cpu=native, and also added -C target-feature=sse,sse2,ssse3,sse4.1,avx,avx2,fma (all of my CPU's features, I think).

But then it failed to build:

error[E0463]: can't find crate for `sleef_sys`
   --> /data/Development/Rust/packed_simd/src/lib.rs:261:1
    |
261 | extern crate sleef_sys;
    | ^^^^^^^^^^^^^^^^^^^^^^^ can't find crate

It seems the cfg(all(target_feature = "sse2")) in packed_simd's Cargo.toml wasn't working. So I removed it and now it tried to build libsleef.so:

unsupported target: "x86_64-unknown-linux-gnu" features: "{"rdrand", "popcnt", "lzcnt", "xsaveopt", "xsavec", "rdseed", "bmi2", "xsave", "bmi1", "fxsr", "xsaves", "mmx"}"

So what are the step-by-step instructions for using sleef with this crate?
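One way to check which target features the compiler actually enabled for a given set of flags is to query them at compile time. The following is a minimal sketch, not code from packed_simd or sleef-sys; the function name `enabled_features` and the choice of features are assumptions made for illustration.

```rust
/// Returns the subset of a fixed list of x86 target features that were
/// enabled when this code was compiled (hypothetical helper).
fn enabled_features() -> Vec<&'static str> {
    // `cfg!` needs a literal feature name, so each check is spelled out.
    let checks = [
        ("sse", cfg!(target_feature = "sse")),
        ("sse2", cfg!(target_feature = "sse2")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
    ];
    checks
        .iter()
        .filter(|(_, on)| *on)
        .map(|(name, _)| *name)
        .collect()
}
```

Printing the result from a small binary built with the same RUSTFLAGS shows whether `target-cpu=native` or an explicit `-C target-feature=...` list took effect for that crate.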

@GabrielMajeri
Contributor Author

GabrielMajeri commented Sep 1, 2018

I've managed to halve the runtime of aobench --algo vector, by writing some quick & dirty sin/cos approximations with polynomials. I really hope we can get sleef to work, and that it provides good performance.

If I force ISPC to link against libm by adding --math-lib=system, it's as slow as the Rust code using libm.

Very noisy image generated by my sincos approximation: [image: image_vector]

Benchmark results:

rust (with libm): 1134ms
rust (with hacky sincos): 508ms
ispc (with libm): 1078ms
ispc (with their optimized math lib): 431ms
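For readers unfamiliar with the idea, a "quick & dirty" polynomial sin/cos can be sketched as below. This is not the code used for the numbers above (which is not shown in the thread); it is a scalar illustration that assumes a degree-7 Taylor polynomial with simple range reduction, and the names `sin_approx` and `cos_approx` are hypothetical.

```rust
use std::f32::consts::{FRAC_PI_2, PI};

/// Approximates sin(x) with a degree-7 Taylor polynomial after range
/// reduction. Accuracy is limited; this is an illustration only.
fn sin_approx(x: f32) -> f32 {
    // Reduce to [-PI, PI] by subtracting the nearest multiple of 2*PI.
    let mut r = x - (2.0 * PI) * (x / (2.0 * PI)).round();
    // Fold into [-PI/2, PI/2] using sin(PI - r) == sin(r).
    if r > FRAC_PI_2 {
        r = PI - r;
    } else if r < -FRAC_PI_2 {
        r = -PI - r;
    }
    let r2 = r * r;
    // Horner form of r - r^3/3! + r^5/5! - r^7/7!.
    r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 - r2 / 5040.0)))
}

/// Approximates cos(x) via the identity cos(x) == sin(x + PI/2).
fn cos_approx(x: f32) -> f32 {
    sin_approx(x + FRAC_PI_2)
}
```

A low-degree polynomial like this trades accuracy for speed, which is consistent with the noisy image mentioned above; a vectorized version would apply the same arithmetic lane-wise.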

@gnzlbg
Contributor

gnzlbg commented Sep 3, 2018

> and I decided to try using sleef to see if it would help. Unfortunately, it seemed it wasn't working,

Which platform are you on? Can you run cargo build with the -vv option and dump the output into a gist?

> The benchmark's README recommends building with target-cpu=native. However, sleef-sys has a build file which checks for the presence of certain target-features, and ignores target-cpu.

When using target-cpu=native, cargo translates (or should translate; I think I checked this with @alexcrichton) that into all of the CPU's target-feature flags, so that build.rs only has to detect target features. If this isn't working, it is a cargo bug.
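As a rough illustration of feature detection in a build script: Cargo exposes the enabled features to build scripts through the `CARGO_CFG_TARGET_FEATURE` environment variable as a comma-separated list. The sketch below is not the actual sleef-sys build.rs; the helper names `parse_features`, `target_features_from_env`, and `has_all` are hypothetical.

```rust
use std::collections::HashSet;
use std::env;

/// Parses a comma-separated feature list such as "sse,sse2,avx".
fn parse_features(raw: &str) -> HashSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Returns the features Cargo reports to a build script, or an empty set
/// when the variable is absent (for example outside of a build script).
fn target_features_from_env() -> HashSet<String> {
    env::var("CARGO_CFG_TARGET_FEATURE")
        .map(|raw| parse_features(&raw))
        .unwrap_or_default()
}

/// Hypothetical helper: true when every required feature is present.
fn has_all(features: &HashSet<String>, required: &[&str]) -> bool {
    required.iter().all(|f| features.contains(*f))
}
```

Applied to the feature set printed in the error message earlier in the thread (rdrand, popcnt, lzcnt, ...), a check such as `has_all(&features, &["sse2"])` would return false, since none of the SSE or AVX features appear in that list.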

Note that we are not passing any target features to the C compiler. To do this we would have to upgrade the cc crate with support for target-features and target-cpu, and then propagate that to the cmake-rs crate.
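To make the missing piece concrete, forwarding features to the C compiler would need some mapping from Rust feature names to compiler flags. The sketch below assumes GCC/Clang-style `-m` flags and covers only a few x86 features; `c_flag_for` and `c_flags` are hypothetical helpers, not part of the cc or cmake crates.

```rust
/// Maps a Rust target-feature name to the matching GCC/Clang `-m` flag.
/// Only a handful of x86 features are covered; unknown names yield `None`.
fn c_flag_for(feature: &str) -> Option<&'static str> {
    match feature {
        "sse" => Some("-msse"),
        "sse2" => Some("-msse2"),
        "ssse3" => Some("-mssse3"),
        "sse4.1" => Some("-msse4.1"),
        "avx" => Some("-mavx"),
        "avx2" => Some("-mavx2"),
        "fma" => Some("-mfma"),
        _ => None,
    }
}

/// Collects the flags for all recognized features, skipping the rest.
fn c_flags(features: &[&str]) -> Vec<&'static str> {
    features.iter().filter_map(|f| c_flag_for(f)).collect()
}
```

A build script could then hand each resulting flag to whatever drives the C build (for instance via the cc crate's `Build::flag`), though MSVC and other compilers would need a different table.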

> So what are the step-by-step instructions for using sleef with this crate?

cargo build --features=sleef on the platforms that sleef supports. I've had no problems with this on x86_64-unknown-linux-gnu and x86_64-apple-darwin, and on CI Linux and Windows are tested with this as well: https://github.com/rust-lang-nursery/packed_simd/blob/master/ci/run.sh#L74

I don't recall why macOS is not tested (there is a comment, but it's not helpful...); maybe an oversight.

Building and linking C libraries is brittle, so if your system differs slightly, things might fail.

@gnzlbg
Contributor

gnzlbg commented Sep 3, 2018

> and that it provides good performance.

IIRC the benchmark results in the aobench README for the AVX1 machine are with sleef enabled, and it provides way better performance, but is still 1.5x slower than ISPC on that machine.

@gnzlbg
Contributor

gnzlbg commented Sep 3, 2018

What I wanted to do at some point is enable cross-language inlining for the sleef-sys crate, so that the functions can be inlined into rust. @michaelwoerister wrote instructions about how to do that here: rust-lang/rust#53031 (comment)

@GabrielMajeri
Contributor Author

@gnzlbg I did manage to get sleef to build, but I've discovered the bigger issue.

With sleef-sys, the sincos function is no longer an issue.

However, I had modified the benchmark to use fma in some places, and instead of using AVX2's fma, it now used sleef's fma, which for me means worse performance with sleef enabled (which is what made me think there was an issue with the way I was building sleef).

I will work on solving #112 and #116 before continuing work on this benchmark.

@gnzlbg
Contributor

gnzlbg commented Sep 3, 2018

> it now used sleef's fma, which for me means worse performance with sleef enabled

Oh damn, I only tested the sleef functions on one system, and there they always deliver a significant improvement. But on this system the llvm.fma intrinsic is really slow, and I should have checked that its performance was the same as that of llvm.fmuladd. I think we can open a new issue for this.
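The trade-off between a strictly fused multiply-add and a plain multiply followed by an add can be illustrated in scalar Rust. This is only a sketch under the assumption that `f32::mul_add` lowers to a fused operation, which may fall back to a slow software routine when the hardware lacks FMA; it is not how packed_simd resolves the issue, and `mul_add_auto` is a hypothetical name.

```rust
/// Computes a * b + c, using the fused operation only when the `fma`
/// target feature was enabled at compile time.
#[inline]
fn mul_add_auto(a: f32, b: f32, c: f32) -> f32 {
    if cfg!(target_feature = "fma") {
        // Single rounding; expected to map to a hardware FMA instruction.
        a.mul_add(b, c)
    } else {
        // Two roundings, but avoids a potential software fallback.
        a * b + c
    }
}
```

The two branches can differ in the last bit for inputs where the intermediate product is not exactly representable, so code that depends on bit-exact results should pick one form explicitly.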
