Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LLVM speed regression #651

Closed
wants to merge 7 commits into from
Closed

Fix LLVM speed regression #651

wants to merge 7 commits into from

Conversation

syrusakbary
Copy link
Member

When canonicalizing all the NaNs with the LLVM backend, the performance was slowed down 2x.

This PR fixes that by reverting the canonicalization

@nlewycky

@syrusakbary
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Aug 9, 2019
@syrusakbary
Copy link
Member Author

Previous timing:

wasmer run --backend=llvm --disable-cache --enable-simd   18.95s user 0.08s system 99% cpu 19.114 total

With the changes from this PR:

/Users/syrusakbary/Development/wasmer/target/release/wasmer run --backend=llv  14.21s user 0.03s system 99% cpu 14.263 total

@nlewycky
Copy link
Contributor

bors try

bors bot added a commit that referenced this pull request Aug 13, 2019
@@ -399,6 +408,31 @@ fn canonicalize_nans(
canonicalized
}

// Replaces any NaN with the canonical QNaN, otherwise leaves the value alone.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style idea: maybe consider the opposite name: make_canonical_nan.

@penzn
Copy link

penzn commented Aug 16, 2019

Hi, I have reported the regression originally in wasmerio/c-wasm-simd128-example#1. If I reset master to commit right before 11f66d2 then I see the timings from the article, but timings from the branch in this PR is still similar to master. Is that me or are you seeing that too?

@Hywan
Copy link
Contributor

Hywan commented Oct 31, 2019

@syrusakbary What's the status of this PR?

bors bot added a commit that referenced this pull request Nov 26, 2019
934: In LLVM backend, track which floats are guaranteed to be arithmetic, which makes the canonicalization a no-op. r=nlewycky a=nlewycky

# Description
This is a reimplementation of the patch in PR #651.

Extend state.rs ExtraInfo to track more information about floats. In addition to tracking whether the value has a pending canonicalization of NaNs, also track whether the value is known to be arithmetic (which includes infinities, regular values, and non-signalling NaNs (aka. "arithmetic NaNs" in the webassembly spec)). When the value is arithmetic, the correct sequence of operations to canonicalize the value is a no-op. Therefore, we create a lattice where pending+arithmetic=arithmetic.

Also, this extends the tracking to track all values, including non-SIMD integers. That's why there are more places where pending canonicalizations are applied.

Looking at c-wasm-simd128-example, this provides no performance change to the non-SIMD case (takes 58s on my noisy dev machine). The SIMD case drops from 46s to 29s.

# Review

- [ ] Add a short description of the the change to the CHANGELOG.md file


Co-authored-by: Nick Lewycky <nick@wasmer.io>
@nlewycky
Copy link
Contributor

nlewycky commented Dec 2, 2019

This PR is obsoleted by PR #883 and PR #934.

@nlewycky nlewycky closed this Dec 2, 2019
@nlewycky
Copy link
Contributor

nlewycky commented Dec 2, 2019

@penzn Yes, we noticed the same thing. It's a bit of a pain because LLVM doesn't say a lot about what happens with NaN values on operations, even as simple as an fadd instruction. It happened to work before, because it lowers to an addition that happens to preserve NaNs. Unfortunately, while the work I did to try to bring performance back did help on many examples, it hasn't helped to improve the c-wasm-simd128-example when built with SIMD. The primary difference there remains whether the LLVM autovectorizer manages to vectorize the fdiv instruction. That can be affected by many things, we've also changed the LLVM pass list to improve optimizations which should improve the chance that we get the vectorized fdiv in the simd example, but unfortunately we still don't. I'd still like to figure out how to get guaranteed correctness while preserving the performance, especially since it should be possible given that the x86-64 instructions give us what we need.

@penzn
Copy link

penzn commented Dec 3, 2019

@nlewycky I just rebuilt the benchmark with -munimplemented-simd128, which leads to vector fdiv in the wasm module, and performance is the same as with scalar fdiv. It also looks relatively close to native (given that I keep caching on). Can you elaborate a bit on what you are seeing, I'd be happy to help.

Maybe we should open an issue for this.

CC @topperc

@epilys epilys deleted the feature/llvm-nan-fix branch May 4, 2022 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants