Leverage experimental Extended Multiplication WAsm SIMD instructions #1202

Closed
wants to merge 1 commit into from

copybara-service[bot]
Contributor

Leverage experimental Extended Multiplication WAsm SIMD instructions

const v128_t vq31prod01 = wasm_i64x2_shl(vprod01, 1);
const v128_t vq31prod23 = wasm_i64x2_add(vprod23, vprod23);
const v128_t vq31prod45 = wasm_i64x2_shl(vprod45, 1);
const v128_t vq31prod67 = wasm_i64x2_add(vprod67, vprod67);

@Maratyszcza what benefit do you get from using wasm_i64x2_shl(..., 1) rather than adding the value to itself? Is it related to instruction latency and ALU usage?

Contributor

@Maratyszcza Maratyszcza Dec 6, 2020

More even distribution of uops across execution ports and shorter average latency. Older x86 CPUs (e.g. pre-Nehalem and Atom uarchs) have high latency for [V]PADDQ, so it pays off to replace additions with shifts. However, CPUs often have fewer SIMD shift ports than SIMD ALU ports, so we don't want to replace all additions with shifts.
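
For readers following along, here is a minimal sketch of why the two forms are interchangeable: shifting a 64-bit lane left by one bit and adding the lane to itself both double the value, so a kernel is free to alternate between them purely for scheduling reasons. This is not the code from the PR. It is a scalar emulation in plain C so it can be compiled without a WebAssembly toolchain, and the `i64x2` struct and the `double_by_shift`, `double_by_add`, and `check_doubling_equivalence` names are hypothetical stand-ins for `v128_t`, `wasm_i64x2_shl(v, 1)`, and `wasm_i64x2_add(v, v)`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector holding two 64-bit lanes. */
typedef struct { int64_t lane[2]; } i64x2;

/* Emulates wasm_i64x2_shl(v, 1): shift every lane left by one bit.
 * The shift is done on uint64_t so that wraparound is well defined in C. */
static i64x2 double_by_shift(i64x2 v) {
  i64x2 r;
  for (int i = 0; i < 2; i++) {
    r.lane[i] = (int64_t) ((uint64_t) v.lane[i] << 1);
  }
  return r;
}

/* Emulates wasm_i64x2_add(v, v): add every lane to itself.
 * The addition is done on uint64_t for the same reason as above. */
static i64x2 double_by_add(i64x2 v) {
  i64x2 r;
  for (int i = 0; i < 2; i++) {
    r.lane[i] = (int64_t) ((uint64_t) v.lane[i] + (uint64_t) v.lane[i]);
  }
  return r;
}

/* Returns 1 when both lanes of a and b match. */
static int same_lanes(i64x2 a, i64x2 b) {
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* Checks that both doubling strategies agree on a few sample vectors,
 * including negative values and a product-sized magnitude. */
static void check_doubling_equivalence(void) {
  const i64x2 samples[] = {
    {{ 0, 1 }},
    {{ -1, 12345 }},
    {{ INT64_C(0x3FFFFFFF00000001), -INT64_C(0x3FFFFFFF00000001) }},
  };
  for (int i = 0; i < 3; i++) {
    assert(same_lanes(double_by_shift(samples[i]), double_by_add(samples[i])));
  }
}
```

Since the results are bit-identical, the choice between the two is only about which execution ports and latencies the underlying x86 instructions hit, which is the trade-off described in the comment above.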

@copybara-service copybara-service bot closed this Sep 20, 2022
@copybara-service copybara-service bot deleted the test_345735031 branch September 20, 2022 17:43