WebAssembly Relaxed SIMD #651

dtig · 2022-06-15T23:27:18Z (edited by martinthomson)

Request for Mozilla Position on an Emerging Web Specification

Specification Title: WebAssembly Relaxed SIMD
Specification or proposal URL: https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md
Caniuse.com URL (optional):
Bugzilla URL (optional): https://bugzilla.mozilla.org/show_bug.cgi?id=1748807
Mozillians who can provide input (optional): @yurydelendik, @eqrion

Other information

None

martinthomson · 2022-06-16T00:43:51Z

Hi @dtig, I can't find a specification from the links you have provided. The "modified specification" link returns 404. Is there an explanation of the proposal you can provide?

cc @lars-t-hansen

dtig · 2022-06-16T01:07:12Z

@martinthomson The directory structure makes these docs hard to find; these are the ones I probably should have linked:
Proposal URL (Overview): https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md
Implementation status (Probably doesn't contain the most recent updates): https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/ImplementationStatus.md

martinthomson · 2022-06-16T01:20:20Z

@dtig, thanks for that, I updated the request.

I don't see any discussion of how this might result in fingerprinting of users based on results that vary between CPUs. It's possible that the variation between CPU-level implementations is such that the fingerprinting information is highly correlated with other fingerprinting surfaces, but some analysis supporting that theory would be good to have.

We've taken significant steps to remove fingerprinting surfaces like this, so I'd like to see what our experts in the area think (cc @tomrittervg).

dtig · 2022-06-16T01:53:07Z

There is some discussion in this issue: WebAssembly/relaxed-simd#11. We believe that the entropy exposed by this proposal correlates with existing fingerprinting surfaces on the web. Detailed analysis of each operation, including its instruction lowerings, will be available in the associated issues/PRs (open and closed) for the relevant operations; a recent example is the Relaxed Rounding Q-format Multiplication issue, WebAssembly/relaxed-simd#40.
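One way to make the "correlates with existing surfaces" claim testable is to measure the conditional entropy of a probe's result given what a tracker already knows (here, the CPU architecture). If that value is zero, the probe adds no new identifying information. A minimal Python sketch; the population counts below are entirely hypothetical, not measurements.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a distribution given as outcome -> count."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def conditional_entropy(samples):
    """H(probe | arch) for a list of (arch, probe_result) pairs.

    This is the identifying information a probe adds once the
    architecture is already known to the tracker.
    """
    by_arch = {}
    for arch, result in samples:
        by_arch.setdefault(arch, Counter())[result] += 1
    n = len(samples)
    return sum(sum(c.values()) / n * entropy(c) for c in by_arch.values())

# Hypothetical population: 60 x86 users and 40 ARM users.
determined = [("x86", "A")] * 60 + [("arm", "B")] * 40
split = [("x86", "A")] * 30 + [("x86", "C")] * 30 + [("arm", "B")] * 40
print(conditional_entropy(determined))  # 0.0: result fully determined by arch
print(conditional_entropy(split))       # 0.6: bits beyond the architecture
```

The second case stands in for a probe that splits one architecture further (for example by instruction-set presence), which is the kind of extra entropy the discussion below is concerned with.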

lars-t-hansen · 2022-06-16T05:02:00Z

@martinthomson, I'm no longer with Mozilla, try @eqrion.

tomrittervg · 2022-06-16T16:04:30Z

So a concrete leak exposed by wasm today is distinguishing x86/x86-64 via this trick (and possibly others). This comment expresses the belief that Intel and AMD can be distinguished.

There seems to be consensus not to provide an API to determine hardware capabilities, and also that there won't be any capabilities exposed that only work on certain processors. However, it seems there will be instructions exposed that map to different assembly instructions (or behaviors) on different processors, and that could sometimes be probed. Ref

Individual instruction proposals have fingerprinting details. This one, sure enough, talks about how it can distinguish presence of an instruction set and how the wasm engine can work around it at a perf cost. This one and this one add new x86/ARM distinguishers, as does this one (and also distinguishes POWER), this one distinguishes another instruction set (FMA), and this one distinguishes x86/ARM as well as AVX512 presence. This one is kind of ambiguous but mentions detecting SSE4.1.
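A simplified Python model of how a relaxed byte swizzle could act as an x86/ARM distinguisher. It assumes, as I read the proposal, that indices in 16..127 are implementation-defined (either 0 or `a[i % 16]`), and that an engine lowers the operation directly to PSHUFB on x86 and TBL on AArch64. The lane functions model those hardware instructions, not any engine's code.

```python
def pshufb_lane(table, idx):
    """Model of one lane of x86 PSHUFB: 0 if bit 7 of the index is set, else table[idx & 15]."""
    return 0 if idx & 0x80 else table[idx & 0x0F]

def tbl_lane(table, idx):
    """Model of one lane of AArch64 TBL with a 16-byte table: 0 for any index >= 16."""
    return table[idx] if idx < 16 else 0

def probe_arch(lane_fn):
    """Classify a relaxed-swizzle lowering using one out-of-range index (17).

    17 falls in the (assumed) implementation-defined range 16..127,
    where the two lowerings disagree.
    """
    table = list(range(1, 17))  # lanes hold 1..16, so a 0 result is unambiguous
    result = lane_fn(table, 17)
    return "x86-like" if result == table[17 % 16] else "arm-like"

print(probe_arch(pshufb_lane))  # x86-like
print(probe_arch(tbl_lane))     # arm-like
```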

x86/ARM is apparently already exposed. I don't know of formalized instruction set exposure through any existing fingerprintable surface, at least not in the way this would expose a clear 'yes/no'. Intel/AMD differences could likely be exposed via timing attacks (and WebGL debug info tells you the vendor of the graphics card...), but I tend to put timing attacks on hardware behavior in a different class of fingerprinting protection than 'get a direct answer to a direct question.'

Overall I get the impression that it is possible to hide the behavioral differences, it just introduces performance penalties. No Mozilla hat in this comment, but I would prefer to see a line drawn in the sand at x86/ARM distinguishing. That is, a specification would require implementors to fix up outputs such that processor manufacturer and instruction set presence cannot be distinguished in the output (I'm not counting timing here.) At a minimum I think a specification must support implementors doing that, and provide guidance for how.
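For the swizzle case, "fixing up outputs at a performance cost" can be sketched as one extra instruction: on x86, a saturating add of 0x70 before PSHUFB makes every out-of-range index return 0, matching TBL. My understanding is that this is how engines typically lower the existing deterministic `i8x16.swizzle` on x86, but the model below is illustrative and not taken from any engine's source.

```python
def pshufb_lane(table, idx):
    """Model of one lane of x86 PSHUFB: 0 if bit 7 of the index is set, else table[idx & 15]."""
    return 0 if idx & 0x80 else table[idx & 0x0F]

def tbl_lane(table, idx):
    """Model of one lane of AArch64 TBL with a 16-byte table: 0 for any index >= 16."""
    return table[idx] if idx < 16 else 0

def paddusb_lane(x, y):
    """Model of one lane of x86 PADDUSB: unsigned byte add that saturates at 0xFF."""
    return min(x + y, 0xFF)

def deterministic_swizzle_x86(table, indices):
    """ARM-compatible swizzle on x86 at the cost of one extra saturating add.

    Adding 0x70 maps 0..15 to 0x70..0x7F (bit 7 clear, low nibble unchanged)
    and pushes every index >= 16 to 0x80 or above, so PSHUFB returns 0.
    """
    return [pshufb_lane(table, paddusb_lane(i, 0x70)) for i in indices]

table = list(range(1, 17))
all_indices = list(range(256))
# The fixed-up x86 result matches the TBL model for every possible index byte.
assert deterministic_swizzle_x86(table, all_indices) == [tbl_lane(table, i) for i in all_indices]
```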

Related work in the fingerprinting space would be:

WebAudio output which relied on OS math libraries (which distinguishes OS as well as OS version sometimes) and I believe can fall through to hardware in some cases distinguishing that, but (from my comment above) I don't think this has ever been clearly root caused. This was mitigated in Chrome and is in slow progress (volunteers welcome) in Firefox by using explicit math routines. Tor Browser blocks web audio.
JavaScript Math routines, which have the same problem as Web Audio; Firefox has fixed this one.
Canvas rendering which exposes hardware rendering behavior. This can be mitigated by software rendering based on our experiments, and this work is in (slower) progress. Other techniques include introducing slight per-site randomness to the output so cross-site tracking doesn't produce identical fingerprints (Brave) - against a naive attacker this works, but I am unsure it will stand up to someone who attempts to circumvent it. Tor Browser just blocks canvas unless you authorize it with a permission prompt.
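A minimal sketch of per-site keyed randomization of the kind attributed to Brave above. This is not Brave's implementation; the session key, the flat list of pixel values, and the low-bit-flip rule are assumptions chosen for illustration. As noted above, it may not hold up against a determined attacker, for example one who ignores the low-order bits this particular sketch perturbs.

```python
import hashlib
import hmac

def per_site_noise(session_key: bytes, site: str, pixels: list[int]) -> list[int]:
    """Flip the low bit of some pixel values, chosen by an HMAC keyed per session and site.

    A given site sees stable output within a session, while different sites
    see different perturbations, so cross-site fingerprints stop matching.
    """
    digest = hmac.new(session_key, site.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, p in enumerate(pixels):
        byte = digest[(i // 8) % len(digest)]  # reuse the 32-byte digest cyclically
        bit = (byte >> (i % 8)) & 1
        out.append(p ^ bit)                    # changes a value by at most 1
    return out

key = b"hypothetical per-session secret"
pixels = [128] * 256
print(per_site_noise(key, "a.example", pixels) == per_site_noise(key, "a.example", pixels))  # True
```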

dtig · 2022-06-17T00:17:28Z

The Wasm specification is usually at a lower level than the requirement suggested above (that implementors fix up outputs so that processor manufacturer and instruction set presence cannot be distinguished). For example, the spec formally specifies the execution semantics of instructions, but doesn't make assumptions about behavior on the underlying hardware. For Relaxed SIMD specifically, how to specify this is still under discussion, but the feedback from the Community Group has been to ensure that the spec doesn't introduce correlations between instructions, to avoid encouraging reliance on detection patterns (more about that here).

The instruction lowerings in the linked issues are recommendations for what is possible on different hardware, and all of them would be allowed by the spec. When lowered to the existing Wasm SIMD operations, for example, the behavior should be deterministic across different hardware (with the exception of IEEE 754 non-compliant hardware). Engine implementations can make the trade-off with performance and pick the lowering (or a different one not already described in the related issue) that matches the entropy requirements for their platforms. So it's not necessarily hiding the differences; rather, the slower, more deterministic instructions also produce results the spec allows. While the spec is at too low a level to provide that guidance, the issues linked above do provide it.
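A sketch of that engine-side choice for a relaxed multiply-add, assuming the spec allows either the fused (one rounding) or unfused (two roundings) result. It is modelled on f64 lanes because Python floats are doubles; FMA is emulated with exact `Fraction` arithmetic rounded once. The `relaxed_madd` function and its `has_fma`/`deterministic` flags are hypothetical, not any engine's API.

```python
from fractions import Fraction

def unfused_madd(a: float, b: float, c: float) -> float:
    """Separate mul then add: the product is rounded before the add."""
    return a * b + c

def fused_madd(a: float, b: float, c: float) -> float:
    """Emulated FMA: exact a*b + c, rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def relaxed_madd(a, b, c, *, has_fma: bool, deterministic: bool):
    """Hypothetical lowering choice: both results are allowed, only one is picked."""
    if deterministic or not has_fma:
        return unfused_madd(a, b, c)  # same answer on every platform
    return fused_madd(a, b, c)        # faster with native FMA, but observable

a = b = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)
print(unfused_madd(a, b, c))              # 0.0: the 2**-60 term is lost in rounding a*b
print(fused_madd(a, b, c) == 2.0**-60)    # True
```

An engine that always takes the deterministic branch gives up the FMA speedup but returns the same bits on every machine, which is the trade-off described above.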

lars-t-hansen · 2022-06-17T13:13:26Z

@tomrittervg, I think the requirement you propose (that implementors fix up outputs so that processor manufacturer and instruction set presence cannot be distinguished) means that an implementation must, on at least some of its supported platforms, implement every floating-point addition or subtraction (and probably other operations) whose output's bit pattern can be inspected at some point in the future in a way that produces a platform-invariant bit pattern. Operations whose results have simple lifetimes and are consumed in easy-to-understand ways could be exempted from this laundering by the optimizer, but I would still expect a very large fraction of FP ops to produce results that would have to be laundered, likely causing a very significant performance drop for FP-intensive code. This does not seem related to SIMD or Relaxed SIMD at all; it's a core Wasm concern.
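To make the laundering concrete: a commonly cited difference is the NaN produced by an invalid operation such as inf - inf, which (as I understand the hardware) is 0xFFC00000 with the sign bit set on x86 SSE and 0x7FC00000 on ARM's default-NaN behavior. A minimal sketch of laundering on raw f32 bit patterns; the two input patterns are stated assumptions, and a real engine would emit a compare-and-select after each affected operation rather than call a function.

```python
import struct

CANONICAL_NAN_F32 = 0x7FC00000  # positive quiet NaN

def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to f32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def is_nan_f32(bits: int) -> bool:
    # NaN: all exponent bits set and a non-zero mantissa.
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def launder_f32(bits: int) -> int:
    """Return a platform-invariant pattern: every NaN collapses to one canonical NaN."""
    return CANONICAL_NAN_F32 if is_nan_f32(bits) else bits

# Assumed default NaNs for an invalid operation such as inf - inf:
X86_SSE_DEFAULT_NAN = 0xFFC00000  # sign bit set
ARM_DEFAULT_NAN = 0x7FC00000      # sign bit clear
print(hex(launder_f32(X86_SSE_DEFAULT_NAN)), hex(launder_f32(ARM_DEFAULT_NAN)))  # 0x7fc00000 0x7fc00000
```

Each individual check is cheap, but the point above is that almost every FP result whose bits can escape would need one.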

See https://github.com/WebAssembly/design/blob/main/Nondeterminism.md for more about this.

eqrion · 2022-06-17T13:27:56Z

Hi Deepti, thanks for posting this!

Speaking here for the WebAssembly team in SpiderMonkey, we view this proposal as worth prototyping.

This proposal adds several performance-oriented instructions to WebAssembly in order to aid porting high performance native code to the web. This fits well with our vision for WebAssembly's evolution.

There are two risks here:

This proposal adds limited local non-determinism in order to allow instructions to have efficient encodings on all platforms. The non-determinism is mostly around behavior in edge cases, such as floating-point NaN handling. If we're not careful, this could be a new fingerprinting dimension. This is mitigated by the proposal allowing implementations to have deterministic behavior for these instructions on all platforms if they don't want to leak any information. We intend for our implementation to only leak x86[_64] vs. ARM64, a fact that is already leaked by WebAssembly as noted above.
The goal of this proposal is improved performance at the cost of increased complexity to the specification and implementations due to new instructions and new sources of non-determinism. We believe this can be worth it, but I would like to have confirmation that there are users who will use this proposal and what benefits they are seeing.

We are engaged in the specification process for this proposal in the WebAssembly CG and will require the above two risks to be resolved in order to advance the proposal.

Fixes mozilla#651.

dtig · 2022-10-10T23:32:40Z

Thanks for your reply!

On the fingerprinting risk: most instructions only expose x86 vs. ARM Neon differences; I've tried to encapsulate this in a separate document in this PR. The one detail that the proposal as it stands now leaks is the availability of native FMA support, which has significant performance wins, documented here. There is some unresolved discussion of adding both a deterministic FMA and a QFMA operation, which might mitigate this somewhat; more discussion here.

On the complexity risk and the question of users: there's performance data, either measured or estimated, included in the issues where the instructions are proposed, as well as in the QFMA PR linked above. One note is that from an implementation perspective this proposal doesn't introduce complexity to the instructions or the spec, but it does set a precedent for being less strict in the specification. I expect the complexity here is for applications or libraries using the proposal, along with potential compatibility issues if the instructions are used incorrectly. This is somewhat mitigated by the fact that tools will not generate relaxed-simd operations by default (unlike fixed-width SIMD instructions, which can be generated by the auto-vectorizer); they are available through a special intrinsics header, or potentially a compiler flag, so the proposal already assumes a higher threshold of knowledge to use the proposed operations.

eqrion added a commit to eqrion/standards-positions that referenced this issue Jun 22, 2022

Add WebAssembly relaxed-simd entry

4f1c394

Fixes mozilla#651.

eqrion linked a pull request Jun 22, 2022 that will close this issue

Add WebAssembly relaxed-simd entry #657 (Open)

eqrion added a commit to eqrion/standards-positions that referenced this issue Sep 26, 2023

Add WebAssembly relaxed-simd position

7bfd074

Fixes mozilla#651.

zcorpan changed the title from "Request for position: WebAssembly Relaxed SIMD" to "WebAssembly Relaxed SIMD" Jul 31, 2024

zcorpan added this to standards-positions review Aug 5, 2024

github-project-automation bot moved this to Unscreened in standards-positions review Aug 5, 2024
