WebAssembly Relaxed SIMD #651

Open
dtig opened this issue Jun 15, 2022 · 10 comments · May be fixed by #657

@dtig

dtig commented Jun 15, 2022

Request for Mozilla Position on an Emerging Web Specification

Other information

None

@martinthomson
Member

Hi @dtig, I don't know how to find a specification from the links you have provided. The "modified specification" link returns 404. Is there an explanation of the proposal you can provide?

cc @lars-t-hansen

@dtig
Author

dtig commented Jun 16, 2022

@martinthomson The directory structure makes these docs hard to find; these are the ones I should probably have linked:
Proposal URL (Overview): https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md
Implementation status (Probably doesn't contain the most recent updates): https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/ImplementationStatus.md

@martinthomson
Member

@dtig, thanks for that, I updated the request.

I don't see any discussion of how this might result in fingerprinting of users based on the answers provided varying between CPUs. It's possible that the variation between CPU-level implementations is such that fingerprinting information is highly correlated with other fingerprinting surfaces, but some analysis supporting that theory would be good to have.

We've taken significant steps to remove fingerprinting surfaces like this, so I'd like to see what our experts in the area think (cc @tomrittervg).

@dtig
Author

dtig commented Jun 16, 2022

There is some discussion in this issue: WebAssembly/relaxed-simd#11. We believe that the entropy exposed by this proposal correlates with existing fingerprinting surfaces on the web. Detailed analysis for each of the operations, including instruction lowerings, is available in the associated issues/PRs (open and closed) for the relevant operations; a recent example is the Relaxed Rounding Q-format Multiplication issue WebAssembly/relaxed-simd#40.

@lars-t-hansen
Contributor

@martinthomson, I'm no longer with Mozilla, try @eqrion.

@tomrittervg

So a concrete leak exposed by wasm today is distinguishing x86/x86-64 via this trick (and possibly others). This comment expresses the belief Intel/AMD can be distinguished.

There seems to be consensus to not provide an API to determine hardware capabilities, and also that there won't be any capabilities exposed that only work on certain processors. However it seems there will be instructions exposed that map to different assembly instructions (or behaviors) on different processors, and that could be probed sometimes. Ref

Individual instruction proposals have fingerprinting details. This one, sure enough, talks about how it can distinguish presence of an instruction set and how the wasm engine can work around it at a perf cost. This one and this one add new x86/ARM distinguishers, as does this one (and also distinguishes POWER), this one distinguishes another instruction set (FMA), and this one distinguishes x86/ARM as well as AVX512 presence. This one is kind of ambiguous but mentions detecting SSE4.1.

x86/ARM is apparently already exposed. I don't know of formalized instruction set exposure through any existing fingerprintable surface. Not in the way this would expose a clear 'yes/no'. Intel/AMD differences could likely be exposed via timing attacks (and WebGL debug info tells you the vendor of the graphics card...) but I tend to put timing attacks on hardware behavior in a different class of fingerprinting protection than 'get a direct answer to a direct question.'

Overall I get the impression that it is possible to hide the behavioral differences, it just introduces performance penalties. No Mozilla hat in this comment, but I would prefer to see a line drawn in the sand at x86/ARM distinguishing. That is, a specification would require implementors to fix up outputs such that processor manufacturer and instruction set presence cannot be distinguished in the output (I'm not counting timing here.) At a minimum I think a specification must support implementors doing that, and provide guidance for how.
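
To make the "fix up outputs" idea concrete, here is a minimal Python sketch for one kind of operation, float-to-int32 truncation. It models an x86-style conversion that returns 0x80000000 for NaN and out-of-range inputs (as `cvttps2dq` does) next to a saturating conversion (as ARM's `fcvtzs` does). The function names are made up for illustration, the model is scalar rather than SIMD, and it is not taken from any engine.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def trunc_x86_style(x: float) -> int:
    """Model of an x86-style conversion: NaN and out-of-range inputs
    all produce the sentinel 0x80000000 (INT32_MIN)."""
    if math.isnan(x) or x >= 2.0**31 or x <= -(2.0**31) - 1:
        return INT32_MIN
    return math.trunc(x)

def trunc_saturating(x: float) -> int:
    """Model of a saturating conversion: NaN -> 0, out-of-range clamps."""
    if math.isnan(x):
        return 0
    if x >= 2.0**31:
        return INT32_MAX
    if x <= -(2.0**31) - 1:
        return INT32_MIN
    return math.trunc(x)

def trunc_fixed_up(x: float) -> int:
    """Hypothetical fix-up on the x86-style path: take the fast result and
    patch it only when the sentinel shows up (this is the perf cost)."""
    r = trunc_x86_style(x)
    if r == INT32_MIN:
        if math.isnan(x):
            return 0
        if x > 0:
            return INT32_MAX
    return r

# Inputs a page could use to probe the edge cases.
PROBES = [float("nan"), 3e9, -3e9, float("inf"), float("-inf"), 1.5, -(2.0**31)]
```

On `PROBES`, `trunc_x86_style` and `trunc_saturating` disagree for NaN and for large positive inputs, while `trunc_fixed_up` matches `trunc_saturating` everywhere, so the output no longer reveals which path ran. The extra compare-and-patch step is where the performance penalty comes from.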

Related work in the fingerprinting space would be

  • WebAudio output which relied on OS math libraries (which distinguishes OS as well as OS version sometimes) and I believe can fall through to hardware in some cases distinguishing that, but (from my comment above) I don't think this has ever been clearly root caused. This was mitigated in Chrome and is in slow progress (volunteers welcome) in Firefox by using explicit math routines. Tor Browser blocks web audio.
  • JavaScript Math routines which have the same problem as Web Audio, but Firefox has fixed this one.
  • Canvas rendering which exposes hardware rendering behavior. This can be mitigated by software rendering based on our experiments, and this work is in (slower) progress. Other techniques include introducing slight per-site randomness to the output so cross-site tracking doesn't produce identical fingerprints (Brave) - against a naive attacker this works, but I am unsure it will stand up to someone who attempts to circumvent it. Tor Browser just blocks canvas unless you authorize it with a permission prompt.
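
As a rough illustration of the per-site randomness idea in the last bullet, here is a small Python sketch. It is not Brave's algorithm; the key derivation, the helper names, and the choice to flip only the lowest bit of each channel are all assumptions made for the example.

```python
import hashlib
import hmac

def per_site_noise(session_key: bytes, site: str, n: int) -> list[int]:
    """Derive n noise bits that are stable for one (session, site) pair.
    Hypothetical helper: HMAC-SHA256 in counter mode, one bit per byte."""
    bits: list[int] = []
    counter = 0
    while len(bits) < n:
        msg = f"{site}|{counter}".encode()
        block = hmac.new(session_key, msg, hashlib.sha256).digest()
        bits.extend(byte & 1 for byte in block)
        counter += 1
    return bits[:n]

def perturb(pixels: list[int], session_key: bytes, site: str) -> list[int]:
    """Flip the lowest bit of each 8-bit channel value according to the noise."""
    noise = per_site_noise(session_key, site, len(pixels))
    return [p ^ bit for p, bit in zip(pixels, noise)]

# Example readback of 64 channel values from a canvas.
PIXELS = [10, 20, 30, 40] * 16
```

The same site sees the same perturbed readback for the whole session, so rendering stays stable, while two different sites get different readbacks and cannot match them directly. As noted above, an attacker who averages many readbacks or strips low bits may be able to work around this kind of noise.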

@dtig
Author

dtig commented Jun 17, 2022

So a concrete leak exposed by wasm today is distinguishing x86/x86-64 via this trick (and possibly others). This comment expresses the belief Intel/AMD can be distinguished.

There seems to be consensus to not provide an API to determine hardware capabilities, and also that there won't be any capabilities exposed that only work on certain processors. However it seems there will be instructions exposed that map to different assembly instructions (or behaviors) on different processors, and that could be probed sometimes. Ref

Individual instruction proposals have fingerprinting details. This one, sure enough, talks about how it can distinguish presence of an instruction set and how the wasm engine can work around it at a perf cost. This one and this one add new x86/ARM distinguishers, as does this one (and also distinguishes POWER), this one distinguishes another instruction set (FMA), and this one distinguishes x86/ARM as well as AVX512 presence. This one is kind of ambiguous but mentions detecting SSE4.1.

x86/ARM is apparently already exposed. I don't know of formalized instruction set exposure through any existing fingerprintable surface. Not in the way this would expose a clear 'yes/no'. Intel/AMD differences could likely be exposed via timing attacks (and WebGL debug info tells you the vendor of the graphics card...) but I tend to put timing attacks on hardware behavior in a different class of fingerprinting protection than 'get a direct answer to a direct question.'

Overall I get the impression that it is possible to hide the behavioral differences, it just introduces performance penalties. No Mozilla hat in this comment, but I would prefer to see a line drawn in the sand at x86/ARM distinguishing. That is, a specification would require implementors to fix up outputs such that processor manufacturer and instruction set presence cannot be distinguished in the output (I'm not counting timing here.) At a minimum I think a specification must support implementors doing that, and provide guidance for how.

The Wasm specification is usually at a lower level than that. For example, the spec formally specifies the execution semantics of instructions, but doesn't make assumptions about behavior on underlying hardware. For Relaxed SIMD specifically, how to specify this is still under discussion, but the feedback from the Community Group has been to ensure that the spec doesn't introduce correlations between instructions, to avoid relying on detection patterns (more about that here).

The instruction lowerings in the linked issues are a recommendation on what is possible on different hardware, and these would all be allowed by the spec. When lowered to the Wasm SIMD operations, for example, the behavior should be deterministic across different hardware (with the exception of IEEE754 non-compliant hardware). Engine implementations are able to make the tradeoff between performance and determinism, and to pick the lowering (or a different one that isn't already described in the related issue) that matches the entropy requirements for their platforms. So it's not necessarily hiding the differences; it's more that the slower, more deterministic instructions are also allowed results by the spec. While the spec is at too low a level to provide that guidance, the issues linked above do provide it.
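
A small Python model of the "deterministic result is also an allowed result" point, using a byte swizzle as the example. The two lowerings follow the hardware behavior of x86 `pshufb` (zero the lane if bit 7 of the index is set, otherwise use the low 4 bits) and of a table lookup that zeroes any index of 16 or more, as ARM `tbl` and the existing Wasm swizzle do. Treating the allowed results as just the set of those two outcomes is a simplification and an assumption, not a quote of the spec text; all function names are hypothetical.

```python
def swizzle_deterministic(data: bytes, idx: bytes) -> bytes:
    """Every out-of-range index (>= 16) selects 0, on all platforms."""
    return bytes(data[i] if i < 16 else 0 for i in idx)

def swizzle_pshufb_style(data: bytes, idx: bytes) -> bytes:
    """Fast x86-style lowering: zero if bit 7 is set, else use low 4 bits."""
    return bytes(0 if i & 0x80 else data[i & 0x0F] for i in idx)

def allowed_results(data: bytes, idx: bytes) -> set[bytes]:
    """Simplified model: the relaxed op may return either lowering's output."""
    return {swizzle_deterministic(data, idx), swizzle_pshufb_style(data, idx)}

DATA = bytes(range(100, 116))                      # 16 lanes
IN_RANGE = bytes(range(16))                        # indices 0..15
EDGE = bytes([0, 15, 16, 31, 128, 255] + [1] * 10) # includes 16..127 cases
```

For `IN_RANGE` both lowerings agree, so the set has a single element. For `EDGE` they differ on indices 16 and 31, and an engine that cares about entropy can always pick `swizzle_deterministic`, which stays inside the allowed set.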


@lars-t-hansen
Contributor

I would prefer to see a line drawn in the sand at x86/ARM distinguishing. That is, a specification would require implementors to fix up outputs such that processor manufacturer and instruction set presence cannot be distinguished in the output (I'm not counting timing here.) At a minimum I think a specification must support implementors doing that, and provide guidance for how.

@tomrittervg, I think this means that an implementation must, on at least some of its supported platforms, implement every floating point addition or subtraction (and probably other operations) whose output's bit pattern can be inspected at some point in the future, in a way that produces a bit pattern that is platform-invariant. Operations whose results have simple lifetimes and are consumed in easy-to-understand ways could be exempted from this laundering by the optimizer, but I would still expect a very large fraction of FP ops to produce results that would have to be laundered, likely resulting in a very significant performance drop for FP intensive code. This does not seem related to SIMD or Relaxed SIMD at all, it's a core Wasm concern.

See https://github.com/WebAssembly/design/blob/main/Nondeterminism.md for more about this.
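
For readers unfamiliar with what "laundering" a result could look like, here is a minimal Python sketch. It assumes the platform-dependent part of a floating point result is the sign and payload bits of a NaN, and it rewrites every NaN to a single canonical bit pattern. The helper names are made up, and it works on 64-bit patterns only to keep the example short.

```python
import struct

F64_EXP_MASK = 0x7FF0_0000_0000_0000
F64_FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
F64_CANONICAL_NAN = 0x7FF8_0000_0000_0000

def is_nan_bits(bits: int) -> bool:
    """A double is NaN when the exponent is all ones and the fraction is nonzero."""
    return (bits & F64_EXP_MASK) == F64_EXP_MASK and (bits & F64_FRAC_MASK) != 0

def launder_bits(bits: int) -> int:
    """Replace any NaN bit pattern with the canonical one; leave others alone."""
    return F64_CANONICAL_NAN if is_nan_bits(bits) else bits

def float_to_bits(x: float) -> int:
    """Reinterpret a Python float as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

The check itself is cheap, but it would have to run after every operation whose result bits might later be observed, which is the cost described above.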

@eqrion
Contributor

eqrion commented Jun 17, 2022

Hi Deepti, thanks for posting this!

Speaking here for the WebAssembly team in SpiderMonkey, we view this proposal as worth-prototyping.

This proposal adds several performance-oriented instructions to WebAssembly in order to aid porting high performance native code to the web. This fits well with our vision for WebAssembly's evolution.

There are two risks here:

  1. This proposal adds limited local non-determinism in order to allow instructions to have efficient encodings on all platforms. The non-determinism is mostly around behavior in edge cases, such as floating point NaN. If we're not careful this could be a new fingerprinting dimension. This is mitigated by the proposal allowing implementations to have deterministic behavior for these instructions on all platforms if they don't want to leak any information. We've intended for our implementation to only leak x86[_64] vs ARM64, a fact that is already leaked by WebAssembly as noted above.
  2. The goal of this proposal is improved performance at the cost of increased complexity to the specification and implementations due to new instructions and new sources of non-determinism. We believe this can be worth it, but I would like to have confirmation that there are users who will use this proposal and what benefits they are seeing.
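
To illustrate the NaN edge cases mentioned in the first risk, here is a scalar Python sketch of a minimum operation. One version follows the x86 `minps` rule (return the first operand only if it compares less, otherwise return the second); the other follows the deterministic rule used by the existing Wasm `min` (any NaN input gives NaN, and -0.0 orders below +0.0). The function names are hypothetical and other hardware behaviors are not modeled.

```python
import math

def min_x86_style(a: float, b: float) -> float:
    """minps-like: the second operand wins whenever a < b is false,
    which includes NaN inputs and zeros of either sign."""
    return a if a < b else b

def min_deterministic(a: float, b: float) -> float:
    """NaN-propagating minimum with -0.0 ordered below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def probe(min_fn) -> tuple[bool, float]:
    """What a page could observe: NaN handling and the sign of a zero result."""
    return (math.isnan(min_fn(math.nan, 1.0)),
            math.copysign(1.0, min_fn(-0.0, 0.0)))
```

`probe(min_x86_style)` and `probe(min_deterministic)` differ, while ordinary inputs such as `(2.0, 3.0)` give the same answer in both. An implementation that uses the deterministic version everywhere exposes nothing through this probe.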

We are engaged in the specification process for this proposal in the WebAssembly CG and will require the above two risks to be resolved in order to advance the proposal.

eqrion added a commit to eqrion/standards-positions that referenced this issue Jun 22, 2022
@eqrion eqrion linked a pull request Jun 22, 2022 that will close this issue
@dtig
Author

dtig commented Oct 10, 2022

Thanks for your reply!

Hi Deepti, thanks for posting this!

Speaking here for the WebAssembly team in SpiderMonkey, we view this proposal as worth-prototyping.

This proposal adds several performance-oriented instructions to WebAssembly in order to aid porting high performance native code to the web. This fits well with our vision for WebAssembly's evolution.

There are two risks here:

  1. This proposal adds limited local non-determinism in order to allow instructions to have efficient encodings on all platforms. The non-determinism is mostly around behavior in edge cases, such as floating point NaN. If we're not careful this could be a new fingerprinting dimension. This is mitigated by the proposal allowing implementations to have deterministic behavior for these instructions on all platforms if they don't want to leak any information. We've intended for our implementation to only leak x86[_64] vs ARM64, a fact that is already leaked by WebAssembly as noted above.

Most instructions only expose x86/ARM-Neon; I've tried to encapsulate this into a separate document in this PR. The one detail that the proposal as it stands now leaks is the availability of native FMA support, which has significant performance wins, documented here. There is some unresolved discussion of adding both a deterministic FMA and a QFMA operation that might mitigate this somewhat (more discussion here).
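
For context on why FMA availability is observable at all, here is a short Python sketch. A fused multiply-add rounds once, an unfused one rounds after the multiply and again after the add, and a carefully chosen input makes the two visibly different. The fused version is emulated with exact rational arithmetic; the function names and the probe values are chosen for this example only.

```python
from fractions import Fraction

def madd_unfused(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again."""
    return a * b + c

def madd_fused(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b+c, rounded once at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# (1 + 2^-30) * (1 - 2^-30) = 1 - 2^-60 exactly, which rounds to 1.0 as a double.
A = 1.0 + 2.0**-30
B = 1.0 - 2.0**-30

def looks_fused(madd) -> bool:
    """Probe: the unfused result is exactly 0.0, the fused one is -2^-60."""
    return madd(A, B, -1.0) != 0.0
```

`looks_fused(madd_fused)` is true and `looks_fused(madd_unfused)` is false, so one arithmetic expression is enough to tell which lowering an engine picked. A deterministic variant would have to commit to one of the two behaviors on all hardware.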

  2. The goal of this proposal is improved performance at the cost of increased complexity to the specification and implementations due to new instructions and new sources of non-determinism. We believe this can be worth it, but I would like to have confirmation that there are users who will use this proposal and what benefits they are seeing.

There's performance data, either measured or estimated, included in the issues where the instructions are proposed, as well as in the QFMA PR linked above. One note is that, from an implementation perspective, this proposal doesn't introduce implementation complexity to instructions or the spec, but it does introduce a precedent to be less strict about the specification. I expect the complexity here is for applications or libraries using the proposal, along with potential compat issues if the instructions are used incorrectly. This is somewhat mitigated by the fact that tools will not generate relaxed-simd operations by default (compared to fixed-width SIMD instructions, which can be generated by the auto-vectorizer); they are available only through a special intrinsics header, or potentially a compiler flag, so the proposal already assumes a higher threshold of knowledge to be able to use the proposed operations.


eqrion added a commit to eqrion/standards-positions that referenced this issue Sep 26, 2023
@zcorpan zcorpan changed the title Request for position: WebAssembly Relaxed SIMD WebAssembly Relaxed SIMD Jul 31, 2024