
Relaxed SIMD #1401

Closed
ngzhian opened this issue Mar 1, 2021 · 41 comments

Comments

@ngzhian
Member

ngzhian commented Mar 1, 2021

Relaxed SIMD adds a set of useful instructions that introduce local non-determinism where the results of the instructions may vary based on hardware support.

The SIMD proposal focuses on a set of SIMD instructions that speed up real-world use cases while staying true to the deterministic core of the language. However, some instructions that could unlock even more performance were not included because of their architecture-dependent semantics. These instructions include:

  • Fused Multiply-Add (single rounding if the hardware supports it, double rounding if not)
  • Approximate reciprocal/reciprocal sqrt
  • Relaxed Swizzle (implementation-defined out-of-bounds behavior)
  • Relaxed Rounding Q-format Multiplication (optional saturation)

These instructions have been suggested in multiple places: FMA (1, 2, 3), approximate reciprocal/reciprocal sqrt (1). Such instructions have also been mentioned as part of future features.

There is a soft dependency on the feature-detection proposal, which will allow code to determine whether certain instructions are supported by the hardware so that they can be safely relied on.

Non-determinism: The non-determinism in this proposal is limited to the result of an individual instruction and is consistent across runs. There are no global controls or flags involved. This means that given the following pseudocode:

w = fma(x, y, z)

w can have different values depending on available hardware support. Multiple usages of the instruction will return the same result w, so the instruction is internally consistent.
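
To make the two behaviors concrete, here is a minimal sketch (function name and lane shape are illustrative, not part of the proposal) of the double-rounding fallback, expressed with existing SIMD instructions. The single-rounding result corresponds to a hardware fused multiply-add (e.g. x86 FMA3 or ARM FMLA) and has no exact equivalent in core SIMD, which is the point of the proposal.

   ;; Double-rounding fallback for a relaxed FMA, built only from existing
   ;; SIMD instructions: the product is rounded to f32 before the add, so the
   ;; last bit may differ from a fused FMA, which rounds only once.
   (func $fma_fallback (param $x v128) (param $y v128) (param $z v128) (result v128)
     (f32x4.add
       (local.get $x)
       (f32x4.mul (local.get $y) (local.get $z))))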

Initial prototypes indicate a performance improvement of ~30% on modern CPU architectures. The alternative, providing a deterministic FMA result via emulation, would be too slow to be of any use.

Potential extension: introduce a relaxed mode for existing SIMD instructions. Such a mode would be tied to the feature-detection proposal: if relaxed mode is supported, the existing SIMD instructions would have non-deterministic behavior, e.g. around NaN canonicalization or under the FP IEEE compliance modes used by developers (e.g. no-honor-inf, no-signed-zeros, no-trapping-math, ...).

Keywords (for SEO): Fast SIMD

Co-champions: Marat Dukhan (@Maratyszcza) and Zhi An Ng (@ngzhian)

@arunetm
Contributor

arunetm commented Mar 1, 2021

Thanks @ngzhian @Maratyszcza. This is super-useful.

It will be helpful if instructions as part of this extension are not limited to this top 5. We had evaluated float min/max, float-int conversions, etc., which can offer significant performance gains depending on the platform. Having these as new instruction variants rather than a relaxed-simd-mode extension (mentioned above) will help to minimize the potential non-determinism introduced by engines.
Beyond NaN propagation, there are a few other scenarios that may benefit from relaxed semantics, especially when the FP IEEE compliance modes are used by developers (e.g. no-honor-inf, no-signed-zeros, no-trapping-math, ...). It will be useful if we can consider those under this umbrella as well.

@ngzhian
Member Author

ngzhian commented Mar 1, 2021

It will be helpful if instructions as part of this extension are not limited to this top 5.

Yes, that list is not exhaustive, I hope we can come up with more when we eventually have a repo (similar to how we proposed and merged instructions for SIMD).

Beyond NaN propagation, there are a few other scenarios that may benefit from relaxed semantics, especially when the FP IEEE compliance modes are used by developers (e.g. no-honor-inf, no-signed-zeros, no-trapping-math, ...). It will be useful if we can consider those under this umbrella as well.

Good idea, I'll add this snippet to the description, thanks.

@arunetm
Contributor

arunetm commented Mar 1, 2021

Thanks. Related: #1393 (comment)

I hope we can come up with more when we eventually have a repo

Is there a near-term plan to have the CG phase 0/1 poll?

@ngzhian
Member Author

ngzhian commented Mar 1, 2021

Is there a near-term plan to have the CG phase 0/1 poll?

Yes, once we get more comments, perhaps we can poll at the March 16 CG meeting.

@sunfishcode
Member

A tricky thing about Relaxed SIMD and related operators is the meaning of "consistent across runs". It's obviously valuable for a "program" to be able to assume that rounding is consistent across a "run", to avoid discontinuities etc., but in wasm in general, it's not always clear what the scope of a "run" is, for example for long-lived suspended and resumed instances, or instance graphs distributed across multiple underlying hosts. I have an idea for how to avoid this, and I'm curious what others think:

Introduce a new opaque type, fpenv, representing information about the host floating-point environment, which would be passed to operators like qfma as an operand. This could evolve in various ways in the future, but for now, it would just include flags like "is fma fast?".

Initially, the only way to obtain an fpenv value would be to import a global variable with type fpenv:

   (global $fp (import "host.fp" "default") fpenv)
   
   (func $foo (param f32) (param f32) (param f32) (result f32)
     local.get 0
     local.get 1
     local.get 2
     global.get $fp
     f32.qfma
   )

(The names "host.fp" and "default" here are just for illustration; this is something we'd need to figure out.)

Instead of saying qfma itself is nondeterministic, we'd say the value of the fpenv you import is nondeterministic. That would make the "same across a run" property explicit in the code. If one wants to guarantee that two instances have the same value (eg. a main program instance and a library instance), one module could import it and then re-export it, and the other could import it from the first (toolchains could arrange this automatically).
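
As a sketch of that sharing pattern (module and export names here are purely illustrative, and fpenv is the hypothetical type described above):

   (module $main
     ;; Import the host fpenv and re-export it for other instances to use.
     (global $fp (import "host.fp" "default") fpenv)
     (export "fp" (global $fp)))

   (module $library
     ;; Import $main's fpenv rather than the host's directly, so both
     ;; instances are guaranteed to observe the same relaxed FP behavior.
     (global $fp (import "main" "fp") fpenv)
     ;; ...use global.get $fp with f32.qfma as in the example above...
   )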

In typical implementations, fpenv wouldn't be a dynamic value. Many hosts would have only one possible fpenv value, so all instructions that interact with fpenv values could be no-ops.

A downside is wasm code size; the import and global.get instructions take some space. However, the impact should be relatively small, since instructions that need an fpenv value are likely to be relatively infrequent in whole modules.

@tlively
Member

tlively commented Mar 3, 2021

It's obviously valuable for a "program" to be able to assume that rounding is consistent across a "run", to avoid discontinuities etc.

I agree with this, but I'm not convinced it's worth guaranteeing this kind of strong internal consistency in the spec. We could leave it up to distributed and migratory engines to choose for themselves whether to provide this guarantee, and engines that are not distributed or migratory would be able to provide it without doing anything extra. The purpose of these instructions is to dramatically improve performance, but engines with strong determinism requirements won't be able to realize any performance improvement from them. In practice, I expect engines with such determinism requirements to not want to implement these instructions to begin with, so introducing extra complexity to make them deterministic would not be worth it.

@sunfishcode
Member

The purpose of these instructions is to dramatically improve performance, but engines with strong determinism requirements won't be able to realize any performance improvement from them. In practice, I expect engines with such determinism requirements to not want to implement these instructions to begin with, so introducing extra complexity to make them deterministic would not be worth it.

FMA seems to be a counterexample to this; it's available on lots of CPUs these days (and that link is out of date; there are many more now), so for example, many server environments today can comfortably depend on having FMA. It has wide applicability, including in domains where both determinism and distributed compute are important (linear algebra over tiled datasets). And it delivers major speedups (the ~30% number mentioned above).

@ngzhian
Member Author

ngzhian commented Mar 3, 2021

It's obviously valuable for a "program" to be able to assume that rounding is consistent across...

The non-determinism in this proposal is limited to the result of an individual instruction and is consistent across runs...

Are there cases where, if the instruction is not "consistent across runs" (for some definition), it will still be useful, and not confusing?

I'm thinking of defining it like:

fma(x, y, z) = oneof { round(x + y*z), round(x + round(y*z)) }

The determinism is local, and for this particular example there are always only 2 possible cases. But it isn't consistent across hosts: with distributed execution, one host can choose to do a single rounding and another can choose to do two roundings.

If the application is robust to such differences, it gets the most performance out of this.

@sunfishcode
Member

Are there cases where, if the instruction is not "consistent across runs" (for some definition), it will still be useful, and not confusing?

There surely do exist applications that are robust to FMA rounding differently each time it's invoked. But there are also applications where absolute precision isn't important, but avoiding discontinuities is. For example, in many graphics programs, it may not be noticeable if the position or color of a particular object is slightly different from what a fully precise computation might show, but it may be noticeable if there's a bump in a line or a boundary in a gradient.

I've seen GPU drivers split FMAs into discrete multiplies and adds in places where they can fit just the multiply or the add into the hardware pipeline in a particular place in the code, and I've seen actual applications be broken as a result. If we want consistency, we should say so in the spec.

@tlively
Member

tlively commented Mar 3, 2021

What if we solve the problem in a different direction by instead clarifying what we mean by "run" in the spec? We could add language to the effect of "FMA may have this or that behavior, but an instance may not observe both behaviors and all instances in a store must observe the same behavior." Again, for non-distributed, non-migratory engines, this imposes no additional burden. Distributed/migratory engines would still have to do whatever they would do with the explicit fpenv, but from the producer and spec perspective, this would be simpler.

@sunfishcode
Member

What if we solve the problem in a different direction by instead clarifying what we mean by "run" in the spec? We could add language to the effect of "FMA may have this or that behavior, but an instance may not observe both behaviors and all instances in a store must observe the same behavior." Again, for non-distributed, non-migratory engines, this imposes no additional burden. Distributed/migratory engines would still have to do whatever they would do with the explicit fpenv, but from the producer and spec perspective, this would be simpler.

The store is abstract, just a way for the spec to talk about entities that can be linked. Can a store span suspend/resume cycles? Can it span networks? The spec just says if you can link exports to imports, it's a store, which gives embedders a lot of flexibility. If we start using the store to hold miscellaneous configuration information, it would give the store an identity, making these questions more complex.

The point of fpenv is that engines can't infer it on their own. Producers know their intent, such as "this library needs the same floating-point configuration as the program it's linked to", and fpenv lets them state that intent.

@tlively
Member

tlively commented Mar 3, 2021

I see your point about the store being an imperfect abstraction for this use case. If an engine snapshots an instance's memory and reinstantiates the module with that snapshotted memory on a different machine, it seems that you could call that a different store but still break the program if FMA semantics change.

I guess I'm not convinced that letting producers express fine-grained intent here is useful enough to be worth this extra complexity, though. I can see that a use case could be constructed, but I also know that the users I work with who really want to use FMA won't need this. I'm of course biased, though, because I only work closely with Web users, for whom this wouldn't make a difference. Do you know of users who are eager for this fine-grained control? If there are none now, perhaps we could either make no guarantees or find a way to specify coarse-grained guarantees about semantic consistency in this proposal. Assuming there are no users now, I would be happy to revisit this and do another version with finer-grained control in the future when there are users ready to take advantage of that.

@sunfishcode
Member

One of the interesting properties of wasm is its virtualizability. All interactions with the outside world go through imports and exports, so it's straightforward to have WebAssembly instances with no concept of "the machine I'm on" as a distinct identity they can interact with. With WASI, we're working on an ecosystem where programs don't know about "the filesystem namespace of the machine I'm on" or "the local network configuration of the machine I'm on" (substitute in "the container I'm in" or "the VM I'm in" as needed ;-)).

This means "the machine I'm on" could change over time, and "the machine I'm on" could be different from "the machine other instances I'm linked to are on". We can do this, without prearranged configuration, because the relationships between instances are completely described in their imports and exports. I myself and others are building on systems which will take advantage of this property in general.

There are cases where we can get by without this property. For example with FMA, if we're ok limiting our scope to just servers, then we can maybe just depend on FMA being available everywhere. However, on one hand, I can't guarantee we'll always limit our scope to just servers. And on the other, the proposal here already has more than just FMA, and more things may be added in the future.

I understand this is adding complexity up front, but my concern is that if we don't preserve this property of wasm, it will be difficult to re-introduce. And it doesn't seem like it's that complex for producers to produce or for minimal consumers to ignore.

@RossTate

RossTate commented Mar 3, 2021

How is this addressed for NaNs in the f32/f64 operations?

@sunfishcode
Member

The bits of a NaN produced by an f32/f64 operation are simply nondeterministic, within a few constraints, so they don't imply any state. And in practice, it's rare for programs to care about the bits of NaNs in ways that aren't covered by the constraints.

@tlively
Member

tlively commented Mar 4, 2021

@sunfishcode, if we did add an fpenv mechanism for guaranteeing determinism, I don't think it would make this proposed instruction set more attractive to producers targeting heterogeneous runtimes. For code that works in the presence of inconsistent FMA, it does not matter whether we have an fpenv mechanism. Code that requires consistency within a run, on the other hand, would never want to use an FMA in an environment that may not consistently support it natively, due to the expense of emulating it. Such code would have more predictable performance using unfused multiplies and adds, which are already expressible without this proposal.

You're right that Wasm is generally meant to be deterministic in all the ways that matter and should be easily virtualizable. But this proposal is explicitly meant to be an exception to that rule, which is why it was split out of the SIMD proposal. I would not expect heterogeneous runtimes that want to provide determinism to implement this proposal at all. Code that wants to run both on runtimes that provide this proposal and on runtimes that do not should use the feature-detection proposal to accomplish that.

@RossTate

RossTate commented Mar 4, 2021

Well, the NaN bits are formalized to be non-deterministic, but in practice they're platform-specific. I agree that it's rare for programs to care about the bits of NaNs beyond what is specified, but I wouldn't be surprised if some programs are implicitly relying on the fact that those bits at least behave the same "across runs".

This is not to dismiss the concern @sunfishcode is raising. I just wonder if the scope of the concern is broader than this proposal. If it is, then for virtualizing engines, I wonder if what would make sense is to, when compiling a module, track any platform-specific behaviors it has and then, when instantiating a module, record those specifics of the current platform and make sure to only move instances to platforms with the same specifics. (Or to restrict the code to a platform-independent subset of wasm.) Thoughts?

@sunfishcode
Member

@tlively I outlined a use case above which is ok with FMA being nondeterministic, but which needs FMA to be consistent within a run. Discontinuities break real applications, even when full precision and full determinism aren't needed.

Is fpenv really so burdensome? It should be mechanical for many producer scenarios, and should be ignorable in many consumers.

@RossTate Re: NaNs: With NaNs we can at least tell developers "don't do that", whereas the FMA situations discussed here occur in common usage. And in some use cases, we can bypass the NaN problem using wasm engine flags to canonicalize NaNs. In all, the story indeed isn't completely watertight, but it's a fairly small problem in practice.

Re: Tracking platform-specific behaviors: Yes, we can avoid breaking an instance using special tracking and maintaining extra state, but it doesn't address the linking issue, where we want to know up front if two instances need to see the same behavior.

@tlively
Member

tlively commented Mar 4, 2021

@sunfishcode, sorry, in my previous comment I used "nondeterministic" and "deterministic" where I really meant "inconsistent within a run" and "consistent within a run." I updated it, so now it hopefully makes more sense. It's not that fpenv is particularly burdensome, but in my view solving the problem it solves is a non-goal for this proposal, and I don't see how it will add value for the population of users/programs who I expect to use this proposal. Since the root of our disagreement appears to be different expectations for the goals of the proposal, how about we continue this line of discussion at the next CG meeting, where we can have a more fleshed-out conversation about goals and non-goals? I want to leave room on this issue for folks to raise other topics as well :)

@dtig
Member

dtig commented Mar 4, 2021

As a thought exercise, if we were to go the route of including the fpenv mechanism, I'm trying to figure out in what other ways it would be useful. I'm interested in the idea of abstracting host environment properties into a global: aside from the "is fma fast?" flag, what other useful information can we encode? When we were first talking about post-MVP SIMD features, an idea that was attractive to me was a backward-compatible flag to loosen some semantics of the existing MVP operations for performance, and I could see fpenv being useful for that purpose.

That said, what would it mean for Approximate reciprocal/reciprocal sqrt instructions? Unless I'm missing something, these will give slightly different results on different architectures and it doesn't seem preventable with fpenv?

@sunfishcode
Member

@dtig We might think of fpenv as a kind of reference to a floating-point co-processor. An fpenv operand in a floating-point instruction would be the co-processor to compute the result with. It'd be nondeterministic which algorithms the co-processor uses, within some constraints that we'd design, but a given co-processor would always be consistent with itself.

@Maratyszcza

I'd like to point out that we can't fully encapsulate non-determinism inside an fpenv variable: some of the instructions being considered for relaxed SIMD (e.g. Approximate reciprocal/reciprocal sqrt) map to architecturally underspecified instructions (e.g. RSQRTPS/RCPPS on x86 SSE). We can, however, specify a deterministic fallback for each Relaxed SIMD instruction, expressed through SIMD MVP instructions. Then implementations where non-determinism is a great concern could use deterministic lowering, at the cost of some performance.
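
As an illustration of what such a fallback could look like (a sketch, not proposed spec text), an engine could lower a relaxed f32x4 approximate reciprocal to the exact IEEE division already available in the SIMD MVP:

   ;; Deterministic fallback for a relaxed approximate reciprocal:
   ;; an exact IEEE divide instead of an RCPPS/FRECPE-style estimate.
   (func $f32x4_recip_fallback (param $x v128) (result v128)
     (f32x4.div (f32x4.splat (f32.const 1.0)) (local.get $x)))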

@sunfishcode
Member

sunfishcode commented Mar 5, 2021

fpenv is just about giving applications results that are consistent within a run, so it's fine with RCPPS, RSQRTPS, etc. being underspecified.

@Maratyszcza

If a run involves moving execution to a different machine, approximate reciprocal instructions would produce different results if they lower to RCPPS/RSQRTPS.

@RossTate

RossTate commented Mar 5, 2021

I’m still not sure fpenv needs to be an explicit import to address @sunfishcode’s needs, but if it does then I would suggest that it not be represented as a first-class value.

@sunfishcode
Member

@Maratyszcza It isn't about guaranteeing that we can always migrate, but about making application requirements explicit.

@RossTate Any ideas you have for other ways to solve the problem are welcome.

@conrad-watt
Contributor

@sunfishcode at the extreme, we could document something equivalent to fpenv in the semantic definitions of each of the relaxed SIMD operations, without requiring it to be represented in the program code. Abstractly, it might sit in the instance, but I'm sure we could bikeshed. It would be the responsibility of the host to define how their fpenv is configured (which could still be in a programmatic way at instantiation-time, if necessary).

@tlively
Member

tlively commented Mar 5, 2021

I think using an import also suffers from the same problem as trying to specify consistency in terms of the store; the consistency of the identity of the imported value depends on the same underlying notion of the lifetime of "a run." In the worst case, a spec-compliant engine could be created that had a different value for the import every time a host function calls into the instance. On each host-to-wasm call, this engine would actually reinstantiate the module with a different import value and replay the execution trace on the new instance before making the call. Obviously this would be an outlandish thing to do, but it demonstrates that using an import on its own does not provide any strong formal guarantees about internal consistency from the program's point of view.

The best we can do is to tie the internal consistency guarantee to the lifetime of something that already exists in the spec, which would probably be an instance. Then it would be up to the engines to clarify what they consider to be the lifetime of an instance. The outlandish engine above would document that instances only live as long as it takes the module to return from a host-to-wasm call, and more realistic engines would document that instance identity does not change when an instance is migrated to a different machine.

If we want programs to be able to opt in or out of internal consistency, it would be simplest to provide two versions of each instruction (or a discriminating immediate) to allow producers to express that directly. No need for an import or anything else. (We should still discuss later whether that should be a goal of this proposal, though.)

@RossTate

RossTate commented Mar 5, 2021

It’s an interesting problem. I think I know various semantic techniques to formalize possible semantics, so I’m more interested in the pragmatics. For that, it would be helpful to understand the scenario in more depth. @sunfishcode, could you give a more detailed example of both something you want to allow and something you want to prevent, and then could you explain how you see type imports as helping distinguish the two cases? That’d give me a lot more to work with and offer suggestions for (next week, when I have access to a computer again).

@sunfishcode
Member

@tlively Instantiation is an observable event, from the perspective of a program. In general, hosts can't arbitrarily tear down and re-instantiate without consequences, including potentially losing data. I'm ok if programs can't rely on consistent floating-point nondeterminism across their instances being torn down and re-instantiated.

@RossTate Here's a summary: Some applications are ok if certain floating-point operators are nondeterministic, but they may rely on the operators being deterministic "at runtime", so that they don't see sudden discontinuities. We want a way to say that all the instances that make up a "run of a program" see the same behavior as each other. Imports are a way to express this: we can have one instance import an fpenv from the environment and re-export it, and the other instances could then import it from the first instance, so they all get the same behavior (without affecting any other instances not linked to them).

Thinking this through more, I have now also thought of a way to do this without first-class values. The idea of "intrinsic functions" for wasm has come up several times, but it's never been clear what the difference between an "instruction" and an "intrinsic function" is. Perhaps nondeterminism is now such a difference. If we made qfma, qrcp, qrsqrt be imported intrinsic functions, with reserved names so engines could recognize and inline them, and required them to have deterministic behavior at runtime, we could also re-export and import them in other instances to ensure other instances see the same functions.

@RossTate

Thanks for the info, @sunfishcode!

If one were to try to enforce this statically, I don't think imports would be enough. Consider that a module can import multiple fpenvs (once you can import one, module composition requires you to be able to import multiple). The module might only use one of these fpenvs in its code, but from the module's signature you have no way of knowing which import it's using.

For what you're describing, I think you'd need an extension like an effect system. The effect of platform-specific instructions would indicate the specific fpenv that the instruction used. To keep this tractable, fpenvs would need to be second-class (otherwise you're getting into dependently typed effect systems). Function signature types would similarly be extended to say which fpenv they use (and, to make things tractable for engines, you can make a restriction that at most one fpenv is present at a time). Then type-checking would ensure that fpenv-sensitive functions only call other fpenv-sensitive functions that use the same fpenv, and similarly that instances importing/exporting fpenv-sensitive functions only get linked to instances using the same fpenv.

Does that make sense?

@sunfishcode
Member

@RossTate I left out a little too much detail in my summary :-}. We don't need mechanisms that fail type checking if the program uses multiple fpenv values, within a module or even within a single function. A function using two fpenv values conceptually just means that it may perform two independent bodies of floating-point work that don't need to be consistent with each other. In practice, this would probably be very uncommon, but we don't need to prohibit it.

What we need is just a way for one piece of code to ask to be consistent with another piece of code. Whether we import globals or functions, imports give us that: the way a piece of code asks to be consistent with another piece of code is to use the same imported value. Imports can also be re-exported to other modules to provide the same value across module boundaries.

@tlively
Member

tlively commented Mar 11, 2021

@sunfishcode, how would you deal with this point that @Maratyszcza raised?

I'd like to point out that we can't fully encapsulate non-determinism inside an fpenv variable: some of the instructions being considered for relaxed SIMD (e.g. Approximate reciprocal/reciprocal sqrt) map to architecturally underspecified instructions (e.g. RSQRTPS/RCPPS on x86 SSE).

@sunfishcode
Member

@tlively It isn't about guaranteeing that we can always migrate, but about giving code a way to state its needs.

If module A imports, uses, and re-exports an fpenv, and module B imports it from A and uses it, this says that whatever fpenv A gets, B will get the same one. That may mean that I as a host have fewer options than I would if B didn't have that import, but that's fine. The key thing is that I as a host know that they need to be the same, so I can avoid breaking them.

@RossTate

@sunfishcode I'm worried that you are expecting imports/exports to be able to express more than they can (or at least more than they can express easily). Could you give an example of a multi-instance program that should validate (because of consistent use of fpenv) and an example of a multi-instance program that should not validate (because of inconsistent use of fpenv), and illustrate how the use of type imports causes one to validate and the other to be rejected?

@tlively
Member

tlively commented Mar 11, 2021

@RossTate, I don't think that using the "wrong" fpenv is supposed to be a validation error, but it might allow the program to observe inconsistent semantics that it was not expecting or otherwise it might over-constrain a distributed host.

@sunfishcode
Member

sunfishcode commented Mar 11, 2021

@RossTate There is no example of a program that should not validate :-). This isn't about catching programs using accidentally inconsistent fpenv values. It's about letting programs that need consistent computation request it.

Imagine assigning every possible floating-point reciprocal algorithm a unique integer value. With that, fpenv is just an integer identifying a particular algorithm. Importing and exporting an fpenv is just sharing an integer value across module boundaries. Reciprocal-approximation would then just be a pure function of its operands, which happen to be one floating-point value and one "integer" value. If you compute reciprocal-approximation twice, with the same operand values, you get the same result.

So there's no magic here and no invalidation. It works like plain values.

@RossTate

Oh, okay. Sorry, you had said you needed to know required compatibilities "up front", which threw me off.

What you're describing then seems to need to be able to affect compilation (unless you want these to actually compile to function calls), so would fpenv be "pre"-imported then?

@Maratyszcza

Imagine assigning every possible floating-point reciprocal algorithm a unique integer value. With that, fpenv is just an integer identifying a particular algorithm. Importing and exporting an fpenv is just sharing an integer value across module boundaries. Reciprocal-approximation would then just be a pure function of its operands, which happen to be one floating-point value and one "integer" value. If you compute reciprocal-approximation twice, with the same operand values, you get the same result.

Reciprocal algorithms implemented in hardware have very few limitations imposed on them (only a maximum relative error on x86). They are lookup tables hard-coded in the processor implementation. There is no API to get the parameters of these tables, so the only way to emulate them is to dump the 16GB table giving the 4-byte output for each 4-byte input.

@sunfishcode
Member

@Maratyszcza Yes, that's correct. Fortunately though, I'm not looking to emulate anything here. I'm primarily looking to give applications a way to state their needs so that I can avoid doing things that would break them.

@RossTate In most implementations, the idea is that there is only one possible fpenv value, so the knowledge of which fpenv is in use never affects any decision such an engine would make. So I think a regular import would be sufficient for the foreseeable future.

If wasm added a pre-import mechanism, we could switch to using it for fpenv, and there might be some theoretical advantages, but I don't think we need it for the main use cases here.

@ngzhian
Member Author

ngzhian commented Mar 16, 2021

Relaxed SIMD moved to phase 1 at the CG meeting earlier today. We have our own repository at https://github.com/WebAssembly/relaxed-simd; please file issues and continue discussions there.

I filed WebAssembly/relaxed-simd#1 to capture what I think is the main discussion here, around "consistency" and the various suggestions on how to specify that.

Thank you everyone for all the discussion, and I look forward to more!
