
Provide more details in FeatureTest.md #412

Merged: 5 commits merged into master from update-future-features on Oct 16, 2015
Conversation

lukewagner (Member) commented:

As requested in spec/#120, this PR adds some more details on how polyfilling and feature testing could work in practice in the MVP.

Quoted from the PR's FeatureTest.md text:

> The [MVP](MVP.md) allows an application to query which post-MVP features are
> supported via [`has_feature`](AstSemantics.md#feature-test). This accounts for
> the pragmatic reality that features are shipped in different orders at different
> times by different engines on the Web.
A reviewer (Member) commented:
I'd drop "on the Web" since this applies equally to non-Web (node, or any other usage of wasm).

lukewagner (Member, Author) replied:

Ok, posted update to address most comments.

Quoted from the PR's FeatureTest.md text:

> Feature tests will be available from
> [WebAssembly itself](AstSemantics.md#feature-test), as well as from JavaScript.
> Since some WebAssembly features add operators and all WebAssembly code in a
> module is validated ahead-of-time, the usual JS feature detection pattern:
A reviewer (Member) commented:
"JavaScript" :)

lukewagner (Member, Author) commented:

@jfbastien @mtrofin Some of these ideas sound like they could be followups you could propose. The goal here is just to add some more detail and clarification, not to be the last word on polyfilling. Perhaps you could evaluate this PR just on what it has and then file new PRs to further improve?

jfbastien (Member) commented:

lgtm

ghost commented Oct 16, 2015:

  1. Knowledge of which features were probed is valuable input for a cache key. If there are cacheable outputs, then enforcing some data flow would be necessary to make this usable. For example, a distinct layer-1 stage, for which the probed features are noted, allows the caching of the layer-1 output or downstream output (see the sketch after this list).
  2. This is more effective if the probed 'features' are fine-grained. Perhaps this could be added to the rationale, so that we don't end up with feature-detection functions that return a bit set of orthogonal features (for which the cache would not know which were actually used, and which the output depends on). Perhaps also encourage developers not to probe for a feature unless the output really depends on it.
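A sketch of what point 1 could look like in practice, with all names hypothetical; `hasFeature` stands in for whatever JS-visible feature test ends up shipping:

```js
// Wrap the feature test so that layer-1 records exactly which features it
// probed; only those (and their results) feed the cache key, keeping the
// key as general as possible so the hit rate stays high.
function makeRecordingProbe(hasFeature) {
  const probed = new Map();                 // feature name -> probed result
  return {
    probe(name) {
      const result = hasFeature(name);
      probed.set(name, result);
      return result;
    },
    cacheKey() {
      // Deterministic key over the probed subset of features.
      return JSON.stringify([...probed.entries()].sort());
    },
  };
}
```

On a miss, layer-1 runs under `probe` and its output is stored under `cacheKey()`; on a later lookup, replaying the recorded probes against the current instance cheaply reproduces the key without re-running layer-1 itself.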

lukewagner (Member, Author) commented:

@JSStats In FF, asm.js code cache entries simply include the buildid in the key (since any new FF binary may change struct layouts baked into the machine code). Assuming has_feature only changes results with new buildids (which should be the case), no extra addition to the key is needed.

lukewagner added a commit referencing this pull request on Oct 16, 2015: "Provide more details in FeatureTest.md"
lukewagner merged commit 5970f4e into master on Oct 16, 2015
lukewagner deleted the update-future-features branch on October 16, 2015 at 13:30
ghost commented Oct 16, 2015:

@lukewagner has_feature is not defined to only change between runtime builds, so the key is needed. The key will also need other inputs, such as the negotiated memory size when a fixed size has been requested, so it is natural to also include the probed features.

lukewagner (Member, Author) replied:

@JSStats I'm talking about the engine-internal code cache.

ghost commented Oct 18, 2015:

@lukewagner I am also 'talking about the engine-internal code cache'. The runtime engine needs to be able to reason about the provenance of the wasm code it is compiling in order to manage this cache. The has_feature results are inputs into layers in the pipeline, for example a layer-1 that emits different wasm depending on the features, and has_feature is only defined to be static per-instance, not per-runtime-build.

The upper layers need to be deterministic, even if user-defined, so that the runtime knows that the result of decompression does not change if the compressed source does not change, and that the product of the layer-1 code writer does not change if its inputs (including the has_feature results) do not change; it can then reuse cached compiled code without re-running the upper layers of the pipeline.

The runtime can note the calls made to has_feature when the upper layers run (on a cache miss), plus other inputs, and add these to the cache key. Only those features actually probed need be noted, so the cache key can be as general as possible to keep the hit rate as high as possible. By noting the probed features when the upper layers run, the set of features that the code depends on can itself vary with the features that are available.

For example, consider two orthogonal features A and B. The code might only need feature B if feature A is available, so it probes A first and only probes B when A is present. If A is not available, the key need not depend on B.
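A sketch of that conditional probing, with hypothetical codegen stubs and `probe` as in the recorder sketch above:

```js
// Hypothetical codegen stubs for illustration.
const emitBaseline  = () => "baseline wasm";
const emitWithA     = () => "wasm using A";
const emitWithAandB = () => "wasm using A and B";

function selectCodegen(probe) {
  if (probe("A")) {               // cache key now depends on A
    return probe("B")             // ...and on B, but only on this path
      ? emitWithAandB()
      : emitWithA();
  }
  return emitBaseline();          // key depends on A alone
}
```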

lukewagner (Member, Author) replied:

@JSStats The internal cache is keyed on the content, which would be the output of layer 1. has_feature is not going to change for a given buildid. Thus, (content, buildid (, cpuid)) is sufficient.

ghost commented Oct 20, 2015:

@lukewagner Your response misses the key points:

  1. I repeat that has_feature is not defined to be static for a given build ID. I can think of many uses for per-instance constants that the runtime may want to vary. If people want a has_build_feature that is static across instances then add that, but it would be a very narrow feature that is subsumed by more general instance-constants and really is not even worth wasting time on.
  2. We do not want to be re-computing the content input into the final stage, so suggesting that a wasm compiler cache needs the raw input (the output from the upstream layers) condemns wasm to continually re-applying the upper stages on every use of the wasm source.

We want the runtime to be able to note that the input source (the compressed and pre-transformed code) has not changed, and that the inputs into the various stages in the pipeline have not changed, and then (without re-computing the input into the final stage) be able to reason that the compiled code will not change and to reuse it.

In some ways getting this right is more important than even the binary encoding. Developers can deal with differences in the binary code between implementations using transforms at upper layers, but an inefficient pipeline cache blocks client-side decompression and transforms in user-defined code.

lukewagner (Member, Author) replied:

@JSStats (1) I realize that, but the engine can choose to have it be. I do not see us adding arbitrary constants, but rather only static, boolean detection of features (it's in the name; this isn't sysconf). (2) Not if outputs of the upper layers are cached (SW/Cache API or IDB). As discussed elsewhere, the long-term goal is that these upper layers get standardized which would allow the whole pipeline to be a pure function of the over-the-wire bits + fixed context. Before that, though, it's just arbitrary JS execution.
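A sketch of the caching lukewagner alludes to, using IndexedDB to memoize a user-defined layer-1 stage keyed on a hash of the source bytes plus the probed-feature key; every name here (`runLayer1`, the database and store names) is hypothetical:

```js
// Hypothetical user-defined layer-1 stage (stub: identity transform).
const runLayer1 = (bytes) => bytes;

async function getLayer1Output(sourceBytes, probedFeatureKey) {
  // Key = probed features + content hash of the over-the-wire bytes.
  const digest = await crypto.subtle.digest("SHA-256", sourceBytes);
  const hex = [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, "0")).join("");
  const key = probedFeatureKey + ":" + hex;

  const db = await new Promise((resolve, reject) => {
    const req = indexedDB.open("wasm-layer1-cache", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("outputs");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });

  const cached = await new Promise((resolve) => {
    const req = db.transaction("outputs").objectStore("outputs").get(key);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => resolve(undefined);
  });
  if (cached !== undefined) return cached;

  const output = runLayer1(sourceBytes);
  db.transaction("outputs", "readwrite").objectStore("outputs").put(output, key);
  return output;
}
```

With layer-1 output cached this way, the engine-internal cache keyed on (content, buildid) then avoids recompiling, since the content handed to the engine is byte-identical across runs.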

ghost commented Oct 20, 2015:

@lukewagner

(1.1) If features are only defined to be static per-instance, then an engine can choose to change them per-instance and would still want to implement an effective cache. wasm should not be designed to work efficiently only with an engine whose features are static per build.

(1.2) There is a need for per-instance static constants as input into the upper layers, most significantly the memory size, to allow the upper layers to emit code optimized for a static size. Even if you can answer the performance challenges of eliminating bounds checks without a fixed memory size (and no one here has), as mentioned many times in discussions there are applications that can naturally take advantage of a fixed memory size, and they need a negotiated size to work on a range of instances with different amounts of available VM.

(2) In order for the outputs of upper layers to be cached, the inputs need to be part of the cache key. The runtime should be able to reason across layers in the data flow, not have to do a cache lookup at each stage and compare the output to the key of the next stage; this is often referred to as 'provenance'.

It's news to me that the 'long-term goal is that these upper layers get standardized'. Part of the appeal of supporting user-defined upper layers was the flexibility to meet a range of uses. I get the impression that web developers are frustrated by being locked into a fixed stack and are desperate for lower-level access (that can also be used efficiently). You'll be bike-shedding for the next decade over a lost cause spec'ing a stack. Please consider supporting a user-defined stack; this might mean some support for managing the data flow (which I think would be a much better investment).
