Implement extensible syscall interface for wasm #47102
r? @aidanhs (rust_highfive has picked a reviewer for you, use r? to override)
I agree with the idea of this PR, and from a cursory look it looks great. It really is independent of JS; it doesn't even use JSON for the communication, which is great. The only remaining question is about feature gating. I think following established procedures here is important before this becomes available on stable and nothing about the design can be changed any more. Although I've heard people say that the stability promise only applies to tier-1 targets? Dunno.
In general, as I said before in the related issue, I'm very supportive of providing a generic host-agnostic interface for std bindings to WebAssembly, but I'm not sure about using a single syscall entry point. It seems error-prone and suboptimal because 1) the host might easily miss the implementation of some syscalls (especially when the Rust side adds support for new ones), in which case they will silently be no-ops, and 2) if Rust uses only a few syscalls, it's hard to know on the host side which ones without looking through the entire code or the generated wasm, so it's impossible to eliminate unused code, as it's all inside a single function with
IMO it would make sense to split the bindings into separate functions so that the host could implement only those that are used. When it misses implementations for some of them, the WebAssembly constructor will immediately emit an error about missing imports upon loading, so it will be easy both to track what's going on and to eliminate anything unused for constrained targets.
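The up-front check that the separate-imports approach enables can be sketched as follows. This is a minimal sketch, not part of the PR: `findMissingImports` and the import names are hypothetical, and it assumes the host already has the descriptor list that `WebAssembly.Module.imports(module)` returns (objects with `module`, `name` and `kind` fields).

```javascript
// Hypothetical helper: given a module's import descriptors (the shape that
// WebAssembly.Module.imports(module) returns), list every import the host's
// import object fails to provide, as "module.name" strings.
function findMissingImports(descriptors, importObject) {
  return descriptors
    .filter(({ module: mod, name }) => {
      const namespace = importObject[mod];
      return namespace == null || !Object.prototype.hasOwnProperty.call(namespace, name);
    })
    .map(({ module: mod, name }) => `${mod}.${name}`);
}

// Illustrative descriptors for a module that uses separate std imports.
// These import names are made up for the example; they are not a real ABI.
const descriptors = [
  { module: "env", name: "rust_write_stdout", kind: "function" },
  { module: "env", name: "rust_exit", kind: "function" },
];

// A host that only implemented stdout output.
const host = { env: { rust_write_stdout() {} } };

const missing = findMissingImports(descriptors, host);
// missing is ["env.rust_exit"]
```

In a real host the descriptors would come from `WebAssembly.Module.imports(compiledModule)`. Instantiating with an incomplete import object is rejected anyway (with a `LinkError` when a function import is missing); listing the gaps first just lets the host report all of them at once.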
@RReverser one of the goals of this PR is to make it possible to extend without breaking existing uses. All libstd operations are currently implemented as no-ops, so if we add new ones then existing code will continue to function as it used to. If we added a new import each time we made a change, then every user would have to continually update the code hosting the WebAssembly module. A generic host-agnostic interface for std bindings using wasm imports is impossible to reconcile with Rust's stability guarantees.
src/etc/wasm32-shim.js
```javascript
switch (index) {
    case 2: syscall_exit(viewstruct(data, 1)); break;
    case 3: syscall_args(viewstruct(data, 3)); break;
    case 4: syscall_getenv(viewstruct(data, 5)); break;
    default: console.log("Unsupported syscall: " + index.toString());
}
```
I think it is better to return a value to the Rust program instead, indicating "unsupported syscall". This would allow the program to handle the "unsupported syscall" itself, maybe in a non fatal way.
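The reviewer's suggestion can be sketched as a dispatcher whose return value tells the caller whether the syscall was handled. This is a minimal sketch under assumptions: the 0/1 convention and `makeSyscallImport` are hypothetical (the import in this PR returns nothing), and index 2 is the exit syscall from the shim excerpt above.

```javascript
// Hypothetical status codes; the PR's import returns nothing.
const SYSCALL_UNSUPPORTED = 0;
const SYSCALL_HANDLED = 1;

// Build a `rust_wasm_syscall(index, data)` function from a table of
// handlers keyed by syscall index. An unknown index is not a host error:
// it is reported back as unsupported so the Rust side could pick a fallback.
function makeSyscallImport(handlers) {
  return function rust_wasm_syscall(index, data) {
    if (!Object.prototype.hasOwnProperty.call(handlers, index)) {
      return SYSCALL_UNSUPPORTED;
    }
    handlers[index](data);
    return SYSCALL_HANDLED;
  };
}

// Usage: a host that only handles the exit syscall (index 2 in the shim).
let exitData = null;
const rustWasmSyscall = makeSyscallImport({
  2: (data) => { exitData = data; },
});

const handled = rustWasmSyscall(2, 1024);     // 1: handler ran, exitData is 1024
const unsupported = rustWasmSyscall(5, 2048); // 0: no handler for index 5
```

Whether the Rust side could do anything more useful with a 0 than ignore it or panic is exactly the point raised in the reply.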
This should already handle unsupported syscalls in a non-fatal way. For example, if `SetEnv` is called, it turns into a no-op and we simply log the fact that it went unhandled. On the Rust side there's no way to indicate that an error happened, so either we silently do nothing or we panic. I don't think we gain anything from adding a return value?
That's a concern when you ship the wasm together with glue code, e.g. the browser embedding use case, where you have additional control over the JS layer. The browser use case can, however, already be covered by specific targets and the
If you have a wasm-first embedding system, e.g. for your cryptocurrency or for a platform-independent plugin system for your DAW, you wouldn't ship the syscall implementations, as they are already provided by the host program (of course, the host program would need to ship with an implementation of every syscall). The great advantage of @Diggsey's approach is that if a syscall is not implemented, it is not necessarily fatal. This has several upsides:
The syscall interface should not use syscall numbers. It should use multiple imports; they don't need to be stabilized immediately, if ever. If they are, they're much closer to the target specification format than to std's surface area. Using multiple imports also ties into the portability lint, so that we can check at compile time whether the target environment supports a syscall. We should not be relying on dead code elimination and runtime errors for the core functionality here. (The interface also should not look anything like POSIX, though this PR doesn't really attempt to lay out a design for that anyway, so we can probably hold off on that discussion for now.)
I've looked up the JS API docs, and while this does seem to be correct, one can easily list all the required imports of a wasm module. So I think we might want to switch to multiple functions after all? I'm on the fence about this now.
@est31 the full list of imports will include user-defined imports. You would need to implement a naming scheme to distinguish the imports that should be automatically generated from those the user is implementing, and the naming scheme would also somehow need to indicate what the arguments and return type should be. That could be quite problematic if you wanted to host wasm in a statically typed language.
Yeah. If you think this through further, you'd arrive close to my suggestion to use the
Embeddings other than the JS embedding will have different APIs. If you only want to return an error, you don't need any arguments or return types, do you?
Since wasm is strongly typed, any wasm bindings for a static language would likely require the signatures of your imports to match the signatures being imported. It may not even be possible to dynamically generate imports; they may have to be specified at compile time. It's probably possible with enough work, but at that point what are you really gaining? If we do it via imports, then users have to worry about backwards compatibility, and will have to implement this generator and name de-mangler in their host so that their programs keep working when they update Rust. It seems to me they shouldn't have to care about that stuff. Another problem with the current situation is that we depend on the dead code elimination step. If dead code elimination ever changes in any way, it potentially breaks all users of the wasm target. Because of this, the guarantees that WebAssembly gives you with respect to matching up imports don't really mean all that much...
Just like in this PR, you need to implement an encoding scheme where each syscall ID corresponds to a specific operation; IMO both have equal complexity, while separate imports also allow the names to be more descriptive.
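The point that numeric IDs and import names are two encodings of the same information can be illustrated with an adapter: the host writes descriptively named handlers and still exposes the single numeric entry point. A minimal sketch; `adaptNamedHost` is hypothetical, and the index-to-name table only covers the three indices visible in the shim excerpt (2, 3 and 4, dispatched there to `syscall_exit`, `syscall_args` and `syscall_getenv`).

```javascript
// Syscall indices as they appear in the wasm32 shim excerpt; the names are
// descriptive labels chosen for this sketch.
const SYSCALL_NAMES = Object.freeze({ 2: "exit", 3: "args", 4: "getenv" });

// Wrap name-keyed handlers in the single numeric entry point. Indices with
// no known name, and names the host did not implement, fall back to no-ops.
function adaptNamedHost(namedHandlers) {
  return function rust_wasm_syscall(index, data) {
    const name = SYSCALL_NAMES[index];
    const handler = name === undefined ? undefined : namedHandlers[name];
    if (typeof handler === "function") {
      handler(data);
    }
  };
}

// Usage: implement only `exit`; the getenv call (index 4) is ignored.
const calls = [];
const imports = {
  env: { rust_wasm_syscall: adaptNamedHost({ exit: (data) => calls.push(["exit", data]) }) },
};
imports.env.rust_wasm_syscall(2, 64);
imports.env.rust_wasm_syscall(4, 128);
// calls is [["exit", 64]]
```

The adapter does not settle the debate: it keeps the single import's tolerance of missing handlers, while giving up the instantiation-time error that separate imports would produce.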
@RReverser no, in this PR you don't need to do anything other than define one no-op import, and you get the same behaviour as you would today.
Not at all - new imports are required only for new std functionality, so if the code doesn't use anything new from std, it will continue to work exactly as it used to, as that import simply won't be linked in.
I'm talking about the real-world use case where all syscalls are implemented, not just the stub - that's the easy bit with any approach.
As I said, that depends on dead code elimination, and that's not guaranteed to happen in a multi-crate scenario. That means adding new imports will break code.
By that definition, the current wasm target is completely useless, so this PR is an improvement either way... Also, the real-world use case is one where only some syscalls are implemented. Testing is a real-world use case, and it only needs basic IO.
Why would it depend on dead code elimination? It's just regular linkage, which pulls in only the required imports from std, and then, correspondingly, from whatever those use. If you print the LLVM IR of "hello world" in debug mode, you'll see that it defines only the symbols actually required by the app's entry point and by `println!`, not all of std.
Agreed, but
is a slippery slope for evaluating PRs IMO, as once we introduce some approach, it will be much harder, if not impossible, to introduce breaking changes to the ecosystem, so it's worth discussing the pros and cons of all options on the issue or PR before merging anything.
Linking happens at the object-file level: if you pull in an object file, the linker pulls in any symbols required by the object as a whole, even if they would not be used by the main program. Dependencies are tracked from object file -> symbol, not from symbol -> symbol. If symbols are removed beyond that, then that's an additional optimisation that we shouldn't be relying on for correctness. There are other ways using imports can cause issues:
If we're branching using `cfg!` (as we should), then, again, without optimisations only one branch will be generated by Rust, as per the compile-time config.
I think this and a few other concerns were addressed above with the "reading all imports" approach? (which, as you noted, will require a naming scheme, but that's not a big problem given that for the current PR you also need a unique name, so you might as well use it as a prefix for a namespace)
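The namespace idea can be sketched as a prefix-based split of a module's imports. This is a minimal sketch under assumptions: the `rust_wasm_` prefix is borrowed from the `rust_wasm_syscall` name in this PR, `partitionImports` is hypothetical, and the other import names are made up.

```javascript
// Hypothetical reserved prefix for imports that libstd would generate.
const STD_PREFIX = "rust_wasm_";

// Split import descriptors (shape of WebAssembly.Module.imports output)
// into std-generated imports and user-defined ones, by name prefix.
function partitionImports(descriptors, prefix = STD_PREFIX) {
  const stdImports = [];
  const userImports = [];
  for (const { name } of descriptors) {
    (name.startsWith(prefix) ? stdImports : userImports).push(name);
  }
  return { stdImports, userImports };
}

// Illustrative import list: two std imports and one user-defined import.
const { stdImports, userImports } = partitionImports([
  { module: "env", name: "rust_wasm_write", kind: "function" },
  { module: "env", name: "rust_wasm_exit", kind: "function" },
  { module: "env", name: "draw_sprite", kind: "function" },
]);
// stdImports is ["rust_wasm_write", "rust_wasm_exit"]; userImports is ["draw_sprite"]
```

As @Diggsey points out, a prefix only separates the two groups; it says nothing about argument and return types, which a statically typed host would still need to know.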
Not a linker, but construct-time (when you still have a chance to check
OK, I'm in the process of updating the PR - should I be adding the
I looked into adding
I'd be ok with removing it entirely for now and adding it back on an as-needed basis.
Alright, PR updated.
@bors: r+
📌 Commit 0e6601f has been approved by
⌛ Testing commit 0e6601f with merge ddc3b6814c52b2bf912ba53cba66d5b4a06b81d8...
💔 Test failed - status-appveyor
⌛ Testing commit 0e6601f with merge d8a8710326bd379d0fff5b698b0dfb0140dd9d91...
💔 Test failed - status-appveyor
One of the appveyor targets seems to be timing out at 3 hours. I don't think this is anything I've done?
Implement extensible syscall interface for wasm

Currently it's possible to run tests with the native wasm target, but it's not possible to tell whether they pass or to capture the output, because libstd throws away stdout, stderr and the exit code. While advanced libstd features should probably require more specific targets (e.g. wasm-unknown-web), I think even the unknown target should at least support basic I/O.

Any solution is constrained by these factors:

- It must not be JavaScript-specific.
- There must not be too strong a coupling between libstd and the host environment (because it's an "unknown" target).
- WebAssembly does not allow "optional" imports - all imports *must* be resolved.
- WebAssembly does not support calling the host environment through any channel *other* than imports.

The best solution I could find to these constraints was to give libstd a single required import, and implement a syscall-style interface through that import. Each syscall is designed such that a no-op implementation gives the most reasonable fallback behaviour. This means that the following import table would be perfectly valid:

```javascript
imports.env = {
    rust_wasm_syscall: function(index, data) {}
}
```

Currently I have implemented these system calls:

- Read from stdin
- Write to stdout/stderr
- Set the exit code
- Get command line arguments
- Get environment variable
- Set environment variable
- Get time

It need not be extended beyond this set if being able to run tests for this target is the only goal.

edit: As part of this PR I had to make a further change. Previously, the Rust entry point would be automatically called when the WebAssembly module was instantiated. This was problematic because from the JavaScript side it was impossible to call exported functions, access program memory or get a reference to the instance.

To solve this, ~I changed the default behaviour to not automatically call the entry point, and added a crate-level attribute to regain the old behaviour. (`#![wasm_auto_run]`)~ I disabled this behaviour when building tests.
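To make the description concrete, here is a minimal host-side sketch that implements one syscall and leaves the rest as no-ops. It assumes, from the shim excerpt earlier in the thread, that index 2 is exit and carries a single 32-bit field (the exit code) at the `data` pointer; `readU32Fields` and `createHost` are hypothetical helpers, and the other syscalls' indices and layouts are not covered.

```javascript
// Read `count` consecutive u32 fields starting at byte offset `ptr` in wasm
// linear memory (wasm memory is little-endian). Similar in spirit to the
// shim's viewstruct(data, n); the exact struct layouts are assumptions.
function readU32Fields(memory, ptr, count) {
  const view = new DataView(memory.buffer);
  const fields = [];
  for (let i = 0; i < count; i++) {
    fields.push(view.getUint32(ptr + 4 * i, true));
  }
  return fields;
}

// Build the import object plus some host-side state. Memory is looked up
// lazily because an instance's exported memory only exists after instantiation.
function createHost(getMemory) {
  const state = { exitCode: null };
  const imports = {
    env: {
      rust_wasm_syscall(index, data) {
        switch (index) {
          case 2: {
            // exit: assumed to be one u32 field holding the exit code,
            // matching viewstruct(data, 1) in the shim excerpt.
            const [code] = readU32Fields(getMemory(), data, 1);
            state.exitCode = code;
            break;
          }
          default:
            // Any other syscall falls back to a no-op, which the design permits.
            break;
        }
      },
    },
  };
  return { imports, state };
}

// Standalone usage with a simulated linear memory instead of a real module.
const memory = new WebAssembly.Memory({ initial: 1 });
new DataView(memory.buffer).setUint32(16, 3, true); // exit code 3 at offset 16
const { imports, state } = createHost(() => memory);
imports.env.rust_wasm_syscall(2, 16);
imports.env.rust_wasm_syscall(99, 0); // unimplemented index: ignored
// state.exitCode is 3
```

With a real module, the host would call `WebAssembly.instantiate(bytes, imports)`, point `getMemory` at the instance's exported memory (assumed here to be exported as `memory`), and only then call the entry point. That ordering is why running the entry point automatically at instantiation was a problem.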
💔 Test failed - status-appveyor
(cc @alexcrichton you may want to merge this manually; current testing PR is a beta backport, so no need to retry anything.)
☀️ Test successful - status-appveyor, status-travis