
Allowing atomic instructions with unshared memory #144

Closed
tlively opened this issue Oct 17, 2019 · 14 comments

Comments

@tlively
Member

tlively commented Oct 17, 2019

It is currently specified that attempting to use atomic access operators on non-shared linear memory is a validation error. This validation requirement causes a lot of friction for toolchain developers and users.

For example, consider a user wishing to distribute a WebAssembly binary for a C++ library. Say that library uses C++11 atomics internally to achieve thread-safety, but is otherwise agnostic to whether it is used in a single threaded environment or a multithreaded environment. On native platforms a single build of the library can be used in both environments, but due to the validation rules for WebAssembly atomics, the library must be built twice for WebAssembly; once with atomics enabled and once without. Beyond causing extra work for developers, having to produce these separate builds is also a large hole in WebAssembly's portability story.
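The situation described above can be sketched as a small library-internal type (hypothetical, for illustration only) that uses C++11 atomics for thread-safety but is otherwise indifferent to whether other threads exist, so a single native build serves both environments:

```cpp
#include <atomic>

// Hypothetical library-internal reference count: uses C++11 atomics for
// thread-safety but is otherwise agnostic to the threading environment.
// On native platforms one build of this works single- or multithreaded;
// under the current Wasm validation rules it needs two builds.
class RefCount {
public:
    void retain() { count_.fetch_add(1, std::memory_order_relaxed); }
    // Returns true when the last reference is dropped.
    bool release() { return count_.fetch_sub(1, std::memory_order_acq_rel) == 1; }
    int value() const { return count_.load(std::memory_order_relaxed); }
private:
    std::atomic<int> count_{1};
};
```

The atomic RMW operations here compile to Wasm atomic instructions, which is exactly what forces the second, atomics-free build today.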

One solution to this would be for all applications to use shared memories whether they are multithreaded or not. But this is not practical because not all engines support threads and not all engines are expected to support threads in the future.

A better solution would be to relax the validation rules to allow atomic operations to be used with non-shared memories. The semantics for atomic loads, stores, RMW operations, and cmpxchg operations are straightforward to extend to unshared memories. There are more possibilities that might make sense for wait and notify operations, but I suggest that wait on an unshared memory always trap and that notify on an unshared memory always return 0.
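A rough sketch of the wait/notify semantics proposed here (function and type names are illustrative, not spec text; a real engine would also take an address, expected value, and timeout):

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative stand-in for an engine's memory object.
struct Memory { bool shared; };

// Proposed: waiting on an unshared memory always traps.
void memory_atomic_wait(const Memory& mem) {
    if (!mem.shared)
        throw std::runtime_error("trap: wait on unshared memory");
    // ... real suspension logic elided ...
}

// Proposed: notify on an unshared memory always returns 0, since nothing
// can possibly be waiting on it.
uint32_t memory_atomic_notify(const Memory& mem) {
    if (!mem.shared)
        return 0;
    // ... real wake logic elided ...
    return 0;
}
```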

@lars-t-hansen

Making the load/store/rmw/cmpxchg ops more lenient seems fairly unproblematic (though see below) and would make simple cases "just work".

wait and notify I'm more skeptical about. Anyone using wait and notify is basically implementing some sort of mutex, and it's not terribly plausible that that would work the way it's intended in unshared memory. So then the argument would have to be that it's possible for the module to discover whether its memory is shared or not and hence whether to use the mutex or not. So do we want a mechanism for that? I can see the case for allowing these ops to verify in unshared memory but probably not the case for allowing them to execute without traps (even notify).

Another concern with the simple operations is that code written for shared memory may use techniques that are not directly related to wait/notify, such as spinlocks, that make little sense in unshared memory. This again suggests a need for a mechanism to discover that shared memory is used.

It is possible that we could allow the simple ops to verify and execute with non-shared memory but that wait/notify won't even verify w/o shared memory and therefore must be used under control of feature detection, which then must know about shared memory.

To deal with the spinlocks problem, perhaps a constant flag in memory (placed there by the linker) would be adequate, until we know how much an issue this really is; or we add an instruction that evaluates to a constant to test for shared memory. After all, we already have memory.size, why not memory.is_shared?
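The flag idea above might look like the following sketch (the symbol name is made up for illustration; in practice the linker would emit the constant, or a hypothetical memory.is_shared instruction would replace it):

```cpp
#include <cstdint>

// Hypothetical linker-provided constant: nonzero iff the module's memory
// is shared. Code branches on it to decide whether real locking is needed.
constexpr uint32_t kMemoryIsShared = 0;  // an unshared, single-threaded build

bool should_use_locks() { return kMemoryIsShared != 0; }
```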

@AndrewScheidecker
Contributor

One solution to this would be for all applications to use shared memories whether they are multithreaded or not. But this is not practical because not all engines support threads and not all engines are expected to support threads in the future.

I think this is actually practical. Even if an engine doesn't support threading, it's not so hard to support this proposal in a superficial way.

There's been some discussion (see WebAssembly/wasm-c-api#107) about sharing functions between threads; the idea would add a shared flag to functions (and all other extern kinds), and would prohibit shared functions from calling non-shared functions or accessing non-shared memories/tables/globals. This implies that modules compiled for sharing between threads would not be interoperable with non-shared modules.

Allowing the atomic instructions to operate on non-shared memories will only defer the interoperability problems.

@tlively
Member Author

tlively commented Oct 18, 2019

wait and notify I'm more skeptical about. Anyone using wait and notify is basically implementing some sort of mutex, and it's not terribly plausible that that would work the way it's intended in unshared memory.

I think mutexes should mostly just work. The single thread cannot fail to acquire a mutex because there are no other threads that could be holding it. Mutexes can easily be implemented to not execute a wait instruction if they are acquired on the first try.
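A minimal sketch of such a mutex (hypothetical, not any particular runtime's implementation; the slow path stands in for a wasm wait instruction, which would trap on unshared memory under this proposal):

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Futex-style mutex sketch. With a single thread the fast-path CAS always
// succeeds, so the slow path (which would lower to a wait instruction and
// trap on unshared memory) is never reached.
class FutexMutex {
public:
    void lock() {
        uint32_t expected = 0;
        // Fast path: uncontended acquire -- the only path taken single-threaded.
        if (state_.compare_exchange_strong(expected, 1, std::memory_order_acquire))
            return;
        // Slow path: would execute the wait instruction.
        throw std::logic_error("contended: would wait");
    }
    void unlock() { state_.store(0, std::memory_order_release); }
private:
    std::atomic<uint32_t> state_{0};
};
```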

Programs that wait on condition variables while some other thread does work would be more problematic, but presumably such applications have alternative code paths for single-threaded execution if they are meant to be run in single-threaded environments.

So then the argument would have to be that it's possible for the module to discover whether its memory is shared or not and hence whether to use the mutex or not. So do we want a mechanism for that? I can see the case for allowing these ops to verify in unshared memory but probably not the case for allowing them to execute without traps (even notify).

I don't believe this introspection is necessary if mutexes just work. The reason I suggested that notify not trap is that waiters are often unconditionally woken after work is finished while waiting is usually guarded on some condition. It seems unnecessary to require an alternative code path to avoid executing notify instructions when everything would just work correctly by having them do nothing. There may be usage patterns I'm not considering, though.

Another concern with the simple operations is that code written for shared memory may use techniques that are not directly related to wait/notify, such as spinlocks, that make little sense in unshared memory. This again suggests a need for a mechanism to discover that shared memory is used.

Like normal waiting locks, I would expect spinlocks to always be acquired on the first try and therefore not actually spin.
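For example, in a minimal test-and-set spinlock sketch like the one below (illustrative only), a single thread always succeeds on the first exchange, so the spin loop body never executes and the lock degenerates to a pair of atomic accesses that the engine can lower cheaply:

```cpp
#include <atomic>

// Minimal test-and-set spinlock. With one thread the first exchange always
// returns false (lock was free), so the spin body is unreachable.
class SpinLock {
public:
    void lock() {
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // spin -- unreachable in a single-threaded instance
        }
    }
    void unlock() { flag_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> flag_{false};
};
```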

It is possible that we could allow the simple ops to verify and execute with non-shared memory but that wait/notify won't even verify w/o shared memory and therefore must be used under control of feature detection, which then must know about shared memory... perhaps a constant flag in memory (placed there by the linker) would be adequate...

Good point! If I'm missing something and this introspection is necessary, then it can easily be solved by the linker.


Even if an engine doesn't support threading, it's not so hard to support this proposal in a superficial way.

True, but it seems hacky to have an engine that appears to support the threads proposal but doesn't really support actually having multiple threads. I would like to avoid such a muddled situation.

Allowing the atomic instructions to operate on non-shared memories will only defer the interoperability problems.

Perhaps, but the user problem will be solved for the time before the extensions you mention are implemented and standardized. If those extensions were standardized, it would ideally still be possible to have MVP-compatible multithreaded modules even if other capabilities like forking could not be MVP-compatible.

@KronicDeth

I think mutexes should mostly just work. The single thread cannot fail to acquire a mutex because there are no other threads that could be holding it. Mutexes can easily be implemented to not execute a wait instruction if they are acquired on the first try.

Something "just works" for how the Rust WASM support in std and crates like parking_lot does this as we use Arc and Mutex when compiling Lumen and it is only using the single-threaded support in Safari, Chrome, and Firefox.

Like normal waiting locks, I would expect spinlocks to always be acquired on the first try and therefore not actually spin.

We use spinlocks for parts of the memory allocator in Lumen and haven't had issues running single-threaded.

@tlively
Member Author

tlively commented Oct 18, 2019

@KronicDeth Good point! I am essentially proposing that we move the stripping of atomics from the tools to the engine, and the fact that everything works when we strip atomics in the tools suggests that everything would continue to work if we had the engines do the stripping instead.

@lars-t-hansen

@tlively, there's a fair amount of "can be", "presumably", "seems", "may" in your earlier message. I'm inclined to be skeptical myself. We probably need some data / experience reports here from real applications. I see @KronicDeth has some; more would be welcome.

(Unless we come to agreement quickly it would also be good not to hold up the threads MVP over this, but perhaps that's a separate discussion.)

@tlively
Member Author

tlively commented Oct 29, 2019

@lars-t-hansen We will experiment with the proposed semantics in V8 and real-world programs and follow up with our results.

@tlively
Member Author

tlively commented Nov 5, 2019

I have confirmed that few Emscripten-compiled programs would be broken by this change. Emscripten's pthread implementation calls emscripten_futex_wait and emscripten_futex_wake rather than using wait and notify instructions directly. These futex functions are currently implemented in JS and call Atomics.wait and Atomics.notify, respectively, but they could just as well be implemented in C using __builtin_wasm_atomic_wait_i32 and __builtin_wasm_atomic_notify, which would lower to the corresponding wasm instructions. The Emscripten pthread implementation only calls these futex functions, and therefore only waits or notifies, when the lock is contended, so Emscripten programs using pthreads will never break on an unshared memory.

The only programs that could possibly break under the proposed semantics are those that call emscripten_futex_wait (and therefore Atomics.wait) without first checking for contention. Since this is a low-level API, I don't expect many projects to be using it directly, but there are probably some.

However, we could prevent even these applications from breaking on unshared memory with further changes to how Atomics.wait and i32.atomic.wait work. Currently Atomics.wait is specified to throw a TypeError if it is used with an unshared array, whether or not it would actually sleep. If we changed the semantics to only throw the TypeError (or trap) if the array (or memory) is unshared in the case the sleep condition is satisfied, then only programs that actually try to put their single thread to sleep will break, but those programs are already broken. What do folks think about this slightly more invasive change?
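A sketch of this more invasive variant (names and the result encoding are illustrative, not spec text): the comparison happens first, and the unshared-memory trap only fires if the thread would actually be suspended.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Illustrative result codes, mirroring wait's 0/1/2 ok/not-equal/timed-out.
enum class WaitResult { Ok = 0, NotEqual = 1, TimedOut = 2 };

WaitResult wait32(bool shared, const std::atomic<uint32_t>& cell, uint32_t expected) {
    if (cell.load(std::memory_order_seq_cst) != expected)
        return WaitResult::NotEqual;  // returns early without trapping
    if (!shared)
        throw std::runtime_error("trap: would suspend the only thread");
    // ... real suspension logic elided ...
    return WaitResult::Ok;
}
```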

@lars-t-hansen

If we changed the semantics to only throw the TypeError (or trap) if the array (or memory) is unshared in the case the sleep condition is satisfied, then only programs that actually try to put their single thread to sleep will break, but those programs are already broken.

They are not broken if they are using a wait with a timeout as a simple implementation of sleep(). That's what I'd do...

@tlively
Member Author

tlively commented Nov 9, 2019

Right, I should have been more precise. Only programs that would reach the Suspend step in the Atomics.wait semantics that are also running in a context that forbids suspending or using an unshared memory would throw an error, but those programs throw errors today.

I am proposing that we loosen the restrictions on what programs can execute Atomics.wait without throwing but not that we allow any code to suspend that cannot suspend today.

@lars-t-hansen

Only programs that would reach the Suspend step in the Atomics.wait semantics that are also running in a context that forbids suspending or using an unshared memory would throw an error, but those programs throw errors today.

They don't, though. Today those programs require shared memory to validate, in which case they would execute just fine. You're proposing to allow them to validate & execute in a context where they would start throwing errors if they get to the Suspend state, and my argument is only that we should expect there to be some programs like that. (If a program is running in unshared memory today then the author would have used a different mechanism for sleeping.)

@tlively
Member Author

tlively commented Nov 12, 2019

I presented this proposal at the CG meeting today. There did not appear to be broad consensus on the best way forward, but allowing normal (i.e. non-wait) atomic operations to validate and execute on unshared memories was uncontroversial. The conditional sections proposal was also raised as a way of allowing individual objects to simultaneously contain and not contain wait instructions.

I presented the two solutions raised earlier in this thread, namely 1) trapping when wait instructions are executed in unshared-memory environments by analogy to Atomics.wait in JS, or 2) modifying the Atomics.wait semantics to trap in unshared memory environments only when the thread would be suspended (and therefore deadlocked) by the wait instruction but otherwise allowing the wait instruction to return early without trapping.

There was mixed support for option 2 due to the riskiness of changing wait semantics. It was also mentioned that pursuing option 1 would not preclude moving to option 2 at a later date if the need arose. Also, an optimal solution would probably use conditional sections to avoid executing waits in unshared environments, in which case option 1 would be fine.

Given all that, are there any objections to moving forward with option 1, i.e. trapping when waits are executed in unshared environments? @lars-t-hansen did you have additional concerns about engines needing to do extra work to lower normal atomics?

@lars-t-hansen

@tlively, no real concern about the work involved for the plain atomics, these can be lowered to alignment check + nonatomic access I think.
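The lowering described above can be sketched as follows (illustrative only, not any engine's actual code): on an unshared, single-threaded memory the atomicity comes for free, but the alignment check must be preserved because atomic accesses, unlike plain ones, trap when misaligned.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Sketch of lowering i32.atomic.load on an unshared memory:
// alignment check + ordinary nonatomic load.
uint32_t lowered_atomic_load32(const uint8_t* mem, size_t addr) {
    if (addr % 4 != 0)
        throw std::runtime_error("trap: unaligned atomic access");
    uint32_t value;
    std::memcpy(&value, mem + addr, sizeof(value));  // plain nonatomic load
    return value;
}
```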

tlively added a commit to tlively/threads that referenced this issue Dec 18, 2019
As discussed in the CG on November 12, 2019 and in WebAssembly#144. Specifically,
all atomic accesses are now allowed to validate and execute normally
on unshared memories and wait operations trap when used with unshared
memories.

The PR updates Overview.md and makes a best-effort
attempt at updating the spec, making no changes when there is
already a TODO for updating spec text. It also adds new TODO comments
to the reference interpreter where changes will have to be made.
tlively added a commit that referenced this issue Jan 30, 2020
@tlively
Member Author

tlively commented May 6, 2020

We have decided on a path forward for this and updated the spec proposal accordingly, so I will close this issue.

@tlively tlively closed this as completed May 6, 2020
tlively added a commit to llvm/llvm-project that referenced this issue May 8, 2020
Summary:
The WebAssembly backend automatically lowers atomic operations and TLS
to nonatomic operations and non-TLS data when either are present and
the atomics or bulk-memory features are not present, respectively. The
resulting object is no longer thread-safe, so the linker has to be
told not to allow it to be linked into a module with shared
memory. This was previously done by disallowing the 'atomics' feature,
which prevented any object with its atomic operations or TLS removed
from being linked with any object containing atomics or TLS, and
therefore prevented it from being linked into a module with shared
memory, since shared memory requires atomics.

However, as of WebAssembly/threads#144, the
validation rules are relaxed to allow atomic operations to validate
with unshared memories, which makes it perfectly safe to link an
object with stripped atomics and TLS with another object that still
contains TLS and atomics as long as the resulting module has an
unshared memory. To allow this kind of link, this patch disallows a
pseudo-feature 'shared-mem' rather than 'atomics' to communicate to
the linker that the object is not thread-safe. This means that the
'atomics' feature is available to accurately reflect whether or not an
object has atomics enabled.

As a drive-by tweak, this change also requires that bulk-memory be
enabled in addition to atomics in order to use shared memory. This is
because initializing shared memories requires bulk-memory operations.

Reviewers: aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79542
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Jul 2, 2020
tlively added a commit to llvm/llvm-project that referenced this issue Sep 25, 2020
 WebAssembly/threads#144 updated the
WebAssembly threads proposal to make atomic operations on unshared memories
valid. This change updates the feature checking in the linker accordingly.
Production WebAssembly engines have recently been updated to allow this
behavior, but after this change users who accidentally use atomics with unshared
memories on older versions of the engines will get validation errors at runtime
rather than link errors.

Differential Revision: https://reviews.llvm.org/D79530
arichardson pushed a commit to arichardson/llvm-project that referenced this issue Mar 24, 2021
sbc100 pushed a commit to emscripten-core/emscripten that referenced this issue Mar 29, 2022
Expects the runtime will support atomic access, even in single-threaded
mode. See WebAssembly/threads#144 for more.
tlively added a commit that referenced this issue Sep 12, 2022
Since #144, we've specified that `memory.atomic.notify` is allowed to be executed on unshared memories, but always returns 0 since it is impossible for there to be anything waiting on an unshared memory since all variations of the wait instructions in Wasm and JS trap or throw exceptions on unshared memories. However, I realized recently that there is no technical reason why [`Atomics.waitAsync`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics/waitAsync) shouldn't be able to wait on an unshared memory. Because `Atomics.waitAsync` returns a promise rather than blocking, it can in principle be used to synchronize between multiple asynchronous tasks in flight concurrently in the same JS context, and now that [JSPI](https://github.com/WebAssembly/js-promise-integration/blob/main/proposals/js-promise-integration/Overview.md) is becoming more real, this actually might be a useful thing to do. In particular `Atomics.waitAsync` could be used to build suspending versions of mutexes and condition variables to synchronize between threads running as different asynchronous tasks within a single JS context.

Actually realizing this vision of asynchronous threading with JSPI on top of `Atomics.waitAsync` requires changes on the JS side to allow `Atomics.waitAsync` and `Atomics.notify` to be used on unshared ArrayBuffers, but in the meantime we can prepare for that change on the Wasm side by softening the language stating that `memory.atomic.notify` unconditionally returns 0 for unshared memories.
conrad-watt pushed a commit that referenced this issue Sep 13, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022