Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wasm ABI special cases scalar pairs (against tool conventions) and is not documented #129486

Open
Manishearth opened this issue Aug 23, 2024 · 11 comments
Labels
A-abi Area: Concerning the application binary interface (ABI). A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Manishearth
Copy link
Member

This is a bit of a hybrid bug report and documentation request. I suspect the "bug" will require a breaking change to fix so may be a non-starter.

In Diplomat we've been writing bindings to Rust from JS/Wasm. One thing we support is passing structs over FFI, by-value.

According to the tool conventions, multi-scalar-field structs are passed "indirectly", and single-scalar structs and scalars are passed directly by-value.

This is not what Rust does. This has previously come up in #81386. What Rust does is that it always passes structs by-value, which on the JS side means that the struct is "splatted" across the arguments including padding.

For example, the following type and function

#[repr(C)]
pub struct Big {
    a: u8,
    // 1 byte padding
    b: u16,
    // 4 bytes padding
    c: u64,
}
#[no_mangle]
pub extern "C" fn big(x: Big) {}

gets passed over LLVM IR as

%Big = type { i8, [1 x i8], i16, [2 x i16], i64 }
define dso_local void @big(%Big %0) unnamed_addr #0 {...}

In JS, big needs to be called as wasm.big(a, 0, b, 0, 0, c), with 0s for the padding fields. Note that the padding fields can be different sizes, which is usually irrelevant but important here since "two i16s" and "one i32" end up meaning a different number of parameters. As far as I can tell the padding field has the same size as the alignment of the previous "real" field, but I can't find any documentation on this or even know whether this is a choice on LLVM or Rust's side.

It gets worse, however. Rust seems to special case scalar pairs:

#[repr(C)]
pub struct Small {
  a: u8,
  // 3 bytes padding
  b: u32
}
#[no_mangle]
pub extern "C" fn small(x: Small) {}
define dso_local void @small(i8 %x.0, i32 %x.1) unnamed_addr #0 {..}

Here, despite Small having padding, small() gets called as wasm.small(a, b), because the fields got splatted out in the LLVM IR itself, without padding.

This is even stranger when comparing with the tool conventions because they have no mention of scalar pairs.

It would be really nice if Rust followed the documented tool conventions. I suspect that's not going to happen, and, besides, direct parameter passing is likely more efficient here1.

Failing that, it seems like it would be nice if "pairs" did not have special behavior compared to structs with more than 2 scalars in them. Ideally I'd like the scalar pair behavior to apply to all structs: always splat out structs into fields, never require padding be passed over FFI. I'm not sure if such a breaking change is possible, though.

Failing that, I think this behavior should be carefully documented. I've been discovering this by trial-and-error, and Rust's behavior contradicts the extant documentation, which makes it even more crucial to have documentation on how Rust diverges.

As far as I can tell, this is not a problem for wasm-bindgen since wasm-bindgen doesn't pass structs over FFI by-value, though the failing test mentioned in #81386 seems to indicate they care some amount.

Footnotes

  1. Indirect passing means that the JS side basically has to allocate a heap object each time it wishes to call such a function.

@Manishearth Manishearth added A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools O-wasm Target: WASM (WebAssembly), http://webassembly.org/ C-bug Category: This is a bug. A-abi Area: Concerning the application binary interface (ABI). labels Aug 23, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 23, 2024
@Manishearth
Copy link
Member Author

cc @RalfJung @bjorn3 @alexcrichton

@workingjubilee
Copy link
Member

Isn't this exactly what #117919 was about

@Manishearth
Copy link
Member Author

Ah, thanks! Yes, that fixes the "Rust doesn't match tool conventions" problem.

I'd still love to see documentation of the legacy ABI.

@Manishearth
Copy link
Member Author

In particular two things stand out as super strange and requiring documentation:

  • The padding shows up in the fn args, and the type of the padding field used (not just its size) has a major impact on things
  • Scalar pairs are treated as two scalars rather than one aggregate\

(I am not yet sure if that is indeed the actual behavior, fwiw, this is just what it appears to be to me from poking at it. I have not attempted to disentangle the code.)

@alexcrichton
Copy link
Member

alexcrichton commented Aug 23, 2024

To confirm, this issue is about the wasm32-unknown-unknown target, right? If so then the goal is to one day stabilize -Zwasm-c-abi as "this is just how the target works now". The differences here you're seeing are a result of:

  • When wasm32-unknown-unknown was added the tool-conventions repo didn't exist yet.
  • When the target was added I wrote ABI code and I had no idea what I was doing.
  • In wasm-bindgen it relied on the fact that struct Foo { a: usize, b: usize } was passed as two scalars.

In the end wasm-bindgen got big enough that there was motivation to not break it. For a long time there wasn't a lot of spare effort/motivation to fix the compiler. Since then through heroic efforts of those involved wasm-bindgen is now "fixed" and ready to work with -Zwasm-c-abi. All that's left now is the final push over the hump to get it stable.

Put another way I would personally consider (a) breaking changes are on the table here and should be pushed through and (b) I wouldn't bother documenting the "legacy ABI" because the documentation is "whatever the consequence of whatever alex wrote years ago" which is not really satisfying or understandable documentation.

@RalfJung
Copy link
Member

RalfJung commented Aug 24, 2024

In particular two things stand out as super strange and requiring documentation:

This is just a consequence of #115666, and therefore unstable codegen backend implementation details leaking through. (The ABI adjustment code is terrible, I can't blame anyone for getting it wrong. It really needs a complete refactor... I think I improved things slightly by making the defaults less bad, but a lot of work is left to be done; Cc #119183) We do not intend to make any stability guarantees for those codegen backend implementation details.

IMO this is a duplicate of #122532 and/or #115666. Is anyone working towards getting that ready to ship?

EDIT:

Since then through heroic efforts of those involved wasm-bindgen is now "fixed" and ready to work with -Zwasm-c-abi.

Looks like yes, people are working on that. ❤️

@workingjubilee
Copy link
Member

workingjubilee commented Aug 24, 2024

The biggest thing that would help is if more uses of wasm-bindgen in the wild were 0.2.89 or higher (0.2.88, the version with the fix, got yanked, but the later versions still carry the fix). Based on the crates.io downloads, it only has about ~75% penetration: https://crates.io/crates/wasm-bindgen

Maybe that's enough?

@Manishearth
Copy link
Member Author

I wouldn't bother documenting the "legacy ABI" because the documentation is "whatever the consequence of whatever alex wrote years ago" which is not really satisfying or understandable documentation.

Given that allocating across FFI in JS is actually kinda costly, I actually do think that the "legacy" ABI has one bit of design I'd love to still have access to: passing around aggregates as "piles of scalars". The problem is that it currently does this inconsistently in a way that makes padding matter, and solving that problem brings up questions for what unions are expected to do. But this is solvable.

That said, I don't know if there's much appetite for polishing up the legacy backend.

IMO this is a duplicate of #122532 and/or #115666. Is anyone working towards getting that ready to ship?

I think the current behavior should still be documented somewhere.

@alexcrichton
Copy link
Member

If you're curious there's a long discussion at WebAssembly/tool-conventions#88 about changing the C ABI itself. It's unfortunately not going to be easy.

Otherwise I've opened #129630 to document the current state of the world (which hopefully that documentation won't live for long)

@Manishearth
Copy link
Member Author

Thank you!

@Manishearth
Copy link
Member Author

I've also written some docs for the legacy ABI based on my investigations in rust-diplomat/diplomat#663

workingjubilee pushed a commit to workingjubilee/rustc that referenced this issue Aug 31, 2024
Inspired by discussion on
rust-lang#129486 this is intended to at
least document the current state of the world in a more public location
than throughout a series of issues.
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Aug 31, 2024
…-on-wasm32-u-u, r=workingjubilee

Document the broken C ABI of `wasm32-unknown-unknown`

Inspired by discussion on
rust-lang#129486 this is intended to at least document the current state of the world in a more public location than throughout a series of issues.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 31, 2024
…-on-wasm32-u-u, r=workingjubilee

Document the broken C ABI of `wasm32-unknown-unknown`

Inspired by discussion on
rust-lang#129486 this is intended to at least document the current state of the world in a more public location than throughout a series of issues.
@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 1, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 3, 2024
…-on-wasm32-u-u, r=workingjubilee

Document the broken C ABI of `wasm32-unknown-unknown`

Inspired by discussion on
rust-lang#129486 this is intended to at least document the current state of the world in a more public location than throughout a series of issues.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 3, 2024
…-on-wasm32-u-u, r=workingjubilee

Document the broken C ABI of `wasm32-unknown-unknown`

Inspired by discussion on
rust-lang#129486 this is intended to at least document the current state of the world in a more public location than throughout a series of issues.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 5, 2024
Rollup merge of rust-lang#129630 - alexcrichton:document-broken-c-abi-on-wasm32-u-u, r=workingjubilee

Document the broken C ABI of `wasm32-unknown-unknown`

Inspired by discussion on
rust-lang#129486 this is intended to at least document the current state of the world in a more public location than throughout a series of issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-abi Area: Concerning the application binary interface (ABI). A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants