wasm32-wasip1 depends on libc memset with no_std #130621

Open
drebbe-intrepid opened this issue Sep 20, 2024 · 16 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. O-wasi Operating system: Wasi, Webassembly System Interface O-wasm Target: WASM (WebAssembly), http://webassembly.org/ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@drebbe-intrepid

I've looked high and low through the WASM/WASI specifications and can't find what the "correct" behavior is here, but the current Rust behavior seems wrong to me.

I don't believe we should be trying to import memset from the "env" module:

(func $import0 (import "env" "memset") (param i32 i32 i32) (result i32))

The wasm3 engine can't run this code either, because of this import:

$ wasm3 target/wasm32-wasip1/release/wasm_br_test.wasm 
Error: missing imported function ('env.memset')

Code

#![no_std]
#![no_main]

#[no_mangle]
pub fn _start() {
    let _asdf = [0; 40];
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

Cargo.toml

[lib]
crate-type = ["cdylib"]

[profile.release]
lto = true
#opt-level = 's'
opt-level = 0
codegen-units = 1
panic = "abort"
strip = true

.cargo/config.toml

[build]
target = "wasm32-wasip1"


[target.wasm32-wasip1]
rustflags = ["-C", "link-arg=-zstack-size=65520",]

Meta

rustc --version --verbose:

rustc 1.83.0-nightly (f79a912d9 2024-09-18)
binary: rustc
commit-hash: f79a912d9edc3ad4db910c0e93672ed5c65133fa
commit-date: 2024-09-18
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0

wasm generated binary

(module
  (type $type0 (func (param i32 i32 i32) (result i32)))
  (type $type1 (func))
  (func $import0 (import "env" "memset") (param i32 i32 i32) (result i32))
  (table $table0 1 1 funcref)
  (memory $memory0 1)
  (global $global0 (mut i32) (i32.const 65520))
  (export "memory" (memory $memory0))
  (export "_start" (func $func1))
  (func $func1
    (local $var0 i32) (local $var1 i32) (local $var2 i32) (local $var3 i32) (local $var4 i32) (local $var5 i32) (local $var6 i32)
    global.get $global0
    local.set $var0
    i32.const 160
    local.set $var1
    local.get $var0
    local.get $var1
    i32.sub
    local.set $var2
    local.get $var2
    global.set $global0
    i32.const 160
    local.set $var3
    i32.const 0
    local.set $var4
    local.get $var2
    local.get $var4
    local.get $var3
    call $import0
    drop
    i32.const 160
    local.set $var5
    local.get $var2
    local.get $var5
    i32.add
    local.set $var6
    local.get $var6
    global.set $global0
    return
  )
)
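A note on the constant 160 in the listing above: it is the size in bytes of the array from the reproduction. The sketch below assumes the untyped literal in `[0; 40]` falls back to `i32`, which is Rust's default integer type; the helper name `repro_array_bytes` is made up for illustration.

```rust
use core::mem::size_of;

/// Size in bytes of the `[0; 40]` array from the reproduction,
/// assuming the element type is inferred as `i32` (4 bytes each).
/// `repro_array_bytes` is a hypothetical helper, not part of the issue.
pub const fn repro_array_bytes() -> usize {
    size_of::<[i32; 40]>()
}
```

So the `memset` call in the module zero-fills 40 * 4 = 160 bytes of shadow stack, which matches the `i32.const 160` passed as the length argument.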
@drebbe-intrepid drebbe-intrepid added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 20, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 20, 2024
@fmease fmease added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ O-wasi Operating system: Wasi, Webassembly System Interface labels Sep 20, 2024
@alexcrichton
Member

Have you tried compiling for the wasm32-unknown-unknown target? For the wasm32-wasip1 target, memset comes from wasi-libc, which you're disabling here through #![no_std]. For the wasm32-unknown-unknown target, it comes through compiler-builtins, which is always linked in.

@drebbe-intrepid
Author

Is there documentation anywhere on wasi-libc being included as part of Rust or the WASI spec?

Here is the wasm32-unknown-unknown output, which looks like it removes the memset import (the call now targets $func1, which is defined inside the module):

(module
  (type $type0 (func))
  (type $type1 (func (param i32 i32 i32) (result i32)))
  (table $table0 1 1 funcref)
  (memory $memory0 16)
  (global $global0 (mut i32) (i32.const 1048576))
  (global $global1 i32 (i32.const 1048576))
  (global $global2 i32 (i32.const 1048576))
  (export "memory" (memory $memory0))
  (export "_start" (func $func0))
  (export "__data_end" (global $global1))
  (export "__heap_base" (global $global2))
  (func $func0
    (local $var0 i32) (local $var1 i32) (local $var2 i32) (local $var3 i32) (local $var4 i32) (local $var5 i32) (local $var6 i32)
    global.get $global0
    local.set $var0
    i32.const 160
    local.set $var1
    local.get $var0
    local.get $var1
    i32.sub
    local.set $var2
    local.get $var2
    global.set $global0
    i32.const 160
    local.set $var3
    i32.const 0
    local.set $var4
    local.get $var2
    local.get $var4
    local.get $var3
    call $func1
    drop
    i32.const 160
    local.set $var5
    local.get $var2
    local.get $var5
    i32.add
    local.set $var6
    local.get $var6
    global.set $global0
    return
  )
  (func $func1 (param $var0 i32) (param $var1 i32) (param $var2 i32) (result i32)
    (local $var3 i32) (local $var4 i32) (local $var5 i32)
    block $label1
      block $label0
        local.get $var2
        i32.const 16
        i32.ge_u
        br_if $label0
        local.get $var0
        local.set $var3
        br $label1
      end $label0
      local.get $var0
      i32.const 0
      local.get $var0
      i32.sub
      i32.const 3
      i32.and
      local.tee $var4
      i32.add
      local.set $var5
      block $label2
        local.get $var4
        i32.eqz
        br_if $label2
        local.get $var0
        local.set $var3
        loop $label3
          local.get $var3
          local.get $var1
          i32.store8
          local.get $var3
          i32.const 1
          i32.add
          local.tee $var3
          local.get $var5
          i32.lt_u
          br_if $label3
        end $label3
      end $label2
      local.get $var5
      local.get $var2
      local.get $var4
      i32.sub
      local.tee $var4
      i32.const -4
      i32.and
      local.tee $var2
      i32.add
      local.set $var3
      block $label4
        local.get $var2
        i32.const 1
        i32.lt_s
        br_if $label4
        local.get $var1
        i32.const 255
        i32.and
        i32.const 16843009
        i32.mul
        local.set $var2
        loop $label5
          local.get $var5
          local.get $var2
          i32.store
          local.get $var5
          i32.const 4
          i32.add
          local.tee $var5
          local.get $var3
          i32.lt_u
          br_if $label5
        end $label5
      end $label4
      local.get $var4
      i32.const 3
      i32.and
      local.set $var2
    end $label1
    block $label6
      local.get $var2
      i32.eqz
      br_if $label6
      local.get $var3
      local.get $var2
      i32.add
      local.set $var5
      loop $label7
        local.get $var3
        local.get $var1
        i32.store8
        local.get $var3
        i32.const 1
        i32.add
        local.tee $var3
        local.get $var5
        i32.lt_u
        br_if $label7
      end $label7
    end $label6
    local.get $var0
  )
)
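For readers following the second function in the listing above: the `i32.const 255` / `i32.and` / `i32.const 16843009` / `i32.mul` sequence replicates the fill byte into every byte of a 32-bit word, so the middle loop can store four bytes at a time. A minimal Rust sketch of just that step follows; the function name `splat_byte` is hypothetical and not taken from compiler-builtins.

```rust
/// Replicate the low byte of `c` into all four bytes of a 32-bit word.
/// Mirrors `(c & 255) * 16843009` from the WAT listing; 16843009 is 0x0101_0101.
/// `splat_byte` is an illustrative name, not a compiler-builtins function.
pub fn splat_byte(c: i32) -> u32 {
    ((c & 255) as u32).wrapping_mul(16_843_009)
}
```

For example, `splat_byte(0xAB)` yields `0xABAB_ABAB`. The surrounding loops in the listing fill the unaligned head and tail one byte at a time.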

@alexcrichton
Member

Not really, but the defining feature of the wasip1 target is that it uses WASI APIs through wasi-libc. In that sense, I suspect the documentation you seek may not exist.

@drebbe-intrepid
Author

I would not have expected this behavior at all. It looks like wasm32-unknown-unknown is the correct target for me, but I didn't even realize wasi-libc was a thing until it was mentioned here. It's not even listed in the WASI-proposals.

@drebbe-intrepid
Author

What would be the best place for this documentation?

@drebbe-intrepid drebbe-intrepid changed the title WASM env memset import wasm32-wasip1 depends on libc memset with no_std Sep 20, 2024
@drebbe-intrepid
Author

drebbe-intrepid commented Sep 20, 2024

Looks like clang++ does the same thing with similar C++ code.

extern "C" void _start() {
    int a[255] = {0};
}
$ clang++ --target=wasm32 -flto -nostdlib -Wl,--no-entry -Wl,--export-all -o test.wasm test.cpp
wasm-ld: error: lto.tmp: undefined symbol: memset
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

$ clang++ --version
clang version 18.1.8
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

@alexcrichton
Member

As to where to document this, I'm not sure! The behavior you're describing here matches native platforms as well, for example

$ clang++ -flto -nostdlib  -o test.wasm test.cpp
/usr/bin/ld: /tmp/lto-llvm-c0452f.o: in function `_start':
ld-temp.o:(.text._start+0x1a): undefined reference to `memset'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

It's more common than not that libc provides memset, so I'm not sure where to document that in a way that's specific to WASI. Where would you have tried to look for documentation like this? Maybe that's a good place to send a PR?

@saethlin saethlin added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 20, 2024
@drebbe-intrepid
Author

drebbe-intrepid commented Sep 20, 2024

Maybe the WASI specification should state something about it?

I think a few things need to happen here (just thoughts, open to ideas):

  • rustc should error out like clang++ because it has an undefined reference to memset when no_std is used
  • rustc should probably provide some type of implementation for memset outside libc. I believe emscripten does this. There seems to be no standard here, so it is unclear what the correct path forward would be.
  • libc seems to be a de facto standard in the majority of compilers, having a clear standard for Rust documented would probably be ideal as WASM gains popularity.

I've never had to care about libc and things like memset because it always just worked but I'm probably not going to be the first one to come across this behavior.

@bjorn3
Member

bjorn3 commented Sep 21, 2024

rustc should error out like clang++ because it has an undefined reference to memset when no_std is used

Rustc doesn't, because we pass --allow-undefined to the linker. See the comment on the code in question for why we can't just remove this argument:

// FIXME we probably shouldn't pass this but instead pass an explicit list
// of symbols we'll allow to be undefined. We don't currently have a
// mechanism of knowing, however, which symbols are intended to be imported
// from the environment and which are intended to be imported from other
// objects linked elsewhere. This is a coarse approximation but is sure to
// hide some bugs and frustrate someone at some point, so we should ideally
// work towards a world where we can explicitly list symbols that are
// supposed to be imported and have all other symbols generate errors if
// they remain undefined.
concat!($prefix, "--allow-undefined"),
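Because wasm-ld turns undefined function symbols into imports when `--allow-undefined` is passed, a missing symbol only shows up when the module is loaded. One way to catch it earlier is to scan the disassembled module for imports you did not expect. The following is a rough sketch that works on WAT text laid out like the listings in this thread; the function name `list_imports` and the line-based parsing are assumptions of this example, not an existing tool.

```rust
/// Collect `(module, field)` pairs from lines containing an inline
/// `(import "module" "field")` clause, as in the WAT listings above.
/// `list_imports` is a hypothetical helper; it is a plain text scan,
/// not a real WAT parser.
pub fn list_imports(wat: &str) -> Vec<(String, String)> {
    let mut found = Vec::new();
    for line in wat.lines() {
        let Some(pos) = line.find("(import ") else { continue };
        let rest = &line[pos + "(import ".len()..];
        // Splitting on quotes yields: "", module, " ", field, remainder.
        let mut parts = rest.split('"');
        let _ = parts.next();
        let module = parts.next();
        let _ = parts.next();
        let field = parts.next();
        if let (Some(m), Some(f)) = (module, field) {
            found.push((m.to_string(), f.to_string()));
        }
    }
    found
}
```

Run against the wasm32-wasip1 listing from the first comment, this would report the single pair `("env", "memset")`, which is the import wasm3 rejects.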

rustc should probably provide some type of implementation for memset outside libc.

It does on targets where libc is expected to not be used like wasm32-unknown-unknown.

I believe emscripten does this.

No, emscripten has its own libc that provides memset.

libc seems to be a de facto standard in the majority of compilers, having a clear standard for Rust documented would probably be ideal as WASM gains popularity.

Only for languages that natively interface with C. There are also AssemblyScript, the wasm port of C#, TeaVM (compiling Java to wasm), Hoot (compiling a Scheme to wasm) and more which do not use any libc.

@bjorn3
Member

bjorn3 commented Sep 21, 2024

I've never had to care about libc and things like memset because it always just worked but I'm probably not going to be the first one to come across this behavior.

If you didn't have to care about it, that is almost certainly because you didn't use -nostdlib, which is the C equivalent of #![no_std]. Without -nostdlib, the linker will automatically add a dependency on libc, just as rustc, when #![no_std] is not used, automatically adds a dependency on libstd and, on targets that need it, libc.

@drebbe-intrepid
Author

@bjorn3 awesome, thank you for the information; this is what I was looking for from the start. I'd like to put this information somewhere, but I don't believe Rust has any official documentation on wasm (or on target-specific behavior). I did find this: https://github.com/rustwasm/book

Maybe a PR against this repo would be ideal for now?

@drebbe-intrepid
Author

@bjorn3 I've been thinking about this and researching more since our last discussion. This is more than a documentation bug IMO.

This import should never be generated for a WASI target, because it's not part of the WASI specification:

(func $import0 (import "env" "memset") (param i32 i32 i32) (result i32))

The truth is both rustc and clang generate this import when there is no standard library present. I'm assuming there are probably a handful of corner cases where libc methods are leaking through like this. I'm going to experiment and see if others work around this (wasi-sdk and emscripten).

@bjorn3
Member

bjorn3 commented Oct 3, 2024

The wasm32-wasip1 target is meant to be used with wasi-libc, which provides memset. We document that memset and related functions are required by all Rust code here: https://doc.rust-lang.org/stable/core/#how-to-use-the-core-library On targets where no libc is ever used, compiler-builtins provides these symbols. On targets where libc is used, like wasm32-wasip1, that is not possible, because the definitions in compiler-builtins would conflict with those in libc.
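For a #![no_std] crate on wasm32-wasip1 that really does not link wasi-libc, one possible workaround is to define the symbol in the crate itself. This is only a sketch under that assumption and has not been tested against the setup in this issue; the name `memset_fallback` is hypothetical, and the definition would clash with wasi-libc as soon as libc is linked.

```rust
use core::ptr;

/// Hypothetical byte-at-a-time fallback, exported as `memset` on wasm32 only
/// so that host builds do not override the platform's own `memset`.
// On edition 2024 the attribute must be spelled `unsafe(export_name = "memset")`.
#[cfg_attr(target_arch = "wasm32", export_name = "memset")]
pub unsafe extern "C" fn memset_fallback(dest: *mut u8, c: i32, n: usize) -> *mut u8 {
    // C semantics: only the low 8 bits of `c` are used as the fill byte.
    let byte = c as u8;
    let mut i = 0;
    while i < n {
        // Volatile stores keep the optimizer from rewriting this loop
        // into a call to `memset`, which would recurse into this function.
        unsafe { ptr::write_volatile(dest.add(i), byte) };
        i += 1;
    }
    dest
}
```

The signature matches the `(param i32 i32 i32) (result i32)` import seen in the generated module, since pointers and `usize` are both 32 bits wide on wasm32. Switching to wasm32-unknown-unknown, as suggested earlier in the thread, avoids the need for this entirely.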

@drebbe-intrepid
Author

It looks like this is the case because we can't opt out of core without no_core (#29639). It's not that the wasm32-wasip1 target is meant to be used with wasi-libc, but that Rust itself needs core to build, which in turn needs libc.

That link to the core library documentation is very useful, thank you. I had assumed that core isn't used in my example above, since it has no "imports" (use statements).

@bjorn3
Member

bjorn3 commented Oct 3, 2024

It's not just libcore that needs memset and co. It is any LLVM or GCC compiled code that doesn't explicitly disable the dependency on compiler-builtins. And in addition rustc itself emits memcpy calls for every copy and move larger than two registers (though compiler backends may optimize this away in case of small copies).
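As an illustration of the last point, here is a small sketch of ordinary-looking code that copies a 256-byte value. The type `Big` and function `duplicate` are made-up names; whether a `memcpy` call actually survives into the final module depends on the backend and optimization level.

```rust
/// Hypothetical 256-byte value, well above the "two registers" threshold.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Big {
    pub words: [u32; 64],
}

/// Copies `*src` into the return slot. No libc function is named here,
/// yet the compiler may lower this copy to a `memcpy` call.
pub fn duplicate(src: &Big) -> Big {
    *src
}
```

Nothing in this snippet mentions libc or imports anything, which is why a #![no_std] crate can still end up with references to memcpy or memset.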

@drebbe-intrepid
Author

drebbe-intrepid commented Oct 3, 2024

This might be a problem specific to clang/LLVM and wasi-sdk.

From https://doc.rust-lang.org/beta/rustc/platform-support/wasm32-unknown-unknown.html#requirements:

wasm32-wasip1 - the wasi-sdk toolchain is used to compile C/C++ on this target and can interop with Rust code. WASI works on the web so far as there's no blocker, but an implementation of WASI APIs must be either chosen or reimplemented.

From what I can see (I'm still in the process of verifying this), wasi-sdk provides a libclang_rt.builtins-wasm32-wasi-24.0.tar.gz in the release assets that might be required when building wasi-sdk in order to pick up "memset".
