This repository has been archived by the owner on May 11, 2020. It is now read-only.

Implement emscripten libc environment #163

Open
wants to merge 1 commit into
base: master
from

@sampaioletti sampaioletti commented Sep 12, 2019

I've been experimenting with adding some of the libc functions that code compiled with emscripten requires; here is a working POC for discussion. It's not pretty, but I was trying to see how difficult it would be.

I have a few questions:

I'm not sure I understand how init_expr should work. To get the code to work I had to modify the getGlobal case:

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/init_expr.go#L138

I have created a Global for __memory_base

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/internal/emlibc/resolver.go#L61

and it is used from here.

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/internal/emlibc/test/puts.wast#L4

Without the modification, the init_expr stack is empty, so it returns nil, nil, which causes a panic:

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/init_expr.go#L157

Also, a quick look at my internal/emlibc/resolver.go would be appreciated, just to see if I'm missing any important concepts. I'm relatively new to WASM, so I'm having to learn as I go.

Thanks for the input.

codecov-io commented Sep 12, 2019

Codecov Report

Merging #163 into master will decrease coverage by 0.02%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #163      +/-   ##
==========================================
- Coverage   69.54%   69.51%   -0.03%     
==========================================
  Files          43       43              
  Lines        5007     5009       +2     
==========================================
  Hits         3482     3482              
- Misses       1231     1233       +2     
  Partials      294      294
Impacted Files          Coverage Δ
wasm/init_expr.go       35.29% <0%> (ø) ⬆️
cmd/wasm-run/main.go    24.24% <0%> (-0.76%) ⬇️

Last update 1e64ad3...1e71fcd.

@@ -0,0 +1,114 @@
package emlibc
Contributor

Should this be internal? I imagine some embedded users would want to call GetEnv().

Author

I was trying to decide how to handle that, so I put it in there as more of an afterthought.

I may be overthinking it but...

Should we be laying the groundwork for supporting multiple 'environments' (EM, WASI, the other 15 competing specifications that are sure to come)?

Should they be built in like I've started, or should they themselves be external wasm files that use a common libc-like API that we develop internally?

So at the moment I'm implementing it as a ResolveFunc.

I was thinking about making the ReadModule signature variadic

ReadModule(r io.Reader, resolvePath ...ResolveFunc) (*Module, error) 

but I'm not sure if that's the correct way to go about it, so that someone could call

wasm.ReadModule(buf,EMLibc,WASI,FileImporter,WAPM)

and we go through each in turn to try and resolve

some of that can be figured out later...but I was questioning if I was hooking in at the right place.
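
A minimal sketch of the "go through each in turn" idea, assuming a resolver has the shape `func(name string) (*Module, error)`. The `Module` stand-in, the `Chain` helper, the `ErrNotFound` sentinel, and the two toy resolvers are all hypothetical names for illustration, not part of wagon:

```go
package main

import (
	"errors"
	"fmt"
)

// Module is a stand-in for wasm.Module; only the name matters here.
type Module struct{ Name string }

// ResolveFunc is the assumed resolver shape: given an import module
// name, return the module or an error.
type ResolveFunc func(name string) (*Module, error)

// ErrNotFound is a hypothetical sentinel meaning "this resolver does
// not know the name; try the next one".
var ErrNotFound = errors.New("module not found")

// Chain tries each resolver in order and returns the first hit.
func Chain(resolvers ...ResolveFunc) ResolveFunc {
	return func(name string) (*Module, error) {
		for _, r := range resolvers {
			m, err := r(name)
			if err == nil {
				return m, nil
			}
			if !errors.Is(err, ErrNotFound) {
				return nil, err // a real failure: stop early
			}
		}
		return nil, fmt.Errorf("resolve %q: %w", name, ErrNotFound)
	}
}

// emLibc is a toy resolver that only knows "env".
func emLibc(name string) (*Module, error) {
	if name == "env" {
		return &Module{Name: "env"}, nil
	}
	return nil, ErrNotFound
}

// fileImporter is a toy resolver that only knows "math".
func fileImporter(name string) (*Module, error) {
	if name == "math" {
		return &Module{Name: "math"}, nil
	}
	return nil, ErrNotFound
}

func main() {
	resolve := Chain(emLibc, fileImporter)
	for _, name := range []string{"env", "math", "missing"} {
		m, err := resolve(name)
		fmt.Println(name, m != nil, err)
	}
}
```

Wrapping the list in a single `ResolveFunc` is one way to get the variadic behaviour without changing an existing one-resolver signature; whether that is preferable to a variadic `ReadModule` is a design choice this sketch does not settle.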

Contributor

Hmmmmm, good point ... We should have a think about this.

I think the 'resolver' method is probably the right way to go. I'm not a fan of inventing our own internal API, as that's another abstraction layer to maintain and might affect performance depending on implementation.

puts := func(proc *exec.Process, v int32) int32 {

buf := []byte{}
temp := make([]byte, 1)
Contributor

Maybe try var temp [1]byte and reference it like temp[:] ?

Author

Yeah, I like that. I couldn't decide whether it would be better to have more access to the underlying []byte. At some point during this process we may need to implement a couple of other methods on exec.Process for memory management; that may be where we can implement things like alloc/free etc. We could also have a method that returns a reader so we could use bufio.Readers for some of this.

Contributor

that may be where we can implement things like alloc/free etc

Hmmmm. How do things like Rust/Go handle this? Does everyone who compiles to wasm ship their own malloc/free implementation?

Author

That's been one of the most confusing things about this as I have been learning WASM. From what I can gather, it currently ships inside the glue code produced by compiling, so it's actually provided by the compiler 'runtime', like emscripten or LLVM.
Here is some of the code from the emscripten JS file that provides the environment to run the wasm:

var _free = Module["_free"] = function() {
  return Module["asm"]["_free"].apply(null, arguments)
};

var _main = Module["_main"] = function() {
  return Module["asm"]["_main"].apply(null, arguments)
};

var _malloc = Module["_malloc"] = function() {
  return Module["asm"]["_malloc"].apply(null, arguments)
};

var _memcpy = Module["_memcpy"] = function() {
  return Module["asm"]["_memcpy"].apply(null, arguments)
};

var _memset = Module["_memset"] = function() {
  return Module["asm"]["_memset"].apply(null, arguments)
};

and then it's called from the wasm:

    call $_printf
    i32.const 4
    call $_malloc
    local.set 4
    local.get 1
    local.get 4
    i32.store
    local.get 1

It really seems like it should have been part of the MVP spec to me; that seems pretty basic. But it appears that is how it's done...

But it's an area of WASM I'm still trying to learn.

Author

I've learned the most about wasm from playing around with https://github.com/intel/wasm-micro-runtime
They support building with emscripten and clang. It's a pretty clean implementation, but targeted more at resource-constrained environments.

To support emscripten (and LLVM) they have a libc wrapper, and it is the most concise place I've found for figuring out what calls are needed to support the compiler runtimes.

In the following code you can see their "env" implementation.

https://github.com/intel/wasm-micro-runtime/blob/307b54cc5946a5d07ee17bc177c1a2f17e231836/core/iwasm/lib/native/libc/libc_wrapper.c#L933-L969

It covers the basics for libc calls from emscripten or LLVM, and most of the code I've tried against it works without issue. You can clearly see malloc and free operating inside the memory buffer. So that's what I'm basing my info on.

Contributor

FWICT, wasm is intended just to implement a very minimal 'CPU', and design decisions like memory allocation are to be handled by the calling code. The only memory management features in the wasm specification are these two opcodes:

  • memory.grow - increase the size of the memory buffer (in 64 KiB pages).
  • memory.size (formerly current_memory) - return the current size of the memory buffer (in pages).

In all the wasm I've seen, malloc/free are all implemented in wasm shipped by the application.

Do any other wasm interpreters provide a 'libc' layer like this, or is the libc layer always shipped with the application code?
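
For reference, a toy Go model of those two memory opcodes (the size query and grow), under the spec's rules that sizes are counted in 64 KiB pages and that a failed grow yields -1. The `Memory` type and its `maxPages` field are illustrative only and are not wagon's memory representation:

```go
package main

import "fmt"

const pageSize = 65536 // wasm linear memory is measured in 64 KiB pages

// Memory is a toy model of a wasm linear memory, not wagon's own type.
type Memory struct {
	data     []byte
	maxPages int
}

// Size models the size query: the current size in pages.
func (m *Memory) Size() int32 { return int32(len(m.data) / pageSize) }

// Grow models memory.grow: add delta pages and return the previous
// size in pages, or -1 if the request cannot be satisfied.
func (m *Memory) Grow(delta int32) int32 {
	prev := m.Size()
	if delta < 0 || int(prev)+int(delta) > m.maxPages {
		return -1
	}
	m.data = append(m.data, make([]byte, int(delta)*pageSize)...)
	return prev
}

func main() {
	mem := &Memory{data: make([]byte, pageSize), maxPages: 4}
	fmt.Println(mem.Size())  // 1
	fmt.Println(mem.Grow(2)) // 1 (previous size)
	fmt.Println(mem.Size())  // 3
	fmt.Println(mem.Grow(5)) // -1 (would exceed maxPages)
}
```

Nothing here allocates or frees individual objects; anything like malloc/free has to be layered on top of these two operations by someone.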

Author

I agree with what you say about the spec.

Yes, wasmer appears to provide an 'emscripten compatibility layer', and wasm-micro-runtime does as well.

I think it is kind of a grey area at the moment, but the problem I see is that since the linear memory buffer is used by both the host and the wasm module, someone has to be authoritative.

I.e. if I call into a wasm module from the host and want to pass in a string, I allocate it in the buffer and send a pointer.

If they reply with another string they basically do the same.

So do we assume the linear memory is stateless, so that in each iteration the current 'owner' has full use of the buffer? If not, someone has to manage the memory. Since GC is planned post-MVP, I think the responsibility for memory management is best handled in the runtime host.

Again these are my very very unqualified opinions
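
To make the "host is authoritative" option concrete, here is a minimal sketch of a host-side bump allocator that places a string in linear memory and hands back its offset as the pointer. The `bump` type and `putString` helper are hypothetical; a real implementation would also need free and would have to agree with whatever allocator the module itself ships:

```go
package main

import (
	"errors"
	"fmt"
)

// bump is a hypothetical host-side allocator over a linear memory.
// It never frees; it only illustrates the host as the single authority.
type bump struct {
	mem  []byte
	next uint32
}

// alloc reserves n bytes and returns the offset ("pointer") into mem.
func (b *bump) alloc(n uint32) (uint32, error) {
	if uint64(b.next)+uint64(n) > uint64(len(b.mem)) {
		return 0, errors.New("out of linear memory")
	}
	p := b.next
	b.next += n
	return p, nil
}

// putString copies s plus a trailing NUL into memory and returns its
// pointer, ready to be passed to the module as an i32 argument.
func (b *bump) putString(s string) (uint32, error) {
	p, err := b.alloc(uint32(len(s)) + 1)
	if err != nil {
		return 0, err
	}
	copy(b.mem[p:], s)
	b.mem[p+uint32(len(s))] = 0
	return p, nil
}

func main() {
	// Start at offset 8 so that offset 0 can serve as a "NULL" pointer.
	b := &bump{mem: make([]byte, 64), next: 8}
	p1, _ := b.putString("hello")
	p2, _ := b.putString("wasm")
	fmt.Println(p1, p2, string(b.mem[p1:p1+5])) // 8 14 hello
}
```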

Author

One thing to point out: if you build for a browser, the libc code is in the .js glue file generated by emscripten, so the browser does not directly provide it.

But in the case of non-browser runtimes, it appears to me they have provided their own glue code natively to support the emscripten compiler.

That is why I mentioned doing the implementations as a library of .wasm files that could be imported rather than writing them natively in Go, but we would need to provide at least a minimal API to do that.

Author

Finally found the wasmer code...I remembered seeing it but it took me a minute

https://github.com/wasmerio/wasmer/blob/56c571465ea65992172897da44ad2970e0ec55a0/lib/emscripten/src/lib.rs#L555

Contributor

But in the case of non-browser runtimes, it appears to me they have provided their own glue code natively to support the emscripten compiler.

That is why I mentioned doing the implementations as a library of .wasm files that could be imported rather than writing them natively in Go, but we would need to provide at least a minimal API to do that.

Let's do that (as in, let's do what wasmer and the other native runtimes are doing, and provide an identical API).

@@ -144,7 +144,8 @@ func (m *Module) ExecInitExpr(expr []byte) (interface{}, error) {
if globalVar == nil {
return nil, InvalidGlobalIndexError(index)
}
lastVal = globalVar.Type.Type
return m.ExecInitExpr(globalVar.Init)
Contributor

Could this give us an infinite loop?

Author

Yes it could; it happened to me while playing with it. We could try to detect it. Do we need this here? I couldn't get it to work before, and if you just return nil, nil without an error it panics in Module.populateLinearMemory(). If we really do want to return nil (as it did before), we can either specify an error to return if the stack is empty or deal with a possible nil in the error building.

The problem comes from reflect.TypeOf(val).Kind() when val is nil

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/index.go#L186
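
The panic is reproducible outside wagon: `reflect.TypeOf(nil)` returns a nil `reflect.Type`, so calling `.Kind()` on it dereferences nil. A minimal sketch of a guard, where `kindOf` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"reflect"
)

// kindOf returns the reflect.Kind of val, or reflect.Invalid for a nil
// interface value instead of panicking.
func kindOf(val interface{}) reflect.Kind {
	if val == nil {
		return reflect.Invalid
	}
	return reflect.TypeOf(val).Kind()
}

func main() {
	fmt.Println(kindOf(int32(7))) // int32
	fmt.Println(kindOf(nil))      // invalid
}
```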

Contributor

We could modify ExecInitExpr (or make a new unexported one) to take an argument that represents the current context of execution. When we recurse into evaluating the global's init expression, we can call it with the global as an argument. That way, we can check before we enter an infinite loop by seeing whether we are already evaluating the same init expression.

But this is a hard problem in general, so we could probably leave it unaddressed.
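
For the narrow case where an initializer is either a constant or a reference to another global by index, the "pass the context down" idea could be sketched as follows. The `global` struct and `evalGlobal` function are simplified, hypothetical stand-ins and do not mirror wagon's ExecInitExpr:

```go
package main

import (
	"errors"
	"fmt"
)

// global is a simplified stand-in: its initializer is either a constant
// or a reference to another global by index.
type global struct {
	ref   int   // index of the referenced global, or -1 for a constant
	value int32 // used when ref == -1
}

var errInitCycle = errors.New("init expression references itself")

// evalGlobal resolves global idx, tracking the indices currently being
// evaluated so a cycle yields an error instead of unbounded recursion.
func evalGlobal(globals []global, idx int, inProgress map[int]bool) (int32, error) {
	if idx < 0 || idx >= len(globals) {
		return 0, fmt.Errorf("invalid global index %d", idx)
	}
	if inProgress[idx] {
		return 0, errInitCycle
	}
	g := globals[idx]
	if g.ref < 0 {
		return g.value, nil
	}
	inProgress[idx] = true
	defer delete(inProgress, idx)
	return evalGlobal(globals, g.ref, inProgress)
}

func main() {
	ok := []global{{ref: -1, value: 1024}, {ref: 0}}
	fmt.Println(evalGlobal(ok, 1, map[int]bool{})) // 1024 <nil>

	cyclic := []global{{ref: 1}, {ref: 0}}
	fmt.Println(evalGlobal(cyclic, 0, map[int]bool{})) // 0 init expression references itself
}
```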

@@ -95,6 +97,9 @@ func run(w io.Writer, fname string, verify bool) {
}

func importer(name string) (*wasm.Module, error) {
if name == "env" {
Contributor

Is the 'env' import reserved?

Could someone legitimately create a wasm file named 'env' and we break them?

Author

I've not found it in the WASM spec. It looks to me like it's just a 'convention' used internally by emscripten, and possibly adopted by LLVM in their code (I'll research more).

That is why it has to be optional.

The above was a temporary hack for the POC.

Contributor

Ah gotcha :) plz move it to a flag or something before we merge.
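
One possible shape for that, as a sketch only: gate the special-casing of "env" behind an opt-in command-line flag so that a user's own env.wasm still resolves by default. The `-emlibc` flag name and the `resolveKind` helper are hypothetical and are not existing wasm-run options:

```go
package main

import (
	"flag"
	"fmt"
)

// useEmLibc is a hypothetical flag, not an existing wasm-run option.
var useEmLibc = flag.Bool("emlibc", false, "resolve the \"env\" import with the built-in emscripten libc")

// resolveKind reports how an import name would be resolved: the
// built-in environment only when opted in, a file on disk otherwise.
func resolveKind(name string, emlibc bool) string {
	if emlibc && name == "env" {
		return "builtin-emlibc"
	}
	return "file:" + name + ".wasm"
}

func main() {
	flag.Parse()
	fmt.Println(resolveKind("env", *useEmLibc))
}
```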

Author

Of course. Sorry, to be clear, this pull request was meant to start the conversation, not to be syntactically correct. I'll work on a better implementation now that we have nailed down a few things.

Contributor

Ahhhh, my bad, I didn't realise :O

In that case LGTM.

@sampaioletti sampaioletti mentioned this pull request Sep 20, 2019
@sampaioletti
Author

As I've been working through this I decided to try a different route to learn how the emscripten internals work, so I created a separate project at github.com/sampaioletti/wagoja. It basically uses goja to create a node-like environment and ties it back to wagon so that the emscripten-generated scripts will work.

I was able to get it working over the weekend with a basic example. It was a nightmare (: but it is working for the limited case shown in the example folder, and it definitely helped me understand what an emscripten libc implementation will need to look like.

That repo relies on another branch in my sampaioletti/wagon fork called 'wagoja'. I made the changes required to make this work and started working on a few of the other things we've been discussing in there (like the module builder).

I'm going to play with wagoja (sorry, I hate naming projects) and start replacing it with functionality implemented in wagon.

Feel free to poke around if you're interested. It's mostly hack work, but it took a lot of playing to get it to function correctly.
