This repository has been archived by the owner on May 11, 2020. It is now read-only.

Implement emscripten libc environment #163

Open
wants to merge 1 commit into
base: master
5 changes: 5 additions & 0 deletions cmd/wasm-run/main.go
@@ -11,6 +11,8 @@ import (
	"log"
	"os"

	"github.com/go-interpreter/wagon/internal/emlibc"

	"github.com/go-interpreter/wagon/exec"
	"github.com/go-interpreter/wagon/validate"
	"github.com/go-interpreter/wagon/wasm"
@@ -95,6 +97,9 @@ func run(w io.Writer, fname string, verify bool) {
}

func importer(name string) (*wasm.Module, error) {
	if name == "env" {
Contributor

Is the 'env' import reserved?

Could someone legitimately create a wasm file named 'env', which this would then break?

Author

I haven't found it in the WASM spec. It looks to me like it's just a 'convention' used internally by emscripten, and possibly adopted by LLVM in their code (I'll research more).

That is why it has to be optional.

The above was a temporary measure for the POC.

Contributor

Ah, gotcha :) Please move it to a flag or something before we merge.
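
One way the "move it to a flag" suggestion could look is sketched below. This is only an illustration: the `-emscripten` flag name and the `resolveName` helper are hypothetical and not part of wagon, and the resolver returns a description string instead of a real `*wasm.Module` so the sketch compiles on its own.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveName reports where an import name would be resolved from.
// The "env" special case only applies when emscripten support was
// explicitly requested, so a user module named "env" keeps working
// by default.
func resolveName(name string, emscripten bool) string {
	if emscripten && name == "env" {
		return "builtin:emlibc"
	}
	return name + ".wasm"
}

func main() {
	// A private FlagSet keeps the sketch independent of os.Args.
	fs := flag.NewFlagSet("wasm-run", flag.ContinueOnError)
	emscripten := fs.Bool("emscripten", false,
		"resolve the \"env\" import with the built-in emscripten libc (hypothetical flag)")
	if err := fs.Parse([]string{"-emscripten"}); err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Println(resolveName("env", *emscripten)) // opt-in path
	fmt.Println(resolveName("env", false))       // default path
}
```

With the flag off, `env` is treated like any other module name, which addresses the concern about breaking a legitimately named `env.wasm`.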

Author

Of course. Sorry, to be clear: this pull request was meant to start the conversation, not to be syntactically correct. I'll work on a better implementation now that we have nailed down a few things.

Contributor

Ahhhh, my bad, I didn't realise :O

In that case LGTM.

		return emlibc.ResolveEnv(name)
	}
	f, err := os.Open(name + ".wasm")
	if err != nil {
		return nil, err
114 changes: 114 additions & 0 deletions internal/emlibc/resolver.go
@@ -0,0 +1,114 @@
package emlibc
Contributor

Should this be internal? I imagine some embedded users would want to call GetEnv().

Author

I was trying to decide how to handle that, so I put it in there as more of an afterthought.

I may be overthinking it, but should we be laying the groundwork for supporting multiple 'environments' (EM, WASI, and the other 15 competing specifications that are sure to come)?

Should they be built in, as I've started doing, or should they themselves be external wasm files that use a common libc-like API that we develop internally?

At the moment I'm implementing it as a ResolveFunc.

I was thinking about making the ReadModule signature variadic:

ReadModule(r io.Reader, resolvePath ...ResolveFunc) (*Module, error)

I'm not sure that's the correct way to go about it, but then someone could call

wasm.ReadModule(buf, EMLibc, WASI, FileImporter, WAPM)

and we would go through each resolver in turn to try to resolve the import.

Some of that can be figured out later, but I was questioning whether I was hooking in at the right place.

Contributor

Hmmmmm, good point... We should have a think about this.

I think the 'resolver' method is probably the right way to go. I'm not a fan of inventing our own internal API, as that's another abstraction layer to maintain and might affect performance depending on implementation.


import (
	"errors"
	"fmt"
	"reflect"

	"github.com/go-interpreter/wagon/exec"

	"github.com/go-interpreter/wagon/wasm"
)

func ResolveEnv(name string) (*wasm.Module, error) {
	if name == "env" {
		return GetEnv(), nil
	}
	fmt.Println("tried resolve", name)
	return nil, errors.New("Not Found")
}
func clen(n []byte) int {
	for i := 0; i < len(n); i++ {
		if n[i] == 0 {
			return i
		}
	}
	return len(n)
}
func GetEnv() *wasm.Module {

	m := wasm.NewModule()
	print := func(proc *exec.Process, v int32) int32 {
		fmt.Printf("result = %v\n", v)
		return 0
	}
	puts := func(proc *exec.Process, v int32) int32 {

		buf := []byte{}
		temp := make([]byte, 1)
Contributor

Maybe try var temp [1]byte and reference it like temp[:] ?

Author

Yeah, I like that. I couldn't decide whether it would be better to have more access to the underlying []byte. I guess at some point during this process we may need to implement a couple of other methods on exec.Process for memory management; that may be where we can implement things like alloc/free etc. We could also have a method that returns a reader, so we could use bufio.Readers for some of this.

Contributor

> that may be where we can implement things like alloc/free etc

Hmmmm. How do things like Rust/Go handle this? Does everyone who compiles wasm ship their own malloc/free implementation?

Author

That's been one of the most confusing things about this as I have been learning WASM. From what I can gather, it currently ships inside the glue code produced by the compiler, so it's actually provided by the compiler 'runtime', like emscripten or LLVM.
Here is some of the code from the emscripten JS file that provides the environment to run the wasm:

var _free = Module["_free"] = function() {
  return Module["asm"]["_free"].apply(null, arguments)
};

var _main = Module["_main"] = function() {
  return Module["asm"]["_main"].apply(null, arguments)
};

var _malloc = Module["_malloc"] = function() {
  return Module["asm"]["_malloc"].apply(null, arguments)
};

var _memcpy = Module["_memcpy"] = function() {
  return Module["asm"]["_memcpy"].apply(null, arguments)
};

var _memset = Module["_memset"] = function() {
  return Module["asm"]["_memset"].apply(null, arguments)
};

and then it's called from the wasm:

    call $_printf
    i32.const 4
    call $_malloc
    local.set 4
    local.get 1
    local.get 4
    i32.store
    local.get 1

It really seems like it should have been part of the MVP spec to me, since it seems pretty basic, but it appears to me that is how it's done.

But it's an area of WASM I'm still trying to learn.

Author

I've learned the most about wasm from playing around with https://github.com/intel/wasm-micro-runtime
They support building with emscripten and clang. It's a pretty clean implementation, but targeted more at resource-constrained environments.

To support emscripten (and LLVM) they have a libc wrapper, and it is the most concise place I've found for figuring out what calls are needed to support the compiler runtimes.

In the following code you can see their "env" implementation:

https://github.com/intel/wasm-micro-runtime/blob/307b54cc5946a5d07ee17bc177c1a2f17e231836/core/iwasm/lib/native/libc/libc_wrapper.c#L933-L969

It covers the basics for libc calls from emscripten or LLVM, and most of the code I've tried against it works without issue. You can clearly see malloc and free operating inside the memory buffer, so that's what I'm basing my info on.

Contributor

FWICT, wasm is intended just to implement a very minimal 'CPU', and design decisions like memory allocation are to be handled by the calling code. The only memory management features in the wasm specification are these two opcodes:

  • memory.grow - increase the size of the memory buffer (in units of 64 KiB pages).
  • memory.size (formerly current_memory) - return the current size of the memory buffer, in pages.

In all the wasm I've seen, malloc/free are implemented in wasm shipped by the application.

Do any other wasm interpreters provide a 'libc' layer like this, or is the libc layer always shipped with the application code?

Author

I agree with what you say about the spec.

Yes, wasmer appears to provide an 'emscripten compatibility layer', and wasm-micro-runtime does as well.

I think it is kind of a grey area at the moment, but the problem I see is that since the linear memory buffer is used by both the host and the wasm module, someone has to be authoritative.

E.g. if I call into a wasm module from the host and want to pass in a string, I allocate it in the buffer and send a pointer.

If the module replies with another string, it basically does the same.

So do we assume the linear memory is stateless, so that in each iteration the current 'owner' has full use of the buffer? If not, someone has to manage the memory. Since GC is planned post-MVP, I think the responsibility for memory management is best handled in the runtime host.

Again, these are my very, very unqualified opinions.

Author

One thing to point out: if you build for a browser, the libc code is in the .js glue file generated by emscripten, so the browser does not directly provide it.

In the case of non-browser runtimes, though, it appears to me they have provided their own glue code natively to support the emscripten compiler.

That is why I mentioned doing the implementations as a library of .wasm files that could be imported, rather than writing them natively in Go, but we would need to provide at least a minimal API to do that.

Author

Finally found the wasmer code. I remembered seeing it, but it took me a minute:

https://github.com/wasmerio/wasmer/blob/56c571465ea65992172897da44ad2970e0ec55a0/lib/emscripten/src/lib.rs#L555

Contributor

> but in the case of non browser runtimes it appears to me they have provided their own glue code natively to support the emscripten compiler.
>
> that is why i mentioned doing the implementations as a library of .wasm files that could be imported rather than writing them natively in go...but we would need to provide at least a minimal api to do that

Let's do that (as in, let's do what wasmer and the other native runtimes are doing, and provide an identical API).

		for i := int(v); i < proc.MemSize(); i++ {
			_, err := proc.ReadAt(temp, int64(i))
			if err != nil {
				fmt.Println(err)
			}
			if temp[0] == 0 {
				break
			}
			buf = append(buf, temp[0])
		}
		fmt.Println(string(buf))
		return 0
	}
	m.Types = &wasm.SectionTypes{
		Entries: []wasm.FunctionSig{
			{
				Form:        0, // value for the 'func' type constructor
				ParamTypes:  []wasm.ValueType{wasm.ValueTypeI32},
				ReturnTypes: []wasm.ValueType{wasm.ValueTypeI32},
			},
		},
	}
	m.GlobalIndexSpace = []wasm.GlobalEntry{
		{
			Type: wasm.GlobalVar{
				Type: wasm.ValueTypeI32,
			},
			Init: []byte{65, 0, 11},
		},
	}
	// m.LinearMemoryIndexSpace = [][]byte{make([]byte, 256)}
	m.Memory = &wasm.SectionMemories{
		Entries: []wasm.Memory{
			{
				Limits: wasm.ResizableLimits{Initial: 1},
			},
		},
	}
	m.FunctionIndexSpace = []wasm.Function{
		{
			Sig:  &m.Types.Entries[0],
			Host: reflect.ValueOf(print),
			Body: &wasm.FunctionBody{}, // create a dummy wasm body (the actual value will be taken from Host.)
		},
		{
			Sig:  &m.Types.Entries[0],
			Host: reflect.ValueOf(puts),
			Body: &wasm.FunctionBody{}, // create a dummy wasm body (the actual value will be taken from Host.)
		},
	}
	m.Export = &wasm.SectionExports{
		Entries: map[string]wasm.ExportEntry{
			"print": {
				FieldStr: "print",
				Kind:     wasm.ExternalFunction,
				Index:    0,
			},
			"_puts": {
				FieldStr: "_puts",
				Kind:     wasm.ExternalFunction,
				Index:    1,
			},
			"__memory_base": {
				FieldStr: "__memory_base",
				Kind:     wasm.ExternalGlobal,
				Index:    0,
			},
			"memory": {
				FieldStr: "memory",
				Kind:     wasm.ExternalMemory,
				Index:    0,
			},
		},
	}
	return m
}
3 changes: 3 additions & 0 deletions internal/emlibc/test/puts.go
@@ -0,0 +1,3 @@
//go:generate emcc -Os src/puts.c -s SIDE_MODULE=1 -o puts.wasm -s TOTAL_MEMORY=65536 -s TOTAL_STACK=4096

package test
Binary file added internal/emlibc/test/puts.wasm
Binary file not shown.
1 change: 1 addition & 0 deletions internal/emlibc/test/puts.wasm.map

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

35 changes: 35 additions & 0 deletions internal/emlibc/test/puts.wast
@@ -0,0 +1,35 @@
(module
 (type $FUNCSIG$ii (func (param i32) (result i32)))
 (data (global.get $__memory_base) "Hello")
 (import "env" "__memory_base" (global $__memory_base i32))
 (import "env" "_puts" (func $_puts (param i32) (result i32)))
 (memory $memory 1)
 (global $STACKTOP (mut i32) (i32.const 0))
 (global $STACK_MAX (mut i32) (i32.const 0))
 (export "__post_instantiate" (func $__post_instantiate))
 (export "_main" (func $_main))
 (func $_main (; 1 ;) (; has Stack IR ;) (result i32)
  ;;@ src/puts.c:5:0
  (drop
   (call $_puts
    (global.get $__memory_base)
   )
  )
  ;;@ src/puts.c:6:0
  (i32.const 0)
 )
 (func $__post_instantiate (; 2 ;) (; has Stack IR ;)
  (global.set $STACKTOP
   (i32.add
    (global.get $__memory_base)
    (i32.const 16)
   )
  )
  (global.set $STACK_MAX
   (i32.add
    (global.get $STACKTOP)
    (i32.const 5242880)
   )
  )
 )
)
6 changes: 6 additions & 0 deletions internal/emlibc/test/src/puts.c
@@ -0,0 +1,6 @@
#include <stdio.h>


int main(){
	puts("Hello");
}
4 changes: 2 additions & 2 deletions wasm/init_expr.go
@@ -144,7 +144,8 @@ func (m *Module) ExecInitExpr(expr []byte) (interface{}, error) {
if globalVar == nil {
return nil, InvalidGlobalIndexError(index)
}
lastVal = globalVar.Type.Type
return m.ExecInitExpr(globalVar.Init)
Contributor

Could this give us an infinite loop?

Author

Yes, it could. It happened to me while playing with it. We could try to detect it. Do we need this here? I couldn't get it to work before, and if you just return nil, nil without an error, it panics in Module.populateLinearMemory(). If we really do want to return nil (as it did before), we can either specify an error to return if the stack is empty, or deal with a possible nil in the error building.

The problem comes from reflect.TypeOf(val).Kind() when val is nil:

https://github.com/sampaioletti/wagon/blob/1e71fcd3777b154c901701743ae3930e472e86fc/wasm/index.go#L186

Contributor

We could modify ExecInitExpr (or make a new unexported one) to take an argument that represents the current context of execution. When we recurse into evaluating the global's init expression, we can call it with the global as an argument. That way, we can check before we enter an infinite loop by seeing whether we are already in the middle of evaluating the same global.

But this is a hard problem in the general case, so we could probably leave it unaddressed.

// lastVal = globalVar.Type.Type
case end:
break
default:
@@ -155,7 +156,6 @@ func (m *Module) ExecInitExpr(expr []byte) (interface{}, error) {
if len(stack) == 0 {
return nil, nil
}

v := stack[len(stack)-1]
switch lastVal {
case ValueTypeI32: