Issue when running a spin image in docker/containerd on RISCV64 #128

Open
matsbror opened this issue May 28, 2024 · 19 comments

@matsbror

matsbror commented May 28, 2024

I am trying to get containerd (through docker) to run wasm files on my VisionFive 2 board, which has a RISCV64 processor.
Here are the platform details:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
$ uname -a
Linux ubuntu 6.8.0-31-generic #31.1-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr 21 01:12:53 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux

I am trying to run the following example: https://github.com/nigelpoulton/dockercon2023-wasm-lab, which is a simple spin application.

I have verified that docker works, as I can run natively built docker images.

I tried the following command, which fails:

$ docker run  --runtime=io.containerd.spin.v2   --platform=wasm   -p 3000:80   matsbror/docker-wasm:spin-0.1 /
thread '<unnamed>' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmtime-18.0.4/src/runtime/code_memory.rs:254:18:
Failed cache clear: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

That exact command worked well on both an ARM64 and an x86_64 machine.

Docker version 26.1.3, containerd version v2.0.0-rc

I tried running it directly with containerd like this:

$ sudo ctr run --runtime=io.containerd.spin.v2 --platform wasm docker.io/matsbror/docker-wasm:spin-0.1 x3

This command exited without any error, but the code did not run. It did, however, leave a container that I can see with sudo ctr c ls.

The spin shim was built from a customised fork of the repo that points to a local variant of spin, in which I backported a fix from the master HEAD to v2.5.1, the version referred to in Cargo.toml. I also had to remove trigger-sqs and trigger-command in engine.rs, as the shim would not build with them on RISCV64.

My fork and version of the shim
My spin

Please help me get it to run on RISCV64!

I should note that I can run the program in spin without any issue (with the memory pool fix).

EDIT: On x86, I need to set the platform to wasi/wasm, whereas on RISCV64 and ARM64 doing so produces the following error:

spin-0.1: Pulling from matsbror/docker-wasm
docker: operating system is not supported.

@kate-goldenring
Collaborator

@matsbror I haven't run the shim on RISCV64 but can you see if bypassing docker and just using containerd works? From there, we can debug how to use Docker.

Instead of transporting the Spin app in a scratch container, push it to a registry after running a spin build. Replace this step with one that uses spin registry push. The following pushes the app to a free ephemeral registry with a TTL of 48h (based on the tag):

spin registry push ttl.sh/spin-wasm-app:48h

Then try ctr again:

sudo ctr image pull ttl.sh/spin-wasm-app:48h
sudo ctr run --rm --net-host --runtime io.containerd.spin.v2 ttl.sh/spin-wasm-app:48h myapp bogus-arg
# output should look similar to a `spin up`

You should now be able to interact with the app in the same way as after a spin up:

curl localhost:80/yo

@matsbror
Author

Thank you, @kate-goldenring! I did not know how to use containerd properly. This indeed works. So the issue should be in docker and how it calls containerd. Since the crash was in /wasmtime-18.0.4/src/runtime/code_memory.rs:254:18, I had assumed that the issue was in the shim.

I would appreciate further pointers on how to debug.

@matsbror
Author

Here is the log from running the following command:

docker run  --runtime=io.containerd.spin.v2   --platform=wasm   -p 3000:80   matsbror/docker-wasm:spin-0.1 /

docker.log

I do not really know what to look for or how to interpret it. For comparison, here is the log from a successful run of the hello-world container:

docker-hello.log

@kate-goldenring
Collaborator

@matsbror I will look into trying to repro this. For starters, I did notice a discrepancy in the platform. The docs seem to indicate it should be --platform=wasi/wasm

@kate-goldenring
Collaborator

> This indeed works. So the issue should be in docker and how it calls containerd. Since

Docker is just calling out to the shim. I did notice that the example in that repository does not properly map the destination path of the spin component in the Dockerfile. Specifically, say the source path in spin.toml is the following:

[component.yoyo]
source = "target/wasm32-wasi/release/yoyo.wasm"

The Dockerfile should look like this:

FROM scratch
COPY spin.toml /spin.toml
COPY target/wasm32-wasi/release/yoyo.wasm /target/wasm32-wasi/release/yoyo.wasm
ENTRYPOINT ["/spin.toml"]

I pushed an app with that Dockerfile structure to ttl.sh/spin-yoyo-app:48h if you want to try it out with ctr and docker.

I have not used docker to execute Spin apps. I am trying to configure Docker desktop to try it out. Is there a reason you want to use docker to execute containerd?

@matsbror
Author

matsbror commented May 29, 2024

I failed to pull this image:

$ sudo ctr image pull ttl.sh/spin-yoyo-app:48h
ctr: rpc error: code = NotFound desc = failed to resolve image: ttl.sh/spin-yoyo-app:48h: not found

But I replicated this structure in my own example, and got the same result as before:

thread '<unnamed>' panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmtime-18.0.4/src/runtime/code_memory.rs:254:18:
Failed cache clear: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }

> I have not used docker to execute Spin apps. I am trying to configure Docker desktop to try it out. Is there a reason you want to use docker to execute containerd?

No, eventually I will use k3s (most likely) but that also needs porting to RISCV64 so I thought docker should be a step on the way.

@kate-goldenring
Collaborator

> No, eventually I will use k3s (most likely) but that also needs porting to RISCV64 so I thought docker should be a step on the way.

I'd recommend going ahead and trying K3s, skipping over docker, since K3s uses containerd. That way you can use the Spin OCI format (spin registry push) rather than scratch containers with Docker, which we've already seen work with ctr. I would follow the SpinKube installation guide and then you can do a spin kube scaffold ... to deploy your app to the cluster.

I'd still like to keep debugging this in parallel

@kate-goldenring
Collaborator

We reached out to some of the containerd maintainers on the #spinkube CNCF Slack, and it sounds like Docker Desktop 4.30 has an issue with containerd that should be resolved in the 4.31 release next week. Another option is to downgrade to 4.29.

@matsbror
Author

matsbror commented May 29, 2024

Thanks, but I am not using docker desktop. I installed the docker engine manually and built it from source. I will give your suggestion with k3s a try.

@alexcrichton

With respect to the Wasmtime-specific bits here (the panic comes from Wasmtime itself), I opened a thread on our Zulip to talk a bit more about that. One hypothesis is that a syscall filter is in play (e.g. Docker filtering syscalls) and that Wasmtime is using a new RISC-V-specific syscall (sys_riscv_flush_icache). Unfortunately, I'm not familiar enough with the various layers here to know whether this is happening at all, or where it would be happening if so. If others are more knowledgeable about syscall filtering, do y'all know where a list of allowed syscalls would be maintained? I could imagine that sys_riscv_flush_icache is "pretty new" insofar as it only comes up on riscv64.
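
For anyone who wants to check the filtering hypothesis against a concrete profile, the sketch below tests whether a seccomp profile explicitly allows a given syscall. It assumes the profile uses the JSON layout of Docker's default seccomp profile (a top-level `syscalls` list whose entries carry `names` and `action`, plus a `defaultAction`). The helper name `syscall_allowed` and the inline sample profile are made up for illustration, and conditional rules (`includes`/`excludes`, argument filters) are ignored.

```python
import json


def syscall_allowed(profile, name):
    """Return True if the profile lets `name` through (conditions ignored)."""
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW" and name in rule.get("names", []):
            return True
    # No explicit allow rule matched: the profile's default action decides.
    return profile.get("defaultAction") == "SCMP_ACT_ALLOW"


# Tiny stand-in profile for illustration only; a real profile would be read
# with json.load() from wherever the Docker installation keeps it.
sample = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "mprotect"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")

for name in ("mprotect", "riscv_flush_icache"):
    print(name, "allowed" if syscall_allowed(sample, name) else "filtered")
```

If the syscall does turn out to be filtered, rerunning the failing docker run command with `--security-opt seccomp=unconfined` might be a quick way to confirm or rule out the hypothesis.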

@alexcrichton

Afonso on our Zulip found a similar problem with the JVM. Others might recognize the comment there and various possible hooks to pass to Docker or configuration to specify perhaps as a workaround.

@matsbror
Author

matsbror commented Jun 5, 2024

> Then try ctr again:
>
> sudo ctr image pull ttl.sh/spin-wasm-app:48h
> sudo ctr run --rm --net-host --runtime io.containerd.spin.v2 ttl.sh/spin-wasm-app:48h myapp bogus-arg
> # output should look similar to a `spin up`
>
> You should now be able to interact with the app in the same way as after a spin up:
>
> curl localhost:80/yo

I was a bit premature in saying this worked. I can run the container this way, and also the docker-built container, but I cannot curl it.

$ sudo ctr run --rm --net-host --runtime io.containerd.spin.v2 docker.io/matsbror/docker-wasm:spin-0.3 myapp2 bogus-arg2

Serving http://0.0.0.0:80
Available Routes:
  dockercon: http://0.0.0.0:80/yo

In another shell:

$ curl http://127.0.0.1:80
curl: (7) Failed to connect to 127.0.0.1 port 80 after 0 ms: Couldn't connect to server

I thought the --net-host flag should let the traffic through, but apparently not here.

Edit: I checked the same on an x86-node, with the same result, so I think that the container is running, but for some reason the http request does not go through.

@kate-goldenring
Collaborator

kate-goldenring commented Jun 5, 2024

@matsbror I also was not able to hit the endpoint at first, though in my case I got a Not Found:

kagold@kagold-ThinkPad-X1-Carbon-6th:~$ curl localhost:80/yo
404 page not found

I think it is due to K3s already binding to port 80. Sadly, the port is not configurable right now, but work is underway on this. After deleting the cluster on the node, the request went through. I'd try adding a RISCV node and deploying a spinapp there and see what happens.

Note: be sure to also hit the /yo route.
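
To see whether anything answers on the port at all, and what it returns, a small probe can help. This is only a sketch and not part of the shim or Spin: `probe` is a made-up helper that issues a GET, bypasses any proxy settings, and reports the status code and `Server` header, or `None` if the connection fails.

```python
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return (status, server_header), or None if nothing accepts the request."""
    # An empty ProxyHandler makes sure the request goes to the local port
    # directly, even if proxy environment variables are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Server")
    except urllib.error.HTTPError as err:
        # A 4xx/5xx response still means that something answered on the port.
        return err.code, err.headers.get("Server")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


if __name__ == "__main__":
    for path in ("/", "/yo"):
        print(path, probe("http://127.0.0.1:80" + path))
```

A 404 on both paths would suggest that something other than the Spin app is answering on port 80, while `None` corresponds to the curl "Couldn't connect to server" case.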

@matsbror
Author

matsbror commented Jun 6, 2024

@kate-goldenring it does work when I stop the k3s agent service. But according to netstat, neither k3s nor anything else is listening on port 80, so I do not understand why it should make a difference.

But how can I make a containerized spin app use another port? I do not understand how it uses port 80 when the default spin app port is 3000. Can you please explain?

EDIT: I just looked into the source code of the shim and found that the port 80 is defined here. I will need to figure out how to pass a new IP and port to the shim, or recompile the shim with another port default.

@kate-goldenring
Collaborator

kate-goldenring commented Jun 6, 2024

@matsbror I was also a little mystified when I didn't see the port in use by k3s but still saw the effect of stopping the service. This is due to traefik using port 80 -- see this forum. I confirmed this with kubectl get service traefik -n kube-system. You could change the traefik configuration for k3s.

Yes, the port is not configurable currently. We have an issue tracking how we could add it through a container arg here #52. Right now, as you mention, you would have to recompile the shim with another port.

@matsbror
Author

@kate-goldenring, an update from my side. I have verified that I can run wasm images directly using containerd (both the one bundled with k3s and the latest one running as a service). However, when I try to start one from k3s, I get the following error in the logs:

Jun 12 06:35:26 k3s-riscv-1 k3s[39048]: E0612 06:35:26.055872   39048 pod_workers.go:1294] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"wasm-spin-85cf4f8479-rdtcv_default(4c94ce00-8f24-4bae-b8f1-10ae93346c7b)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"wasm-spin-85cf4f8479-rdtcv_default(4c94ce00-8f24-4bae-b8f1-10ae93346c7b)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: io.containerd.spin.v2: InvalidArgument(\\\"error loading runtime spec: serde failed\\\")\\n: exit status 1: unknown\"" pod="default/wasm-spin-85cf4f8479-rdtcv" podUID=4c94ce00-8f24-4bae-b8f1-10ae93346c7b

I tried to look in the shim code for this error message but did not find it.

Any ideas?

@kate-goldenring
Collaborator

@matsbror, it looks like the shim may not be configured properly on that node. How did you install the shim on it? With kwasm? Note that k3s has its own separate configuration for containerd, which lives at /var/lib/rancher/k3s/agent/etc/containerd/config.toml. Can you confirm that the shim is configured there? If so, we can move on to increasing shim logging verbosity.

To debug the shim (if it is even being executed), we will need to increase its log level. By default, the shim logs only error messages. To increase the log level, set RUST_LOG=debug on the containerd process. Since k3s manages containerd, you'd set it on the k3s service.
If the process is managed by systemd, you can add an environment variable to the systemd service file. First, stop the service and open the service file for editing:

systemctl stop k3s
systemctl edit k3s

At the top of the file, add the environment variable to the service:

[Service]
Environment="RUST_LOG=debug"

Finally, restart the k3s service:

systemctl restart k3s

@matsbror
Author

Thanks!

Configuration of containerd is right, it ends with:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.spin]
runtime_type = "io.containerd.spin.v2"

I get no more information than before with shim debug logging enabled, so I suspect the shim is never called (the message about serde seems to come from a Rust program, but I just learned that Go also has a serde package). I restarted k3s-agent with --debug, and it seems that the first error message comes from remote_runtime.go.

I think the issue might be in k3s, so I will rebuild it for RISCV. Right now I am taking its binaries from here: https://github.com/CARV-ICS-FORTH/kubernetes-riscv64

@kate-goldenring
Collaborator

@matsbror just an update that you can now configure the port that the shim runs on by setting the SPIN_HTTP_LISTEN_ADDR container env var: #138

Here is an example (you'll want to replace the image ref as it will be expired):

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spin-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spin-test
  template:
    metadata:
      labels:
        app: spin-test
    spec:
      runtimeClassName: wasmtime-spin-v2
      containers:
      - name: spin-test
        image: ttl.sh/spin-return-envs:48h
        command: ["/"]
        ports:
        - containerPort: 82
          name: http-app
        env:
        - name: SPIN_HTTP_LISTEN_ADDR
          value: "0.0.0.0:82"
EOF

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: spin-test
spec:
  ports:
    - protocol: TCP
      port: 82
      targetPort: http-app
  selector:
    app: spin-test
EOF
