v0.4.1: x86_64 getting stuck with qemu64 cpu #288

Closed
sadbuttrue1 opened this issue May 16, 2022 · 30 comments · Fixed by #292

Comments

@sadbuttrue1

Updated Colima from 0.3.4 to 0.4.1, then ran colima delete and colima start (colima start --arch amd --cpu 4 --memory 4, to be precise).
I got:

FATA[0642] error starting vm: error at 'creating and starting': exit status 1

Here is the log after starting:
serial.log

macOS version 12.3.1
CPU Apple M1 Max
colima version 0.4.1
lima version 0.10.0
qemu version 6.2.0_1

sadbuttrue1 changed the title from "Soft lockup or kernel panic after update" to "Failed to start after update" on May 16, 2022
@abiosoft
Owner

Can you kindly try again with the --verbose flag and share the output?
Also, does it work if you use your machine's default architecture, i.e. aarch64?

@sadbuttrue1
Author

If I delete it with colima delete and create it again with colima start --arch amd --cpu 4 --memory 4, it runs.
But sooner or later, I start seeing the following in ~/.lima/colima/serial.log:

[  176.564027] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:11:4089]
[  176.713386] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [buildctl:3452]
[  177.014163] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [jbd2/vda1-8:2139]
[  204.563929] watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kworker/0:11:4089]
[  204.713915] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [buildctl:3452]
[  205.014359] watchdog: BUG: soft lockup - CPU#3 stuck for 48s! [jbd2/vda1-8:2139]

and everything Docker-related gets stuck.
serial.log
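To check whether a VM has hit these warnings without reading the whole log, a small helper can count the soft-lockup lines. This is a minimal sketch, not part of Colima or Lima: the function name count_soft_lockups is hypothetical, and it assumes the default ~/.lima/colima/serial.log location mentioned above and the "watchdog: BUG: soft lockup" message format shown in the log.

```shell
# Hypothetical helper (not a Colima/Lima command): count kernel
# soft-lockup warnings in a Lima serial log.
# Defaults to the Colima VM log path seen in this thread.
count_soft_lockups() {
    log="${1:-$HOME/.lima/colima/serial.log}"
    if [ ! -f "$log" ]; then
        echo "no log at $log" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits 1 when
    # that number is 0, so "|| true" keeps the function's status at 0.
    grep -c 'watchdog: BUG: soft lockup' "$log" || true
}
```

For example, running count_soft_lockups with no argument checks the default Colima log; a non-zero count after docker commands start hanging would suggest the VM hit the same lockup reported here.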

@sadbuttrue1
Author

> Can you kindly try again with --verbose flag specified and share the output.
> Also, does it work if you use the default arch for your machine i.e. aarch64 ?

Yes, I can try it with --verbose. At which stage should I run it? With freshly installed colima, lima and qemu?

I didn't try it with aarch64 because I need x86_64. And x86_64 was fine with 0.3.4.

@sadbuttrue1
Author

sadbuttrue1 commented May 16, 2022

Result of colima start --arch amd --cpu 4 --memory 4 --verbose:

INFO[0000] starting colima
INFO[0000] runtime: docker
INFO[0000] preparing network ...                         context=vm
INFO[0000] creating and starting ...                     context=vm
> msg="Terminal is not available, proceeding without opening an editor"
> msg="Attempting to download the image from \"https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.4.0-5/alpine-lima-clm-3.15.4-x86_64.iso\"" digest="sha512:6ce50a109cc90f537cc0eb363ab938ba10ea690dc59e1f4c0081ecd90907da5fb305ab27a98ebec774e97d41a713f73f0a4e7edc44019a7db134a07181bb8801"
>
> 38.33 MiB / 267.00 MiB (14.36%) ? p/s
> 84.63 MiB / 267.00 MiB (31.70%) 9.26 MiB/s
> 119.52 MiB / 267.00 MiB (44.76%) 9.11 MiB/s
> 159.67 MiB / 267.00 MiB (59.80%) 9.04 MiB/s
> 194.58 MiB / 267.00 MiB (72.88%) 8.91 MiB/s
> 216.72 MiB / 267.00 MiB (81.17%) 8.62 MiB/s
> 241.14 MiB / 267.00 MiB (90.32%) 8.38 MiB/s
> 246.30 MiB / 267.00 MiB (92.25%) 7.91 MiB/s
> 251.24 MiB / 267.00 MiB (94.10%) 7.46 MiB/s
> 261.19 MiB / 267.00 MiB (97.82%) 7.11 MiB/s
> 267.00 MiB / 267.00 MiB (100.00%) 5.59 MiB/s
> time="2022-05-16T21:06:04+03:00" level=info msg="Downloaded image from \"https://github.com/abiosoft/alpine-lima/releases/download/colima-v0.4.0-5/alpine-lima-clm-3.15.4-x86_64.iso\""
> msg="[hostagent] local user \"evgenii.aslanov\" is not a valid Linux username (must match \"^[a-z_][a-z0-9_-]*$\"); using \"lima\" username instead"
> msg="[hostagent] Starting QEMU (hint: to watch the boot progress, see \"/Users/evgenii.aslanov/.lima/colima/serial.log\")"
> msg="SSH Local Port: 54697"
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="[hostagent] Waiting for the essential requirement 1 of 5: \"ssh\""
> msg="did not receive an event with the \"running\" status"
FATA[0654] error starting vm: error at 'creating and starting': exit status 1

serial.log

With aarch64 it works fine; at least it starts and doesn't get stuck within 2-3 minutes.

@abiosoft
Owner

Does specifying --cpu-type qemu64 improve anything?

@abiosoft
Owner

@sadbuttrue1 I have been able to reproduce it; using --cpu-type kvm64 works. Can you kindly confirm whether it works for you as well?

I need to figure out why the default config stops working.

@sadbuttrue1
Author

sadbuttrue1 commented May 16, 2022

@abiosoft, with qemu64 it still got stuck, without any messages in serial.log.
With colima start --arch amd --cpu 4 --memory 4 --verbose --cpu-type kvm64 it seems to be fine; at least it didn't get stuck within 10 minutes :)

abiosoft changed the title from "Failed to start after update" to "v0.4.1: x86_64 getting stuck with qemu64 cpu" on May 16, 2022
@abiosoft
Owner

As a side note, is x86_64 emulation your primary use-case? I would expect the degraded performance to discourage extensive use.

@sadbuttrue1
Author

Yes, I'm using it to run https://hub.docker.com/_/microsoft-mssql-server.
Do you mean that with kvm64 it'll be slower than it was with the default parameters of 0.3.4?

@abiosoft
Owner

> You mean that with the kvm64 it'll be slower than it was with default params of 0.3.4?

Not at all; performance will be similar to the default processor. I'm only referring to the fact that emulation is always noticeably slower than the native architecture.

@rfay
Contributor

rfay commented May 16, 2022

@sadbuttrue1 Do add your "me too" to microsoft/mssql-docker#734; once they address it, it will make a big difference.

@sadbuttrue1
Author

@abiosoft, yes, I know emulation is slow. But currently it's the only option for testing with an MSSQL image that has the full-text search feature.

@rfay, thanks, will do!

@mritd

mritd commented May 17, 2022

same issue here...

[screenshot]

For my use case:

x86 emulation is usually used to test images that don't provide ARM builds, or, in some cases, to build x86 Docker images that are pushed to a remote server to run.

@abiosoft
Owner

@mritd specifying --cpu-type kvm64 works, as seen in #288 (comment), and specifying --cpu-type Haswell-v4 also works.

Looking at lima-vm/lima#641, I can see the previous default value was Haswell-v4, but that was changed a while back, as of Lima v0.8.2.

@mritd

mritd commented May 17, 2022

Oh sorry, I forgot to mention... I tested --cpu-type and it works fine, just a little slow....

@abiosoft
Owner

> Oh sorry, I forgot to mention... I tested --cpu-type and it works fine, just a little slow....

@mritd do you mean it's slower than before?

@mritd

mritd commented May 17, 2022

@abiosoft Just a feeling; I didn't do a full test.

@oanhnn

oanhnn commented May 18, 2022

My environment

MacBook Air (M1, 2020)
macOS version 12.3.1
CPU Apple M1
colima version 0.4.1
lima version 0.10.0
qemu version 6.2.0_1

I got the same error after starting Colima and pulling some Docker images:

$ colima start -a x86_64 -c 2 -m 4
$ docker ps
$ docker context ls
$ docker image pull ...

But it works fine with Lima and its docker template:

$ limactl start ~/.lima/docker.yml
$ docker context use lima
$ docker ps
$ docker image pull ...

Below is information about the QEMU process.

$ ps aux | grep qemu              
oanhnn           36037   6.6 11.9 414593056 1994976 s000  S    10:15AM   1:28.83 qemu-system-x86_64 -m 4096 -cpu qemu64 -machine q35,vmport=off -accel tcg,thread=multi,tb-size=512 -global ICH9-LPC.disable_s3=1 -smp 2,sockets=1,cores=2,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Users/oanhnn/.colima/_wrapper/share/qemu/edk2-x86_64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/oanhnn/.lima/colima/basedisk,media=cdrom,readonly=on -drive file=/Users/oanhnn/.lima/colima/diffdisk,if=virtio -cdrom /Users/oanhnn/.lima/colima/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:60277-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:09:af:6f -device virtio-rng-pci -display none -device virtio-vga -device virtio-keyboard-pci -device virtio-mouse-pci -parallel none -chardev socket,id=char-serial,path=/Users/oanhnn/.lima/colima/serial.sock,server=on,wait=off,logfile=/Users/oanhnn/.lima/colima/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/oanhnn/.lima/colima/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-colima -pidfile /Users/oanhnn/.lima/colima/qemu.pid -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan,mac=5a:94:ef:b2:dd:3c
oanhnn           36036   0.0  0.1 409232816  17760 s000  S    10:15AM   0:00.01 /Users/oanhnn/.colima/_wrapper/bin/qemu-system-x86_64 -m 4096 -cpu qemu64 -machine q35,vmport=off -accel tcg,thread=multi,tb-size=512 -global ICH9-LPC.disable_s3=1 -smp 2,sockets=1,cores=2,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Users/oanhnn/.colima/_wrapper/share/qemu/edk2-x86_64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/oanhnn/.lima/colima/basedisk,media=cdrom,readonly=on -drive file=/Users/oanhnn/.lima/colima/diffdisk,if=virtio -cdrom /Users/oanhnn/.lima/colima/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:60277-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:09:af:6f -device virtio-rng-pci -display none -device virtio-vga -device virtio-keyboard-pci -device virtio-mouse-pci -parallel none -chardev socket,id=char-serial,path=/Users/oanhnn/.lima/colima/serial.sock,server=on,wait=off,logfile=/Users/oanhnn/.lima/colima/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/oanhnn/.lima/colima/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-colima -pidfile /Users/oanhnn/.lima/colima/qemu.pid
oanhnn           36145   0.0  0.0 408628368   1488 s000  R+   10:19AM   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox qemu
$ ps aux | grep qemu
oanhnn           36888   9.0 19.2 414831520 3227456 s000  S    10:36AM   6:45.86 /opt/homebrew/bin/qemu-system-x86_64 -m 4096 -cpu qemu64 -machine q35,vmport=off -accel tcg,thread=multi,tb-size=512 -global ICH9-LPC.disable_s3=1 -smp 4,sockets=1,cores=4,threads=1 -drive if=pflash,format=raw,readonly=on,file=/opt/homebrew/share/qemu/edk2-x86_64-code.fd -boot order=c,splash-time=0,menu=on -drive file=/Users/oanhnn/.lima/docker/diffdisk,if=virtio -cdrom /Users/oanhnn/.lima/docker/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:60700-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:42:92:4b -device virtio-rng-pci -display none -device virtio-vga -device virtio-keyboard-pci -device virtio-mouse-pci -parallel none -chardev socket,id=char-serial,path=/Users/oanhnn/.lima/docker/serial.sock,server=on,wait=off,logfile=/Users/oanhnn/.lima/docker/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/oanhnn/.lima/docker/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-docker -pidfile /Users/oanhnn/.lima/docker/qemu.pid
oanhnn           37144   0.0  0.0 408637584   1792 s000  S+   10:43AM   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox qemu
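The ps output above shows which CPU model each VM was launched with (here -cpu qemu64 for both the Colima and the Lima docker VMs). To confirm quickly whether a --cpu-type override actually took effect, the value after -cpu can be extracted from the QEMU command line. This is a minimal sketch: qemu_cpu_model is a hypothetical name, and it assumes the option appears as "-cpu <model>" separated by spaces, as in the output above.

```shell
# Hypothetical helper: print the value passed to QEMU's -cpu option
# in a command line, e.g. "qemu64", "kvm64" or "Haswell-v4".
# Prints nothing if the command line has no -cpu option.
qemu_cpu_model() {
    # Put each space-separated word on its own line, then print the
    # word that directly follows "-cpu".
    printf '%s\n' "$1" | tr ' ' '\n' | awk 'prev == "-cpu" { print; exit } { prev = $0 }'
}

# Example usage against running VMs ("[q]" keeps grep from matching itself):
#   ps aux | grep '[q]emu-system-x86_64' | while read -r line; do qemu_cpu_model "$line"; done
```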

And here is Colima's serial.log.

@Meligy

Meligy commented May 18, 2022

I have just upgraded too. The start command worked on the second attempt, but any docker command, or even colima list, just hung forever. (It actually froze my terminal, apparently because my prompt, spaceship, queries Docker info; commenting the prompt out of my zsh config resolved that part.)

Is there a start command that still works, even if slower? (I have an M1 Max anyway.)
Any additional arguments to pass?
Editing the configuration with --edit?

@Meligy

Meligy commented May 18, 2022

This is most of the Dockerfile that seems to break it. It's essentially SQL Server with Full-Text Search enabled. It worked just fine with older versions of Colima, and it has not been modified recently.

# Using non default image as Full-text search support is not included in it
# This image implements https://schwabencode.com/blog/2019/10/27/MSSQL-Server-2017-Docker-Full-Text-Search
# We could implement it ourselves,
#   but rebuilding our Dockerfile with the local script and running those steps in the image is very slow.
# It's a specific tag, verifiable at https://hub.docker.com/r/benjaminabt/mssql-fts/tags
FROM benjaminabt/mssql-fts:2019-cu10-ubuntu-2004
ENV ACCEPT_EULA Y

# ENV MSSQL_PID # Does not matter
ENV MSSQL_SA_PASSWORD DoesNotMatter

# Missing from the base image: `mssql-tools`, needs above ACCEPT_EULA env variable set first
# Found via https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-tools?view=sql-server-ver15#ubuntu
# The original image does some cleanup, so we have to repeat parts of their script. It's still faster.
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get install -yq curl apt-transport-https && \
    curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list | tee /etc/apt/sources.list.d/msprod.list && \
    apt-get update && \
    apt-get install -y mssql-tools unixodbc-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists

# Run SQL Server process
CMD /opt/mssql/bin/sqlservr

# ... File clipped intentionally

Once I try to run this (via docker-compose), it hangs:

[screenshot]

From then on, any command interacting with colima or docker hangs, even colima list, with the one lucky exception of colima delete.

@oanhnn

oanhnn commented May 18, 2022

@Meligy I think you could try one of these two approaches:

  1. Use --cpu-type when starting a Colima instance:
$ colima start -a x86_64 -c 4 -m 4 --cpu-type kvm64
$ docker run hello-world

Or --cpu-type Haswell-v4

See

#288 (comment)
#288 (comment)

  2. Use Lima with the docker template (my choice):
$ limactl start --name=docker https://raw.githubusercontent.com/lima-vm/lima/master/examples/docker.yaml
$ docker context create lima --docker "host=unix:///Users/<your username>/.lima/docker/sock/docker.sock"
$ docker context use lima
$ docker run hello-world

Once this issue is fixed, we can use Colima again 😄

@abiosoft
Owner

Can you all try the development version with brew install --HEAD colima?

If it works fine, a new release will be made.

@Meligy

Meligy commented May 18, 2022

@abiosoft thanks a lot. --HEAD does work.

It seems quite slow though. For the above image with this command (without explicitly setting --cpu-type):

colima start --arch x86_64 --cpu 4 --memory 4 --disk 60

Image extraction is very slow (I don't have a baseline to compare against, but it feels much slower). The same goes for running the apt commands in the Dockerfile, starting SQL Server, and running a handful of SQL scripts.

[screenshot]

However, this is still an improvement over not working at all.
Thanks a lot.

@abiosoft
Owner

@Meligy thanks for the feedback, I will investigate a bit more.

@Meligy

Meligy commented May 18, 2022

I also tried the latest released version (not --HEAD), with the following command based on @oanhnn's reply:

 colima start --arch x86_64 --cpu 4 --memory 4 --disk 60 --cpu-type Haswell-v4

I set --cpu-type to Haswell-v4 to try something different since, judging by the latest commit, Colima seemed to use kvm64 when I didn't specify it in the previous test.

The (non-scientific) timings were comparable, perhaps a tiny bit slower.

[screenshot]

I think this makes --HEAD the best current option.

@abiosoft
Owner

Can anyone confirm whether the latest development version (brew install --HEAD colima) works as desired?

@Meligy

Meligy commented May 19, 2022

@abiosoft thanks heaps for this. It's much better.

I deleted the old one, installed the --HEAD version, and created a new one with the following command (no --cpu-type, to make sure it works without it, and it did work):

colima start --arch x86_64 --cpu 4 --memory 4 --disk 60

My timings then dropped by at least 50%. You can compare this screenshot with the ones above:

[screenshot]

Actual container performance seems much improved as well: waiting for SQL Server to start and running a handful of small scripts now takes half a minute (the last number) instead of 3.5+ minutes.

Thanks again.

@abiosoft
Owner

@Meligy will push out a new release shortly. Thanks for confirming.

@abiosoft
Owner

Fix now in stable version https://github.com/abiosoft/colima/releases/tag/v0.4.2

@Meligy

Meligy commented May 20, 2022

Thanks heaps.

6 participants