Support Agones on ARM systems #2216

Closed
wernerphysics opened this issue Aug 10, 2021 · 29 comments · Fixed by #2514
Labels
kind/feature New features for Agones
Milestone

Comments

@wernerphysics

wernerphysics commented Aug 10, 2021

I'm running Agones in a local Minikube cluster on an M1 Mac. When I create a basic xonotic game server from this repo's examples, I get an Image Pull Error. The error details specify that the image "gcr.io/agones-images/agones-sdk" does not have a linux/arm64/v8 manifest. The issue seems to be that the Docker runtime on an M1 Mac expects only ARM images.

My suggestion is to provide an ARM64 build of the Agones SDK in the GCR repo so that developers on ARM machines can run Agones/Minikube locally.

wernerphysics added the kind/feature label on Aug 10, 2021

@markmandel
Member

I can see us needing to support ARM in the future long term, so this is a good ticket to track this work. 👍🏻

Out of curiosity though: On an M1, can you only run arm docker images? (i.e. minikube won't work with anything else?) or is there a path to running x86 images?

@wernerphysics
Author

I looked into the issue some more and in fact you can run x86 images on M1 Mac Minikube. It works transparently through the Rosetta2 compatibility layer.

The issue: when the pod is created, it pulls agones-sdk:1.16.0 without an additional tag, which on a default M1 Mac install requests the nonexistent ARM64 image. I got it to work by editing the pod spec to specifically pull the agones-sdk:1.16.0-linux_amd64 image. After this step, both containers in the xonotic-example pod are working.

For what it's worth, I think this only works because of Apple's x86 emulation. It would still be helpful to add the ARM image tag for users on ARM platforms without robust emulation support.

@markmandel
Member

> It would still be helpful to add the ARM image tag for users on ARM platforms without robust emulation support.

I agree. Also combined with #2223 we should probably cross compile all the components.

@davidpdrsn

@wernerphysics I'm having the same issue. Could you elaborate on what you meant by

> I got it to work by editing the pod spec to specifically pull the agones-sdk:1.16.0-linux_amd64 image.

I'm still sort of new to k8s 😅

@markmandel
Member

If we want to do this in pieces, to start, let's get an ARM binary generated:

agones/build/Makefile

Lines 382 to 384 in 15678b6

build-agones-sdk-binary: $(ensure-build-image) build-agones-sdk-binary-linux build-agones-sdk-binary-windows build-agones-sdk-binary-darwin
	$(ZIP_SDK) \
		agonessdk-server-$(VERSION).zip sdk-server.darwin.amd64 sdk-server.linux.amd64 sdk-server.windows.amd64.exe

(Do you need an arm binary for local sdk server development?)

markmandel changed the title from "Support Agones SDK on ARM systems" to "Support Agones on ARM systems" on Aug 24, 2021

@markmandel
Member

Consolidating this with #2223 - making a note that to run on K8s clusters that are entirely ARM, we (may?) also need to provide a build of the controller (testing likely required).

@wernerphysics
Author

> @wernerphysics I'm having the same issue. Could you elaborate on what you meant by
>
> > I got it to work by editing the pod spec to specifically pull the agones-sdk:1.16.0-linux_amd64 image.
>
> I'm still sort of new to k8s 😅

There's definitely a better way to do this (probably changing the default SDK image when you Helm install Agones), but here's my quick and dirty fix. While the pod is running, edit the spec with something like kubectl edit pod xonotic and find the line where the Agones SDK image is specified; it should look like image: gcr.io/agones-images/agones-sdk:1.16.0. Look at the available images at http://gcr.io/agones-images/agones-sdk, and rewrite the pod spec to use the tag of the latest linux_amd64 image. This seemed to work for me, good luck!
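
The one-line change described above can also be sketched as a text filter over a saved copy of the spec. This is only an illustration: the function name pin_amd64 is hypothetical, and it assumes the SDK image line has exactly the form quoted above and that the registry publishes a matching -linux_amd64 tag for that version.

```shell
# Hypothetical helper: append the -linux_amd64 suffix to any agones-sdk image
# line on stdin whose tag is a bare version number. Lines that already carry
# an architecture suffix do not match and pass through unchanged.
pin_amd64() {
  sed -E 's#(image: gcr\.io/agones-images/agones-sdk:[0-9.]+)$#\1-linux_amd64#'
}

# Demo on the image line quoted in the comment above.
printf 'image: gcr.io/agones-images/agones-sdk:1.16.0\n' | pin_amd64
```

Piping a manifest through the filter and diffing the result before applying it keeps the change reviewable.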

@markmandel
Member

You could also edit the install.yaml, or if you are using helm, tweak the agones.image.sdk.tag parameter.
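
A minimal sketch of the Helm route, assuming a release named agones installed from the agones/agones chart into the agones-system namespace (the release and chart names are assumptions, not taken from this thread). The helper only prints the command so it can be reviewed before it is run.

```shell
# Print a Helm command that pins the SDK sidecar to the amd64-specific tag.
# "agones" (release name) and "agones/agones" (chart) are assumed names.
helm_pin_sdk_cmd() {
  version="$1"
  printf 'helm upgrade --install agones agones/agones --namespace agones-system --set agones.image.sdk.tag=%s-linux_amd64\n' "$version"
}

# Example: pin the sidecar for Agones 1.16.0.
helm_pin_sdk_cmd 1.16.0
```

Setting the tag at install time means new game server pods pick up the amd64 sidecar directly, so no per-pod edit is needed.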

@tuapuikia

I managed to build the sidecar image (agones-sdk) for ARM-based servers (AWS Graviton) and used helm template to generate YAML that points at my private repo.

The other components still run on amd64 instances, but I'm planning to cross-compile them and test them in the coming weeks.

It's a good start. 😁

@markmandel
Member

@tuapuikia can you share what you did as a PR? Unfortunately it's only useful to the project if you share what it is you have done 😄

@tuapuikia

Hi @markmandel,

My use case is to host a game server on AWS Graviton instances to save cost and make use of the new, more powerful AWS instance type.

Below is what I have done.

  1. Deploy EKS + A1 and C6 custom instance groups.
  2. Deploy Agones to my new EKS cluster.
  3. Recompile the "supertuxkart" example on an ARM machine.
  4. Deploy the supertuxkart example to the new ARM worker node.
  5. Compare container and server performance between x86 and ARM64.
  6. Propose using ARM64 at my company. (WIP)

TF module https://github.com/tuapuikia/terraform-eks-arm

The Makefile patch is still messy and not ready for a PR.
What I have done is use docker buildx and Ubuntu/Linux binfmt to cross-compile the Agones SDK server and produce the arm64 Docker image.
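
The buildx plus binfmt approach could look roughly like the sketch below. It is not the patch referred to above: the registry, the tag naming, and both function names are hypothetical, and it assumes the given build context contains a Dockerfile for the SDK server. The tonistiigi/binfmt image is one common way to register QEMU handlers; a distribution's own binfmt packages would serve the same purpose.

```shell
# Hypothetical registry and version; override via the environment.
REGISTRY="${REGISTRY:-registry.example.com/agones}"
VERSION="${VERSION:-1.16.0}"

# Compose the arm64 image reference (tag naming is an assumption that mirrors
# the -linux_amd64 suffix mentioned earlier in the thread).
arm64_sdk_tag() {
  printf '%s/agones-sdk:%s-linux_arm64\n' "$REGISTRY" "$VERSION"
}

# Build an arm64 image from the build context passed as $1.
# Defined only; nothing runs until the function is called.
build_arm64_sdk() {
  # Register QEMU binfmt handlers so an amd64 host can run arm64 build steps.
  docker run --privileged --rm tonistiigi/binfmt --install arm64
  # Build for linux/arm64 and load the result into the local image store.
  docker buildx build --platform linux/arm64 --tag "$(arm64_sdk_tag)" --load "$1"
}
```

Calling build_arm64_sdk with the path to the build context runs both steps; pushing the image to a private registry would be a separate docker push.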

@markmandel
Member

markmandel commented Mar 22, 2022

Oops, work not complete yet, so let's not close this.

@markmandel
Member

Question: Have we done the PING service?

@roberthbailey
Member

I don't think so. I only see arm tags for the sidecar, allocator, and controller:

agones/build/Makefile

Lines 244 to 248 in 849b4d3

ifeq ($(WITH_ARM64), 1)
push_sidecar_manifest += $(sidecar_linux_arm64_tag)
push_allocator_manifest += $(allocator_arm64_tag)
push_controller_manifest += $(controller_arm64_tag)
endif

@markmandel
Member

I think that's the last bit then - @Ludea

@Ludea
Contributor

Ludea commented May 20, 2022

Uploading Screenshot_20220520-180243.jpg…
I can successfully run Agones on aarch64 hosts

@markmandel
Member

Oops, looks like the screenshot broke?

@Ludea
Contributor

Ludea commented May 20, 2022

Screenshot_20220520-180243
Better

@markmandel
Member

So there should be a few ping pods as well. This is what it looks like on my GKE cluster:

markmandel@cloudshell:~ (agones-mark-dev)$ kubectl get pods -n agones-system
NAME                                   READY   STATUS      RESTARTS   AGE
agones-allocator-57bb5d9494-glk6h      1/1     Running     0          16h
agones-allocator-57bb5d9494-mfnfc      1/1     Running     0          16h
agones-allocator-57bb5d9494-vll82      1/1     Running     0          16h
agones-controller-695f675fc6-xxf2c     1/1     Running     0          16h
agones-delete-agones-resources-4cxmf   0/1     Completed   0          16h
agones-ping-6fd49fcfdc-6wcbn           1/1     Running     0          16h
agones-ping-6fd49fcfdc-z66zv           1/1     Running     0          16h

Also, lots of points for being able to run kubectl on your phone. 😁

I'm willing to bet you have the Deployments in place for it, but the image pull for arm64 is failing.

@Ludea
Contributor

Ludea commented May 20, 2022

> Also, lots of points for being able to run kubectl on your phone. 😁

Thanks!

> I'm willing to bet you have the Deployments in place for it, but the image pull for arm64 is failing.

I hit #2578, so I installed the image tag from #2581.
I set agones.allocator.replicas=1 because the RPi 3 has only 1 GB of memory ^^ and agones.ping.install to false because I know the ping image doesn't support arm64 yet.
So the last thing is to port the ping image?
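
The two settings mentioned above can be captured in a values file rather than passed on the command line. A minimal sketch, with a hypothetical function name and output file name:

```shell
# Emit Helm values for a small arm64 node: a single allocator replica and no
# ping service (the ping image had no arm64 build at the time of this comment).
write_arm_values() {
  cat <<'EOF'
agones:
  allocator:
    replicas: 1
  ping:
    install: false
EOF
}

# Print the values for review.
write_arm_values
```

Redirecting the output to a file such as values-arm64.yaml and passing it to helm install with -f applies both overrides in one step.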

@markmandel
Member

> So the last thing is to port the ping image?

I believe so! Will likely want to have a few of us test some scenarios on an ARM cluster, but I think that's the last step.

@markmandel
Member

oooh, I thought of one more thing!

https://github.com/googleforgames/agones/blob/main/examples/simple-game-server will also need to be updated. It's the sample we use for all our e2e tests, so we'll need an ARM build of it to run the e2e tests (and manual tests) on an ARM cluster.

(Should probably have done a checklist in the beginning)

@markmandel
Member

So just attempted to build simple-game-server and unfortunately it didn't work. Digging into what went wrong.

➜  simple-game-server git:(main) make build
Makefile:83: warning: overriding recipe for target 'push'
Makefile:79: warning: ignoring old recipe for target 'push'
Makefile:88: warning: overriding recipe for target 'push'
Makefile:83: warning: ignoring old recipe for target 'push'
cd /home/mark/workspace/agones && docker build -f /home/mark/workspace/agones/examples/simple-game-server/Dockerfile --tag=gcr.io/agones-images/simple-game-server:0.13-linux-amd64 .
Sending build context to Docker daemon  1.137GB
Step 1/11 : FROM golang:1.17.2 as builder
 ---> 9f8b89ee4475
Step 2/11 : WORKDIR /go/src
 ---> Using cache
 ---> 7cc04c3e643e
Step 3/11 : COPY . agones.dev/agones
 ---> Using cache
 ---> 64ff92bf4f49
Step 4/11 : WORKDIR /go/src/agones.dev/agones/examples/simple-game-server
 ---> Using cache
 ---> c55809501703
Step 5/11 : RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o server .
 ---> Running in 0149925cc97b
go: downloading github.com/golang/protobuf v1.5.0
go: downloading github.com/grpc-ecosystem/grpc-gateway v1.11.3
go: downloading golang.org/x/net v0.0.0-20210224082022-3d97a244fca7
go: downloading google.golang.org/genproto v0.0.0-20201110150050-8816d57aaa9a
go: downloading google.golang.org/grpc v1.27.1
go: downloading github.com/pkg/errors v0.9.1
go: downloading google.golang.org/protobuf v1.26.0
go: downloading golang.org/x/sys v0.0.0-20210426230700-d19ff857e887
go: downloading golang.org/x/text v0.3.4
go: updates to go.mod needed; to update it:
        go mod tidy
The command '/bin/sh -c CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o server .' returned a non-zero code: 1
make: *** [Makefile:122: build-linux-image-amd64] Error 1

@markmandel
Member

Looks like go mod tidy solved the issue. PR incoming.

@markmandel
Member

Spoke too soon.

➜  simple-game-server git:(main) ✗ make build-linux-image-arm64
Makefile:83: warning: overriding recipe for target 'push'
Makefile:79: warning: ignoring old recipe for target 'push'
Makefile:88: warning: overriding recipe for target 'push'
Makefile:83: warning: ignoring old recipe for target 'push'
cd /home/mark/workspace/agones && docker build -f /home/mark/workspace/agones/examples/simple-game-server/Dockerfile --platform linux/arm64 --tag=gcr.io/agones-images/simple-game-server:0.13-linux-arm64 .
Sending build context to Docker daemon  1.137GB
Step 1/11 : FROM golang:1.17.2 as builder
 ---> 9f8b89ee4475
Step 2/11 : WORKDIR /go/src
 ---> Using cache
 ---> 0af6bf5e5318
Step 3/11 : COPY . agones.dev/agones
failed to get destination image "sha256:0af6bf5e5318126fb7f755b06a412e36836e95d6ef497c71a49a6d99753b3917": image with reference sha256:0af6bf5e5318126fb7f755b06a412e36836e95d6ef497c71a49a6d99753b3917 was found but does not match the specified platform: wanted linux/arm64, actual: linux/amd64
make: *** [Makefile:124: build-linux-image-arm64] Error 1

@markmandel
Member

I reckon I have the fixes in place to make it work - just testing now against both an ARM64 and an AMD64 cluster to make sure everything is golden.

markmandel added a commit to markmandel/agones that referenced this issue Jun 7, 2022
Fixes:
* Run `go mod tidy` otherwise failure to compile
* Fix building on amd64
* Fix target for building manifest appropriately.

Work on googleforgames#2216
roberthbailey added a commit that referenced this issue Jun 7, 2022
Fixes:
* Run `go mod tidy` otherwise failure to compile
* Fix building on amd64
* Fix target for building manifest appropriately.

Work on #2216

Co-authored-by: Robert Bailey <robertbailey@google.com>
@markmandel
Member

gcr.io/agones-images/simple-game-server:0.13 should now have an arm and windows image on the registry. 👍🏻

We should move all the e2e tests over to this image as well.

markmandel added a commit to markmandel/agones that referenced this issue Jun 9, 2022
Added a section in the installation docs where we specify the
supported cluster requirements. Now we're supporting multiple OS's and
architectures at different levels, I've added a table to indicate each
of the levels, so that users are aware.

I didn't keep arm64 for the past release, since we know the controller
is broken for 1.23.0.

As part of this work, I moved the note on node pools into its own
"best practices" section.

Work on googleforgames#2216
roberthbailey added a commit that referenced this issue Jun 10, 2022
Added a section in the installation docs where we specify the
supported cluster requirements. Now we're supporting multiple OS's and
architectures at different levels, I've added a table to indicate each
of the levels, so that users are aware.

I didn't keep arm64 for the past release, since we know the controller
is broken for 1.23.0.

As part of this work, I moved the note on node pools into its own
"best practices" section.

Work on #2216

Co-authored-by: Robert Bailey <robertbailey@google.com>
@markmandel
Member

Can we close this now? I think we're good? WDYT @roberthbailey ?

@roberthbailey
Member

Yes. We now have alpha support for ARM (documented here) and I think this can be closed.

As with feature gate promotions, we can open tickets specifically targeted at moving from alpha towards stable going forward.

mangalpalli added this to the 1.27.0 milestone on Oct 18, 2022