Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GameServerAlocation as API Extension #682

Merged

Conversation

markmandel
Copy link
Member

@markmandel markmandel commented Mar 31, 2019

Dependent on #659 which will be merged after 0.9.0 is released

This moves the implementation of GameServerAllocation (GSA) to a
Kubernetes API Extension instead of using CRDs. This was essentially done for performance reasons, but to break it down:

  1. GSA is now create only. Since we had no need for GSA storage, and don't want the performance hit.
  2. This removes the mutation and validation webhooks, which have 30s timeout and are run in serial
    1. API Server still does cut off a response after 60s, but the api can continue processing (60s gives us enough time, I think for a SDK.Ack() on Allocate, which I don't think we had before)
    2. Validation now happens in the request.
  3. We can now do batching of requests for higher throughput (Use batching in GameServerAllocation controller to improve throughput. #536), since we control the entire http request.

The breaking changes are:

  1. GameServerAllocation's group is now allocation.agones.dev rather than stable.agones.dev,
    because a CRD group can't overlap with a api server.
  2. Since there is only the create verb for GSA, there is no get/list/watch options for GameServerAllocations - so no informers/listers either. But this could be added at a later date, if needed.

gsa := &v1alpha1.GameServerAllocation{}
// allocationHandler CRDHandler for allocating a gameserver. Only accepts POST
// commands
func (c *Controller) allocationHandler(w http.ResponseWriter, r *http.Request, namespace string) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @pooneh-m @ilkercelikyilmaz for your review and testing (also, I don't think I broke anything you did @ilkercelikyilmaz , but would love for you to give it a look over to make sure) 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from ebabf1c to aaa6e91 Compare March 31, 2019 22:57
@markmandel markmandel added kind/feature New features for Agones kind/breaking Breaking change area/performance Anything to do with Agones being slow, or making it go faster. labels Mar 31, 2019
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: f429d470-235b-4bb0-a514-5877a6c16ef3

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.9.0-aaa6e91

Copy link
Contributor

@ilkercelikyilmaz ilkercelikyilmaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -16,12 +16,44 @@
{{- $altName1 := printf "agones-controller-service.%s" .Release.Namespace }}
{{- $altName2 := printf "agones-controller-service.%s.svc" .Release.Namespace }}
{{- $cert := genSignedCert $cn nil (list $altName1 $altName2) 3650 $ca }}
---
{{- if .Values.agones.registerApiService }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you don't want that?

also perhaps "registerAllocationApiService" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCP Marketplace usually requires a different install step, so there are switches to turn various pieces off if people need to break the glass and copy things into their own install process.

@bbf to confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the rename comment - we have registerWebhooks for all our webhooks - I don't think we need to be that granular. If there ever end up being more APIServices, they could also go here.


apiVersion: "stable.agones.dev/v1alpha1"
apiVersion: "allocation.agones.dev/v1alpha1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general question - is allocation the only thing that will be in this API going forward? Perhaps we need a broader name to indicate "services that external consumers are going to call".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really interesting question. Groups are pretty broad, and we could definitely expand this - and anything that is "external" would likely need to be managed by multiple pods to ensure HA (as discussed above). Which would give us an internal delineation - anything under this group needs to be managed by multiple pods, anything in the stable group, is managed by the controller (which self heals).

Maybe external.agones.dev ?

It also begets a question of is "stable" the right name for the original group. I'm starting to wonder if core.agones.dev would have been better? Or something else?

Definitely open to suggestions here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at #703 - I'm feeling like allocation is actually the right choice here for the group name.

@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from aaa6e91 to 2f3567b Compare April 3, 2019 18:12
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 28911819-68c5-432b-963c-77cc485c370f

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from 2f3567b to 3698601 Compare April 4, 2019 00:52
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 3bf0a23d-7862-488b-9259-86cc58634dc1

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.10.0-3698601

@markmandel markmandel marked this pull request as ready for review April 5, 2019 13:36
@markmandel
Copy link
Member Author

This should be good for review now 👍

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: c78236ba-778a-4e8d-a0d7-97e7218c6385

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.10.0-c11accf

@@ -2219,6 +2219,10 @@ <h3 id="WebhookPolicy">WebhookPolicy






Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aLekSer I can't work out why I'm not getting the allocation group for the GameServerAllocation on the docs? Do you have any ideas?

Copy link
Collaborator

@aLekSer aLekSer Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markmandel, working on that.
I see errors when running make gen-api-docs:

W0415 09:33:04.694267       8 parse.go:239] Ignoring child directory agones.dev/agones/pkg/apis/allocation: unable to import "agones.dev/agones/pkg/apis/allocation": go/build: importGo agones.dev/agones/pkg/apis/allocation: exit status 1
go: finding agones.dev/agones/pkg/apis/allocation latest
go: finding agones.dev/agones/pkg/apis latest
go: finding agones.dev/agones/pkg latest
go: finding agones.dev/agones v0.9.0
go: downloading agones.dev/agones v0.9.0
go: extracting agones.dev/agones v0.9.0
can't load package: package agones.dev/agones/pkg/apis/allocation: unknown import path "agones.dev/agones/pkg/apis/allocation": cannot find module providing package agones.dev/agones/pkg/apis/allocation

W0415 09:33:06.845690       8 parse.go:239] Ignoring child directory agones.dev/agones/pkg/apis/allocation/v1alpha1: unable to import "agones.dev/agones/pkg/apis/allocation/v1alpha1": go/build: importGo agones.dev/agones/pkg/apis/allocation/v1alpha1: exit status 1

Copy link
Collaborator

@aLekSer aLekSer Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issue using:

go mod edit --replace=agones.dev/agones@latest=../../../agones.dev/agones/

Resulted webpage could be found here (cherry-picked my go mod fix to this PR):
http://35.237.42.204:1313/docs/reference/agones_crd_api_reference/

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 8708d939-c574-442a-87af-0564f749995b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.10.0-a1bf145

@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from a1bf145 to c19eb7a Compare April 16, 2019 02:32
@markmandel
Copy link
Member Author

@Kuqd @jkowalski any blockers from approving and merging this? I know it's slowing down @pooneh-m not having this merged?

Just updated it to master (and assuming it passes) I'm hoping this is good to go? 😄

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: c2e0ebaa-167c-45a0-8deb-285e72592320

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.10.0-c19eb7a

@ilkercelikyilmaz
Copy link
Contributor

I am running the load test on this change and so far it is looking good. I don't see any reason not to merge this change.
P.S. I am working on few more improvements on top of this change so it will be great to have this change merged.

@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from c19eb7a to b97522d Compare April 16, 2019 15:34
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: a6209f41-319d-4940-9446-6048d5e3195f

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

This moves the implementation of GameServerAllocation (GSA) to a
[Kubernetes API Extension](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
instead of using CRDs. This was essentially done for performance reasons, but to break it down:

1. GSA is now create only. Since we had no need for GSA storage, and don't want the performance hit.
1. This removes the mutation and validation webhooks, which have 30s timeout and are run in serial
    1. API Server still does cut off a response after 60s, but the api can continue processing (60s gives us enough
       time, I think for a SDK.Ack() on Allocate, which I don't think we had before)
    1. Validation now happens in the request.
1. We can now do batching of requests for higher throughput (googleforgames#536), since we control the entire http request.

The breaking changes are:
1. GameServerAllocation's group is now `allocation.agones.dev` rather than `stable.agones.dev`,
because a CRD group can't overlap with a api server.
1. Since there is only the `create` verb for GSA, there is no get/list/watch options for GameServerAllocations - so no
   informers/listers either. But this could be added at a later date, if needed.
@markmandel markmandel force-pushed the feature/gsa-apiserver-2 branch from b97522d to cbc9cc3 Compare April 16, 2019 16:02
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 2a0f9442-f5ac-4f6a-82f8-3f0e29282992

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/682/head:pr_682 && git checkout pr_682
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.10.0-cbc9cc3

@markmandel markmandel merged commit 6d6524a into googleforgames:master Apr 16, 2019
@markmandel markmandel deleted the feature/gsa-apiserver-2 branch April 16, 2019 16:27
@markmandel markmandel added this to the 0.10.0 milestone Apr 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Anything to do with Agones being slow, or making it go faster. kind/breaking Breaking change kind/feature New features for Agones
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants