Prioritise Allocation from Nodes with Allocated/Ready GameServers
One of the first parts for Node autoscaling (googleforgames#368) - make sure we essentially
bin pack our allocated game servers.

This change makes allocation first prioritise allocation from `Nodes` that
already have the most `Allocated` `GameServers`, and then in the case of a tie,
to the `Nodes` that have the most `Ready` `GameServers`.

This sets us up for the next part, such that when we scale down a Fleet,
it removes `GameServers` from `Nodes` that have the least `GameServers` on
them.
markmandel committed Oct 10, 2018
1 parent d6858e2 commit 4cbaf3e
Showing 12 changed files with 445 additions and 33 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -68,6 +68,9 @@ Documentation and usage guides on how to develop and host dedicated game servers
- [CPP Simple](./examples/cpp-simple) (C++) - C++ example that starts up, stays healthy and then shuts down after 60 seconds.
- [Xonotic](./examples/xonotic) - Wraps the SDK around the open source FPS game [Xonotic](http://www.xonotic.org) and hosts it on Agones.

### Advanced
- [Scheduling and Autoscaling](./docs/scheduling_autoscaling.md)

## Get involved

- [Slack](https://join.slack.com/t/agones/shared_invite/enQtMzE5NTE0NzkyOTk1LWQ2ZmY1Mjc4ZDQ4NDJhOGYxYTY2NTY0NjUwNjliYzVhMWFjYjMxM2RlMjg3NGU0M2E0YTYzNDIxNDMyZGNjMjU)
2 changes: 2 additions & 0 deletions docs/create_fleetautoscaler.md
@@ -253,4 +253,6 @@ simple-udp-mzhrl-zg9rq Ready 10.30.64.99 [map[name:default port:7745]]

## Next Steps

Read the advanced [Scheduling and Autoscaling](scheduling_autoscaling.md) guide for more details on autoscaling.

If you want to use your own GameServer container make sure you have properly integrated the [Agones SDK](../sdks/).
6 changes: 6 additions & 0 deletions docs/fleet_spec.md
@@ -15,6 +15,7 @@ metadata:
name: fleet-example
spec:
replicas: 2
scheduling: Packed
strategy:
type: RollingUpdate
rollingUpdate:
@@ -53,6 +54,11 @@ This is a very common pattern in the Kubernetes ecosystem.
The `spec` field is the actual `Fleet` specification, and it is composed as follows:

- `replicas` is the number of `GameServers` to keep Ready or Allocated in this Fleet
- `scheduling` (⚠️⚠️⚠️ **This is currently a development feature and has not been released** ⚠️⚠️⚠️) defines how `GameServers` are organised across the cluster. This currently only affects Allocation, but will expand
in future releases. Options include:
"Packed" (default) is aimed at dynamic Kubernetes clusters, such as cloud providers, wherein we want to bin pack
resources. "Distributed" is aimed at static Kubernetes clusters, wherein we want to distribute resources across the entire
cluster. See [Scheduling and Autoscaling](scheduling_autoscaling.md) for more details.
- `strategy` is the `GameServer` replacement strategy for when the `GameServer` template is edited.
- `type` is the replacement strategy for when the `GameServer` template is changed. The default option is "RollingUpdate", but "Recreate" is also available.
- `RollingUpdate` will increment by `maxSurge` value on each iteration, while decrementing by `maxUnavailable` on each iteration, until all GameServers have been switched from one version to another.
Expand Down
53 changes: 53 additions & 0 deletions docs/scheduling_autoscaling.md
@@ -0,0 +1,53 @@
# Scheduling and Autoscaling

⚠️⚠️⚠️ **This is currently a development feature and has not been released** ⚠️⚠️⚠️

> Autoscaling is currently ongoing work within Agones. The work you see here is just the beginning.

Scheduling and autoscaling tend to go hand in hand, as where in the cluster `GameServers` are provisioned
tends to impact how to autoscale fleets up and down (or whether you would even want to).

## Fleet Autoscaling

Fleet autoscaling is currently the only type of autoscaling that exists in Agones, and it is only available as a simple
buffer autoscaling strategy. Have a look at the [Create a Fleet Autoscaler](create_fleetautoscaler.md) quickstart,
and the [Fleet Autoscaler Specification](fleetautoscaler_spec.md) for details.
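
In essence, a buffer strategy keeps a fixed number of `Ready` `GameServers` on top of those already `Allocated`. A
hedged sketch of that arithmetic, with hypothetical names (consult the specification above for the actual fields):

```go
// desiredReplicas sketches a buffer autoscaling policy: target enough
// replicas to keep bufferSize Ready GameServers beyond the Allocated
// ones, clamped to the [minReplicas, maxReplicas] range.
func desiredReplicas(allocated, bufferSize, minReplicas, maxReplicas int32) int32 {
	want := allocated + bufferSize
	if want < minReplicas {
		want = minReplicas
	}
	if want > maxReplicas {
		want = maxReplicas
	}
	return want
}
```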

Node scaling and more sophisticated fleet autoscaling will come in future releases ([design](https://github.com/GoogleCloudPlatform/agones/issues/368)).

## Fleet Allocation Scheduling

There are two scheduling strategies for Fleets - each designed for a different type of Kubernetes environment.

### Packed

This is the *default* Fleet scheduling strategy. It is designed for dynamic Kubernetes environments, wherein you wish
to scale up and down as load increases or decreases, such as in a Cloud environment where you are paying
for the infrastructure you use.

It attempts to _pack_ as much as possible into the smallest set of nodes, to make
scaling infrastructure down as easy as possible.

Currently, Allocation scheduling is the only aspect this strategy affects, but in future releases it will
also affect `GameServer` `Pod` scheduling and `Fleet` scale-down scheduling.

#### Allocation Scheduling

Under the "Packed" strategy, allocation will prioritise allocating `GameServers` to nodes that are running on
Nodes that already have allocated `GameServers` running on them.
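
The comparator behind this ordering is not part of the visible diff below, but the idea can be sketched as a
comparison over per-Node tallies. A minimal sketch, assuming a hypothetical `nodeCount` type (the actual
`packedComparator` referenced in `pkg/fleetallocation/controller.go` may differ in detail):

```go
// nodeCount is a hypothetical per-Node tally of GameServers by state.
type nodeCount struct {
	allocated int // GameServers in the Allocated state on this Node
	ready     int // GameServers in the Ready state on this Node
}

// packedComparator reports whether the Node with tally a is a better
// allocation target than the Node with tally b under the "Packed"
// strategy: most Allocated first, with ties broken by most Ready.
func packedComparator(a, b nodeCount) bool {
	if a.allocated != b.allocated {
		return a.allocated > b.allocated
	}
	return a.ready > b.ready
}
```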

### Distributed

This Fleet scheduling strategy is designed for static Kubernetes environments, such as when you are running Kubernetes
on bare metal, and the cluster size rarely changes, if at all.

This attempts to distribute the load across the entire cluster as much as possible, to take advantage of the static
size of the cluster.

Currently, Allocation scheduling is the only aspect this strategy affects, but in future releases it will
also affect `GameServer` `Pod` scheduling and `Fleet` scale-down scheduling.

#### Allocation Scheduling

Under the "Distributed" strategy, allocation will prioritise allocating `GameSerers` to nodes that have the least
number of allocated `GameServers` on them.
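
The inverse ordering can be sketched the same way, reusing the hypothetical `nodeCount` tally from the Packed
sketch above, along with a hedged version of the selection pass (the actual `findReadyGameServerForAllocation`
helper referenced in `pkg/fleetallocation/controller.go` is not shown in this diff):

```go
// distributedComparator is the inverse of packedComparator: prefer the
// Node with the fewest Allocated GameServers, ties broken by fewest Ready.
func distributedComparator(a, b nodeCount) bool {
	if a.allocated != b.allocated {
		return a.allocated < b.allocated
	}
	return a.ready < b.ready
}

// findReadyOnPreferredNode is a hypothetical selection pass: tally
// GameServers per Node, pick the preferred Node via the comparator, and
// return a Ready GameServer from that Node, or nil if none remain.
func findReadyOnPreferredNode(list []*v1alpha1.GameServer, better func(a, b nodeCount) bool) *v1alpha1.GameServer {
	counts := map[string]nodeCount{}
	for _, gs := range list {
		c := counts[gs.Status.NodeName]
		switch gs.Status.State {
		case v1alpha1.Allocated:
			c.allocated++
		case v1alpha1.Ready:
			c.ready++
		}
		counts[gs.Status.NodeName] = c
	}

	bestNode, found := "", false
	for node, c := range counts {
		if c.ready == 0 {
			continue // no Ready GameServers left to allocate on this Node
		}
		if !found || better(c, counts[bestNode]) {
			bestNode, found = node, true
		}
	}
	if !found {
		return nil
	}

	for _, gs := range list {
		if gs.Status.NodeName == bestNode && gs.Status.State == v1alpha1.Ready &&
			gs.ObjectMeta.DeletionTimestamp.IsZero() {
			return gs
		}
	}
	return nil
}
```

With this shape, the `allocate` switch in the controller diff below only has to choose which comparator to pass.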
7 changes: 7 additions & 0 deletions examples/fleet.yaml
@@ -27,6 +27,13 @@ metadata:
spec:
# the number of GameServers to keep Ready or Allocated in this Fleet
replicas: 2
# defines how GameServers are organised across the cluster. This currently only affects Allocation, but will expand
# in future releases. Options include:
# "Packed" (default) is aimed at dynamic Kubernetes clusters, such as cloud providers, wherein we want to bin pack
# resources.
# "Distributed" is aimed at static Kubernetes clusters, wherein we want to distribute resources across the entire
# cluster.
scheduling: Packed
# a GameServer template - see:
# https://github.com/GoogleCloudPlatform/agones/blob/master/docs/gameserver_spec.md for all the options
strategy:
22 changes: 22 additions & 0 deletions pkg/apis/stable/v1alpha1/fleet.go
@@ -22,11 +22,27 @@ import (
)

const (
// Packed scheduling strategy will prioritise allocating GameServers
// on Nodes with the most Allocated, and then Ready GameServers,
// to bin pack as many Allocated GameServers as possible onto a single node.
// This is most useful for dynamic Kubernetes clusters - such as on Cloud Providers.
// In future versions, this will also impact Fleet scale down, and Pod Scheduling.
Packed SchedulingStrategy = "Packed"

// Distributed scheduling strategy will prioritise allocating GameServers
// on Nodes with the least Allocated, and then Ready GameServers
// to distribute Allocated GameServers across many nodes.
// This is most useful for statically sized Kubernetes clusters - such as on physical hardware.
// In future versions, this will also impact Fleet scale down, and Pod Scheduling.
Distributed SchedulingStrategy = "Distributed"

// FleetGameServerSetLabel is the label that the name of the Fleet
// is set to on the GameServerSet the Fleet controls
FleetGameServerSetLabel = stable.GroupName + "/fleet"
)

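// SchedulingStrategy is the strategy that a Fleet uses to schedule
// and allocate GameServers across the cluster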
type SchedulingStrategy string

// +genclient
// +genclient:noStatus
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
@@ -56,6 +72,8 @@ type FleetSpec struct {
Replicas int32 `json:"replicas"`
// Deployment strategy
Strategy appsv1.DeploymentStrategy `json:"strategy"`
// Scheduling strategy. Defaults to "Packed".
Scheduling SchedulingStrategy `json:"scheduling"`
// Template the GameServer template to apply for this Fleet
Template GameServerTemplateSpec `json:"template"`
}
@@ -105,6 +123,10 @@ func (f *Fleet) ApplyDefaults() {
f.Spec.Strategy.Type = appsv1.RollingUpdateDeploymentStrategyType
}

if f.Spec.Scheduling == "" {
f.Spec.Scheduling = Packed
}

if f.Spec.Strategy.Type == appsv1.RollingUpdateDeploymentStrategyType {
if f.Spec.Strategy.RollingUpdate == nil {
f.Spec.Strategy.RollingUpdate = &appsv1.RollingUpdateDeployment{}
2 changes: 2 additions & 0 deletions pkg/apis/stable/v1alpha1/fleet_test.go
@@ -60,11 +60,13 @@ func TestFleetApplyDefaults(t *testing.T) {

// gate
assert.EqualValues(t, "", f.Spec.Strategy.Type)
assert.EqualValues(t, "", f.Spec.Scheduling)

f.ApplyDefaults()
assert.Equal(t, appsv1.RollingUpdateDeploymentStrategyType, f.Spec.Strategy.Type)
assert.Equal(t, "25%", f.Spec.Strategy.RollingUpdate.MaxUnavailable.String())
assert.Equal(t, "25%", f.Spec.Strategy.RollingUpdate.MaxSurge.String())
assert.Equal(t, Packed, f.Spec.Scheduling)
}

func TestFleetUpperBoundReplicas(t *testing.T) {
34 changes: 17 additions & 17 deletions pkg/fleetallocation/controller.go
@@ -20,7 +20,7 @@ import (
"sync"

"agones.dev/agones/pkg/apis/stable"
stablev1alpha1 "agones.dev/agones/pkg/apis/stable/v1alpha1"
"agones.dev/agones/pkg/apis/stable/v1alpha1"
"agones.dev/agones/pkg/client/clientset/versioned"
getterv1alpha1 "agones.dev/agones/pkg/client/clientset/versioned/typed/stable/v1alpha1"
"agones.dev/agones/pkg/client/informers/externalversions"
@@ -95,7 +95,7 @@ func NewController(
eventBroadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: kubeClient.CoreV1().Events("")})
c.recorder = eventBroadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "fleetallocation-controller"})

kind := stablev1alpha1.Kind("FleetAllocation")
kind := v1alpha1.Kind("FleetAllocation")
wh.AddHandler("/mutate", kind, admv1beta1.Create, c.creationMutationHandler)
wh.AddHandler("/validate", kind, admv1beta1.Create, c.creationValidationHandler)
wh.AddHandler("/validate", kind, admv1beta1.Update, c.mutationValidationHandler)
@@ -120,7 +120,7 @@ func (c *Controller) Run(workers int, stop <-chan struct{}) error {
func (c *Controller) creationMutationHandler(review admv1beta1.AdmissionReview) (admv1beta1.AdmissionReview, error) {
c.logger.WithField("review", review).Info("creationMutationHandler")
obj := review.Request.Object
fa := &stablev1alpha1.FleetAllocation{}
fa := &v1alpha1.FleetAllocation{}

err := json.Unmarshal(obj.Raw, fa)
if err != nil {
@@ -157,10 +157,10 @@ }
}

// When a GameServer is deleted, the FleetAllocation should go with it
ref := metav1.NewControllerRef(gs, stablev1alpha1.SchemeGroupVersion.WithKind("GameServer"))
ref := metav1.NewControllerRef(gs, v1alpha1.SchemeGroupVersion.WithKind("GameServer"))
fa.ObjectMeta.OwnerReferences = append(fa.ObjectMeta.OwnerReferences, *ref)

fa.Status = stablev1alpha1.FleetAllocationStatus{GameServer: gs}
fa.Status = v1alpha1.FleetAllocationStatus{GameServer: gs}

newFA, err := json.Marshal(fa)
if err != nil {
@@ -191,7 +191,7 @@ func (c *Controller) creationMutationHandler(review admv1beta1.AdmissionReview)
func (c *Controller) creationValidationHandler(review admv1beta1.AdmissionReview) (admv1beta1.AdmissionReview, error) {
c.logger.WithField("review", review).Info("creationValidationHandler")
obj := review.Request.Object
fa := &stablev1alpha1.FleetAllocation{}
fa := &v1alpha1.FleetAllocation{}
if err := json.Unmarshal(obj.Raw, fa); err != nil {
return review, errors.Wrapf(err, "error unmarshalling original FleetAllocation json: %s", obj.Raw)
}
@@ -225,8 +225,8 @@ func (c *Controller) creationValidationHandler(review admv1beta1.AdmissionReview
func (c *Controller) mutationValidationHandler(review admv1beta1.AdmissionReview) (admv1beta1.AdmissionReview, error) {
c.logger.WithField("review", review).Info("mutationValidationHandler")

newFA := &stablev1alpha1.FleetAllocation{}
oldFA := &stablev1alpha1.FleetAllocation{}
newFA := &v1alpha1.FleetAllocation{}
oldFA := &v1alpha1.FleetAllocation{}

if err := json.Unmarshal(review.Request.Object.Raw, newFA); err != nil {
return review, errors.Wrapf(err, "error unmarshalling new FleetAllocation json: %s", review.Request.Object.Raw)
@@ -256,8 +256,8 @@ }
}

// allocate allocates a GameServer from a given Fleet
func (c *Controller) allocate(f *stablev1alpha1.Fleet, fam *stablev1alpha1.FleetAllocationMeta) (*stablev1alpha1.GameServer, error) {
var allocation *stablev1alpha1.GameServer
func (c *Controller) allocate(f *v1alpha1.Fleet, fam *v1alpha1.FleetAllocationMeta) (*v1alpha1.GameServer, error) {
var allocation *v1alpha1.GameServer
// can only allocate one at a time, as we don't want two separate processes
// trying to allocate the same GameServer to different clients
c.allocationMutex.Lock()
@@ -272,19 +272,19 @@ func (c *Controller) allocate(f *stablev1alpha1.Fleet, fam *stablev1alpha1.Fleet
return allocation, err
}

for _, gs := range gsList {
if gs.Status.State == stablev1alpha1.Ready && gs.ObjectMeta.DeletionTimestamp.IsZero() {
allocation = gs
break
}
switch f.Spec.Scheduling {
case v1alpha1.Packed:
allocation = findReadyGameServerForAllocation(gsList, packedComparator)
case v1alpha1.Distributed:
allocation = findReadyGameServerForAllocation(gsList, distributedComparator)
}

if allocation == nil {
return allocation, ErrNoGameServerReady
}

gsCopy := allocation.DeepCopy()
gsCopy.Status.State = stablev1alpha1.Allocated
gsCopy.Status.State = v1alpha1.Allocated

if fam != nil {
c.patchMetadata(gsCopy, fam)
@@ -300,7 +300,7 @@ }
}

// patch the labels and annotations of an allocated GameServer with metadata from a FleetAllocation
func (c *Controller) patchMetadata(gs *stablev1alpha1.GameServer, fam *stablev1alpha1.FleetAllocationMeta) {
func (c *Controller) patchMetadata(gs *v1alpha1.GameServer, fam *v1alpha1.FleetAllocationMeta) {
// patch ObjectMeta labels
if fam.Labels != nil {
if gs.ObjectMeta.Labels == nil {
95 changes: 95 additions & 0 deletions pkg/fleetallocation/controller_test.go
@@ -141,6 +141,8 @@ func TestControllerMutationValidationHandler(t *testing.T) {
}

func TestControllerAllocate(t *testing.T) {
t.Parallel()

f, gsSet, gsList := defaultFixtures(4)
c, m := newFakeController()
n := metav1.Now()
@@ -210,6 +212,98 @@
assert.False(t, updated)
}

func TestControllerAllocatePriority(t *testing.T) {
t.Parallel()

n1 := "node1"
n2 := "node2"

run := func(t *testing.T, name string, test func(t *testing.T, c *Controller, fleet *v1alpha1.Fleet)) {
f, gsSet, gsList := defaultFixtures(4)
c, m := newFakeController()

gsList[0].Status.NodeName = n1
gsList[1].Status.NodeName = n2
gsList[2].Status.NodeName = n1
gsList[3].Status.NodeName = n1

m.AgonesClient.AddReactor("list", "fleets", func(action k8stesting.Action) (bool, runtime.Object, error) {
return true, &v1alpha1.FleetList{Items: []v1alpha1.Fleet{*f}}, nil
})
m.AgonesClient.AddReactor("list", "gameserversets", func(action k8stesting.Action) (bool, runtime.Object, error) {
return true, &v1alpha1.GameServerSetList{Items: []v1alpha1.GameServerSet{*gsSet}}, nil
})
m.AgonesClient.AddReactor("list", "gameservers", func(action k8stesting.Action) (bool, runtime.Object, error) {
return true, &v1alpha1.GameServerList{Items: gsList}, nil
})

gsWatch := watch.NewFake()
m.AgonesClient.AddWatchReactor("gameservers", k8stesting.DefaultWatchReactor(gsWatch, nil))
m.AgonesClient.AddReactor("update", "gameservers", func(action k8stesting.Action) (bool, runtime.Object, error) {
ua := action.(k8stesting.UpdateAction)
gs := ua.GetObject().(*v1alpha1.GameServer)
gsWatch.Modify(gs)
return true, gs, nil
})

_, cancel := agtesting.StartInformers(m)
defer cancel()

t.Run(name, func(t *testing.T) {
test(t, c, f)
})
}

run(t, "packed", func(t *testing.T, c *Controller, f *v1alpha1.Fleet) {
// priority should be node1, then node2
gs, err := c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n2, gs.Status.NodeName)

// should have none left
_, err = c.allocate(f, nil)
assert.NotNil(t, err)
})

run(t, "distributed", func(t *testing.T, c *Controller, f *v1alpha1.Fleet) {
// make a copy, to avoid the race check
f = f.DeepCopy()
f.Spec.Scheduling = v1alpha1.Distributed
// should go node2, then node1
gs, err := c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n2, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

gs, err = c.allocate(f, nil)
assert.Nil(t, err)
assert.Equal(t, n1, gs.Status.NodeName)

// should have none left
_, err = c.allocate(f, nil)
assert.NotNil(t, err)
})
}

func TestControllerAllocateMutex(t *testing.T) {
t.Parallel()

@@ -270,6 +364,7 @@ func defaultFixtures(gsLen int) (*v1alpha1.Fleet, *v1alpha1.GameServerSet, []v1a
Template: v1alpha1.GameServerTemplateSpec{},
},
}
f.ApplyDefaults()
gsSet := f.GameServerSet()
gsSet.ObjectMeta.Name = "gsSet1"
var gsList []v1alpha1.GameServer