Add health check for gameserver-sidecar. #44

Merged
merged 1 commit into from
Jan 8, 2018

dzlier-gcp
Contributor

Adding health check liveness probe to sidecar (#12).

Since sidecars for the GameServers aren't created in yaml config, I added the configuration to the controller startup.

Tested by following these instructions on a Linux machine and creating a test pod. However, I do not see any sidecar resources listed in kubectl get all, much less a way to test whether the health check is working. Output from the commands is below. Please let me know how to confirm that the sidecar and health check are working as intended.

kubectl get all output:

NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/gameservers-controller   1         1         1            1           50m

NAME                                   DESIRED   CURRENT   READY     AGE
rs/gameservers-controller-3664581478   1         1         1         50m

NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/gameservers-controller   1         1         1            1           50m

NAME                                         READY     STATUS    RESTARTS   AGE
po/gameservers-controller-3664581478-8d631   1/1       Running   0          6m

kubectl describe po/gameservers-controller-3664581478-8d631 output:

Name:           gameservers-controller-3664581478-8d631                                                                                                                                 
Namespace:      default                                                                                                                                                                 
Node:           gke-test-cluster-default-434a1e12-jg05/10.138.0.3                                                                                                                       
Start Time:     Wed, 03 Jan 2018 23:30:05 +0000                                                                                                                                         
Labels:         pod-template-hash=3664581478                                                                                                                                            
				stable.agon.io/role=controller                                                                                                                                          
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"gameservers-controller-3664581478","uid":"e7c43b8f-f0d7-11e7-aed...                                                                                                                                                     
				kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container gameservers-controller                                                                     
Status:         Running                                                                                                                                                                 
IP:             10.48.1.7                                                                                                                                                               
Created By:     ReplicaSet/gameservers-controller-3664581478                                                                                                                            
Controlled By:  ReplicaSet/gameservers-controller-3664581478                                                                                                                            
Containers:                                                                                                                                                                             
  gameservers-controller:                                                                                                                                                               
	Container ID:   docker://8b6e51381a6ca627df573beaf4535516f193ab438520b624b2db01c2565b2e45                                                                                           
	Image:          gcr.io/dzlier-work/gameservers-controller:0.1-80b475e                                                                                                               
	Image ID:       docker-pullable://gcr.io/dzlier-work/gameservers-controller@sha256:530f4c226fbff314de58fb82b6b9afb0adb34d7709ea726d0786e1ad858e18d6
	Port:           <none>
	State:          Running
	  Started:      Wed, 03 Jan 2018 23:30:07 +0000
	Ready:          True
	Restart Count:  0
	Requests:
	  cpu:     100m
	Liveness:  http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
	Environment:
	  ALWAYS_PULL_SIDECAR:  true
	  SIDECAR:              gcr.io/dzlier-work/gameservers-sidecar:0.1-80b475e
	  MIN_PORT:             7000
	  MAX_PORT:             8000
	Mounts:
	  /var/run/secrets/kubernetes.io/serviceaccount from default-token-pcdw1 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  default-token-pcdw1:
	Type:        Secret (a volume populated by a Secret)
	SecretName:  default-token-pcdw1
	Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
				 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age   From                                             Message
  ----    ------                 ----  ----                                             -------
  Normal  Scheduled              6m    default-scheduler                                Successfully assigned gameservers-controller-3664581478-8d631 to gke-test-cluster-default-434a1e12-jg05
  Normal  SuccessfulMountVolume  6m    kubelet, gke-test-cluster-default-434a1e12-jg05  MountVolume.SetUp succeeded for volume "default-token-pcdw1"
  Normal  Pulling                6m    kubelet, gke-test-cluster-default-434a1e12-jg05  pulling image "gcr.io/dzlier-work/gameservers-controller:0.1-80b475e"
  Normal  Pulled                 6m    kubelet, gke-test-cluster-default-434a1e12-jg05  Successfully pulled image "gcr.io/dzlier-work/gameservers-controller:0.1-80b475e"
  Normal  Created                6m    kubelet, gke-test-cluster-default-434a1e12-jg05  Created container
  Normal  Started                6m    kubelet, gke-test-cluster-default-434a1e12-jg05  Started container

@markmandel markmandel self-requested a review January 4, 2018 00:47
@markmandel markmandel added the kind/feature New features for Agones label Jan 4, 2018
@markmandel
Member

So this is the sidecar for a GameServer, which means that to see whether the health check is configured (and working), you will need to deploy one of the example gameservers. Then, if you run either kubectl describe gameserver or kubectl get gameserver <name> -o yaml, you will be able to see the details.

Assuming you have go and docker installed locally - the easiest is likely the example here:
https://github.com/googleprivate/agon/blob/master/examples/simple-udp/Makefile

Run make build and make build-image to build the docker image, and then push that image up to your gcr repository.

To deploy, there is a gameserver.yaml.
I would suggest jumping into make shell for this part, since kubectl is installed and authenticated in there.

cd into the above directory and run kubectl apply -f gameserver.yaml to install the gameserver. You should now be able to run kubectl describe gameservers simple-udp to see whether the health check is configured, and to check the health of the pods that were created.

Clearly, this would be good if it was documented better!

Let me know if you have more questions.

@markmandel markmandel added this to the 0.1 milestone Jan 4, 2018
@dzlier-gcp
Contributor Author

OK thanks, I had tried starting up that example game server, but I guess I didn't know the exact right commands. The docs in the example don't actually say how to launch an instance, so maybe I can update that when I am done.

@markmandel
Member

Added #45 to track writing gameserver documentation.

Member

@markmandel markmandel left a comment

Took this for a spin today, overall looks pretty good, just a couple of things to look at that I found.

Also, spun up a gameserver, and the healthcheck works! 👍

You can see it in the describe:

  agon-gameserver-sidecar:
    Image:          gcr.io/agon-images/gameservers-sidecar:0.1-80b475e
    Port:           <none>
    State:          Running
      Started:      Sun, 07 Jan 2018 00:48:51 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
    Liveness:  http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3
    Environment:
      GAMESERVER_NAME:  simple-udp
      POD_NAMESPACE:    default (v1:metadata.namespace)

@@ -68,8 +70,13 @@ func main() {
	if isLocal {
		sdk.RegisterSDKServer(grpcServer, &Local{})
	} else {
		config, err := rest.InClusterConfig()
Member

This error is never checked. A great tool for checking these types of things is gometalinter, which is baked into the build image and can be invoked in the shell.

For example:

root@fd84b110807c:/go/src/github.com/agonio/agon# gometalinter ./gameservers/sidecar/ --deadline=1h
gameservers/sidecar/main.go:73:11:warning: ineffectual assignment to err (ineffassign)
gameservers/sidecar/main.go:73:11:warning: this value of err is never used (SA4006) (megacheck)

Member

Oh, in case I wasn't clear on this one - we should check the error, and Fatalf if it fails.

Leaving an error unprocessed would not be canonical Go 😢
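
A minimal sketch of the check-and-Fatalf pattern being asked for. To keep it runnable without a cluster, inClusterConfig and clusterConfig below are hypothetical stand-ins for rest.InClusterConfig() and its return value, and the loadConfig helper and the error message wording are assumptions rather than what the PR ended up with.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// clusterConfig is a hypothetical stand-in for the config value
// returned by rest.InClusterConfig().
type clusterConfig struct {
	host string
}

// inClusterConfig is a hypothetical stand-in for rest.InClusterConfig().
// It fails when the process is not running inside a cluster.
func inClusterConfig(inCluster bool) (*clusterConfig, error) {
	if !inCluster {
		return nil, errors.New("not running inside a cluster")
	}
	return &clusterConfig{host: "https://10.0.0.1:443"}, nil
}

// loadConfig checks the error immediately instead of letting it be
// overwritten by a later assignment, which is what ineffassign flagged.
func loadConfig(inCluster bool) (*clusterConfig, error) {
	config, err := inClusterConfig(inCluster)
	if err != nil {
		return nil, fmt.Errorf("could not create in-cluster config: %v", err)
	}
	return config, nil
}

func main() {
	config, err := loadConfig(true)
	if err != nil {
		// Nothing useful can happen without a config, so exit loudly.
		log.Fatalf("%v", err)
	}
	log.Printf("using API server %s", config.host)
}
```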

@@ -93,3 +89,58 @@ func TestSidecarRun(t *testing.T) {
		})
	}
}

func TestHealthCheck(t *testing.T) {
	fixtures := map[string]struct {
Member

I don't think we need all these fixtures just to test the healthcheck. Couldn't we just run the Sidecar, and then ping the http handler, like we do in the controller example?

Or maybe there's something I'm missing here?

@dzlier-gcp
Contributor Author

Thanks for testing it out Mark! I still wasn't able to get a game server working last week so I'm glad to see the SC health check working.

Added a commit simplifying the health check test.

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	go func() {
Member

I think we can also get rid of this outside goroutine - I don't think it adds anything to the test. Just keep the inner section, from go sc.Run() onward.

Then the select down below can go away as well.

See the controller test:
https://github.com/googleprivate/agon/blob/master/gameservers/controller/controller_test.go#L224-L227

Contributor Author

Done! (On both the config error check and the test changes)

@dzlier-gcp dzlier-gcp force-pushed the health branch 2 times, most recently from 343acea to 4841b6f Compare January 8, 2018 18:45
@markmandel
Member

I have but one more request! Can you rebase it down to a single commit, and then it's LGTM!

(Or if you're happy to, we can try the "Squash and Merge" button, which I've never tried before, but it looks to do exactly the same thing.)

@dzlier-gcp
Contributor Author

Done!
