Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New metric - state duration #1468

Merged
merged 11 commits into from
Jun 18, 2020

Conversation

aLekSer
Copy link
Collaborator

@aLekSer aLekSer commented Apr 13, 2020

Add GameServer state duration metric reusing current function recordGameServerStatusChanges()

For #1013
Closes #831

@aLekSer
Copy link
Collaborator Author

aLekSer commented Apr 13, 2020

This approach uses current subscription to gsInformer:

	gsInformer.AddEventHandlerWithResyncPeriod(cache.ResourceEventHandlerFuncs{
		UpdateFunc: c.recordGameServerStatusChanges,
	}, 0)

Need to perform more tests on this approach.
LRU cache was used instead of map because it could grow under certain circumstances indefinitely.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 9b0b8dc4-0e0a-46e2-91c9-c514d9286d8c

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer aLekSer force-pushed the metrics/state-duration branch from 5c65fd6 to 41608bd Compare April 13, 2020 20:07
@markmandel markmandel requested a review from cyriltovena April 13, 2020 20:12
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: f21ef69c-cbe7-41eb-83e9-5617e7987a7f

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel markmandel added the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Apr 13, 2020
pkg/metrics/controller.go Outdated Show resolved Hide resolved
pkg/metrics/controller.go Outdated Show resolved Hide resolved
&view.View{
Name: "gameserver_state_duration",
Measure: gsStateDuration,
Description: "The time gameserver exist in the current state in seconds",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gameserver state duration\lasting (seconds)

pkg/metrics/controller.go Outdated Show resolved Hide resolved
pkg/metrics/controller.go Outdated Show resolved Hide resolved
pkg/metrics/controller.go Outdated Show resolved Hide resolved
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: ed1b20fb-5d6c-41c2-86d5-145e8de25405

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer aLekSer force-pushed the metrics/state-duration branch from d016dc8 to 6753f40 Compare April 14, 2020 12:26
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: e02053b3-be94-4a44-82c0-ec7b56d5712b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/1468/head:pr_1468 && git checkout pr_1468
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.5.0-6753f40

@markmandel markmandel removed the feature-freeze-do-not-merge Only eligible to be merged once we are out of feature freeze (next full release) label Apr 14, 2020
@aLekSer
Copy link
Collaborator Author

aLekSer commented Apr 15, 2020

This is how new metric could look like in Grafana (title was not updated yet), need to add by (type) to get graph for each state, currently all are presented in one.
Screenshot 2020-04-14 at 17 07 40

Also need to skip some incoming timestamps from evicted keys. Could achieve this by switching to {GameState, time} structure for values and namespace, name for keys in the cache.

@aLekSer
Copy link
Collaborator Author

aLekSer commented Apr 15, 2020

There is an issue with Shutdown state, which seems to be inaccurate.
Two examples of charts for different state durations;
Ready State
Creating

@markmandel
Copy link
Member

Does this need review of some kind? I generally don't look at PR's in Draft unless specifically asked to, but saw this one seemed to have been sitting here for some time.

@aLekSer
Copy link
Collaborator Author

aLekSer commented Apr 21, 2020

Will update this and my other PRs today. I switched to another task related to Terraform yesterday.
I wanted to add proper Grafana graph for this PR, that's why I got stuck.

@aLekSer aLekSer force-pushed the metrics/state-duration branch from 6753f40 to 40b6527 Compare April 21, 2020 11:55
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: fc8c0619-27b4-428e-a2de-e26b29c1f08c

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/1468/head:pr_1468 && git checkout pr_1468
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.6.0-40b6527

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 8, 2020

@cyriltovena Cyril, I think I have applied all your comments. Can you please re-review?

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 407af734-0699-46b6-b79f-79f94f46288f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/1468/head:pr_1468 && git checkout pr_1468
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.7.0-f400f53

@markmandel
Copy link
Member

Just checking in on this. See how it's doing?

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 16, 2020

This version was finalised. @cyriltovena one more bump on this.

@@ -21,6 +21,7 @@ import (
)

var (
distributionSeconds = []float64{0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would use a different name or not use a variable.

I think this boundaries only fits your metrics.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update I would rename it to stateDurationSeconds.

Copy link
Collaborator

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aLekSer, cyriltovena

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot
Copy link

New changes are detected. LGTM label has been removed.

aLekSer added 11 commits June 16, 2020 23:43
It needs to switch from map to LRU cache to limit the memory usage.
State in Key could be used to test that we are not skipping some
particular State changes.
Refactored test for calcDuration - table tests are used.
Fix linter issues. Now we have namespace/fleet/gameserver/state as a
key.
Added description on cache size.
@aLekSer aLekSer force-pushed the metrics/state-duration branch from 9d0507f to b262023 Compare June 16, 2020 20:43
@aLekSer aLekSer merged commit 0c005e7 into googleforgames:master Jun 18, 2020
@markmandel markmandel added this to the 1.7.0 milestone Jun 30, 2020
@markmandel markmandel added the kind/feature New features for Agones label Jun 30, 2020
ilkercelikyilmaz pushed a commit to ilkercelikyilmaz/agones that referenced this pull request Oct 23, 2020
State Duration Metric uses LRU cache to limit the memory usage.

* Add LRU cache to calculate time for state changes

State in Key could be used to test that we are not skipping some
particular State changes.

* Move LRU cache to a controller local field

* Fixed return parameters for calcDuration

Rewrote the test.

* Add negative state duration check

* Return 0 duration with error

Refactored test for calcDuration - table tests are used.

* Add documentation for the new metric

* Added Fleet to LRU key

Fix linter issues. Now we have namespace/fleet/gameserver/state as a
key.

Added description on cache size.

* Add Namespace into metric labels

* Rename of StateDuration boundaries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/operations Installation, updating, metrics etc cla: yes kind/feature New features for Agones size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose GameServer state change metrics
7 participants