Replace global allocation mutex with fine-grained concurrency controls. #535

Closed
jkowalski opened this issue Jan 30, 2019 · 8 comments

@jkowalski
Contributor

Currently there's a global mutex shared between 4 controllers that blocks allocations whenever any game server is being deleted, regardless of which fleet it belongs to.

To get decent allocation throughput this mutex should be removed and we should start relying on conditional mutations of GameServer itself to ensure correctness.
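
As an illustration of what "conditional mutations" could mean here, the following is a minimal sketch of optimistic concurrency against a toy in-memory store. Everything in it (`Store`, `GameServer`, `Allocate`, the error values) is a hypothetical stand-in written for this example, not Agones or Kubernetes client code. It assumes each object carries a version counter and that an update is rejected when the caller's copy is stale.

```go
package main

import (
	"errors"
	"sync"
)

var (
	// ErrConflict stands in for the conflict error returned on a stale update.
	ErrConflict = errors.New("conflict: stale resource version")
	// ErrNotReady is returned when the game server cannot be allocated.
	ErrNotReady = errors.New("game server is not Ready")
)

// GameServer is a hypothetical, stripped-down stand-in for the real resource.
type GameServer struct {
	Name            string
	State           string // e.g. "Ready", "Allocated", "Shutdown"
	ResourceVersion int
}

// Store is a toy in-memory API server. Its lock only guards the map for the
// duration of a single call; it is never held across a whole allocation.
type Store struct {
	mu    sync.Mutex
	items map[string]GameServer
}

// NewStore builds a store pre-populated with the given game servers.
func NewStore(servers ...GameServer) *Store {
	s := &Store{items: make(map[string]GameServer)}
	for _, gs := range servers {
		s.items[gs.Name] = gs
	}
	return s
}

// Get returns a copy of the stored game server.
func (s *Store) Get(name string) (GameServer, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	gs, ok := s.items[name]
	return gs, ok
}

// Update writes gs only if its ResourceVersion matches the stored one,
// then bumps the version. A stale copy is rejected with ErrConflict.
func (s *Store) Update(gs GameServer) (GameServer, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.items[gs.Name]
	if !ok || cur.ResourceVersion != gs.ResourceVersion {
		return GameServer{}, ErrConflict
	}
	gs.ResourceVersion++
	s.items[gs.Name] = gs
	return gs, nil
}

// Allocate moves a Ready game server to Allocated with a conditional update.
func Allocate(s *Store, name string) error {
	gs, ok := s.Get(name)
	if !ok || gs.State != "Ready" {
		return ErrNotReady
	}
	gs.State = "Allocated"
	_, err := s.Update(gs)
	return err
}
```

With this shape, a controller holding an older copy of the game server cannot overwrite an allocation: its `Update` fails with `ErrConflict` and it has to re-read before deciding what to do.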

@jkowalski jkowalski added help wanted We would love help on these issues. Please come help us! area/performance Anything to do with Agones being slow, or making it go faster. labels Jan 30, 2019
@markmandel
Member

The biggest issue here is how to delete safely, without accidentally deleting something that is being concurrently allocated.

Or "only delete this object if it's equal to this revision/generation"
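
A rough sketch of that "delete only if unchanged" idea, using a toy in-memory store rather than any real client. The names `Store`, `Put`, and `DeleteIfGeneration` are hypothetical, and the `Generation` counter here is simply bumped on every write, which is an assumption made for the example.

```go
package main

import (
	"errors"
	"sync"
)

var (
	// ErrPreconditionFailed means the object changed since it was observed.
	ErrPreconditionFailed = errors.New("precondition failed: object changed")
	// ErrNotFound means there is no object with that name.
	ErrNotFound = errors.New("not found")
)

// GameServer is a hypothetical, stripped-down stand-in for the real resource.
type GameServer struct {
	Name       string
	State      string
	Generation int // toy counter, bumped on every write in this sketch
}

// Store is a toy in-memory object store.
type Store struct {
	mu    sync.Mutex
	items map[string]GameServer
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{items: make(map[string]GameServer)}
}

// Put creates or overwrites a game server and bumps its generation.
func (s *Store) Put(name, state string) GameServer {
	s.mu.Lock()
	defer s.mu.Unlock()
	gs := s.items[name] // zero value if the object does not exist yet
	gs.Name = name
	gs.State = state
	gs.Generation++
	s.items[name] = gs
	return gs
}

// Exists reports whether a game server with that name is stored.
func (s *Store) Exists(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.items[name]
	return ok
}

// DeleteIfGeneration deletes the object only if its generation still equals
// the one the caller observed; otherwise nothing is deleted.
func (s *Store) DeleteIfGeneration(name string, expected int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.items[name]
	if !ok {
		return ErrNotFound
	}
	if cur.Generation != expected {
		return ErrPreconditionFailed
	}
	delete(s.items, name)
	return nil
}
```

If an allocation lands between the moment a controller observes the game server and the moment it issues the delete, the generation no longer matches and the delete is refused.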

@markmandel
Member

markmandel commented Jan 30, 2019

Here's a possibly crazy idea - we could reject the deletion with a webhook! The webhook could check if the object is allocated (pretty sure this is 100% up to date), and if it is, reject the deletion.

I think this will work!

Edit: removed idea, as this below is better.

@jkowalski
Contributor Author

This can be done by adding something like "Deleting" status and using regular Update:

Instead of deleting directly, callers would set the "Deleting" status (via a regular Update, which resolves race conditions), and the GS controller would be the only one to trigger the actual deletion of any GS in the Deleting state.
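
A minimal sketch of that two-step flow, assuming a versioned conditional Update. All names (`Store`, `RequestDeletion`, `SyncDeleting`) are hypothetical and exist only for this example; a toy in-memory map stands in for the API server.

```go
package main

import (
	"errors"
	"sync"
)

// ErrConflict is returned when an update is based on a stale copy.
var ErrConflict = errors.New("conflict: stale version")

// GameServer is a hypothetical, stripped-down stand-in for the real resource.
type GameServer struct {
	Name    string
	State   string // "Ready", "Allocated", "Deleting"
	Version int
}

// Store is a toy in-memory API server with conditional updates.
type Store struct {
	mu    sync.Mutex
	items map[string]GameServer
}

// NewStore builds a store pre-populated with the given game servers.
func NewStore(servers ...GameServer) *Store {
	s := &Store{items: make(map[string]GameServer)}
	for _, gs := range servers {
		s.items[gs.Name] = gs
	}
	return s
}

// Get returns a copy of the stored game server.
func (s *Store) Get(name string) (GameServer, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	gs, ok := s.items[name]
	return gs, ok
}

// Update writes gs only if its Version matches the stored one.
func (s *Store) Update(gs GameServer) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.items[gs.Name]
	if !ok || cur.Version != gs.Version {
		return ErrConflict
	}
	gs.Version++
	s.items[gs.Name] = gs
	return nil
}

// RequestDeletion is what callers would use instead of deleting: it marks
// the copy they observed as "Deleting". If the game server changed in the
// meantime (for example, it was allocated), the update conflicts.
func RequestDeletion(s *Store, observed GameServer) error {
	observed.State = "Deleting"
	return s.Update(observed)
}

// SyncDeleting plays the role of the GS controller, the only component
// that really deletes. It removes every game server in the Deleting state
// and returns how many were removed.
func SyncDeleting(s *Store) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for name, gs := range s.items {
		if gs.State == "Deleting" {
			delete(s.items, name)
			removed++
		}
	}
	return removed
}
```

The race is settled by whichever Update reaches the store first: an allocation that wins makes the deletion request fail with `ErrConflict`, and a deletion request that wins leaves the game server in a state that an allocator would no longer pick.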

@jkowalski
Contributor Author

BTW. As soon as we remove the mutex, we will see increased contention on Game Server resources. That can be fixed with batching and randomization of GS to allocate (see #536).
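
To illustrate only the randomization half of that suggestion (batching is not covered), here is a small sketch. It assumes the contention comes from concurrent allocators all trying the same first Ready game server. `AllocateRandom`, `NewRNG`, and the `try` callback are hypothetical names; `try` stands in for whatever conditional update the allocator would perform.

```go
package main

import "math/rand"

// GameServer is a hypothetical, stripped-down stand-in for the real resource.
type GameServer struct {
	Name  string
	State string
}

// NewRNG returns a seeded random source so runs can be reproduced.
func NewRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// ReadyCandidates returns the game servers currently in the Ready state.
func ReadyCandidates(all []GameServer) []GameServer {
	var ready []GameServer
	for _, gs := range all {
		if gs.State == "Ready" {
			ready = append(ready, gs)
		}
	}
	return ready
}

// AllocateRandom walks the Ready candidates in random order and calls try
// on each until one succeeds. try is expected to return false when its
// conditional update loses a race, in which case the next candidate is used.
func AllocateRandom(rng *rand.Rand, all []GameServer, try func(GameServer) bool) (string, bool) {
	ready := ReadyCandidates(all)
	for _, i := range rng.Perm(len(ready)) {
		if try(ready[i]) {
			return ready[i].Name, true
		}
	}
	return "", false
}
```

Because each allocator starts from a different position in a random permutation, two concurrent allocations are less likely to fight over the same object than if both took the first entry in the list.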

@markmandel
Member

markmandel commented Jan 30, 2019

> This can be done by adding something like "Deleting" status and using regular Update:

You know what - we already have this functionality, because of the SDK. It's called Shutdown state.
https://github.com/GoogleCloudPlatform/agones/blob/master/pkg/gameservers/controller.go#L619-L636

Good call. I like it 👍

@jkowalski jkowalski assigned jkowalski and unassigned jkowalski Feb 1, 2019
@ilkercelikyilmaz
Contributor

I think I can work on this.

@thisisnotapril
Collaborator

@jkowalski @ilkercelikyilmaz assignment made!

@ilkercelikyilmaz
Contributor

PR #572 fixes this partially. I will continue working on the allocation improvements so we can get rid of the allocation mutex.

@markmandel markmandel added this to the 0.9.0 milestone Feb 18, 2019
ilkercelikyilmaz added a commit to ilkercelikyilmaz/agones that referenced this issue Feb 27, 2019
4 participants