-
Notifications
You must be signed in to change notification settings - Fork 825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve allocation performance #583
Improve allocation performance #583
Conversation
Build Failed 😱 Build Id: 4e103398-2b4a-45ce-b013-d4222277ccd3 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Removing the mutex has a potential to introduce very subtle bugs (leaks, double allocation, panics), can we add some allocation stress test before making this change? then we can be confident we're not regressing. How about this e2e scenario:
I would perhaps add one more twist: while allocating, we should have game servers update themselves (via SDK.SetLabel()), which will test that we're handling update conflicts in the allocation routine. |
Maybe that is something that can be added here: |
I agree with removing mutex will cause unexpected behaviors/bugs. That's why I am still keeping it but reducing it's scope and blocking it local cache. I did few manual tests (with parallel allocations). I wanted to get quick feedback before spending more time on tests. I will add e2e test(s). |
Build Failed 😱 Build Id: 7fd72495-83db-4989-9710-160a9230771b To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: ba933819-1f8f-4cac-bcbf-a91db70c245e The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: b6eda206-7f70-4044-af0f-908eab190666 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 6bbd38ec-f161-416b-807e-aed5db547897 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 22a163d1-2ba1-4c75-805e-d1e1b32f5434 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: c8d1a308-429d-4a89-a5ba-9064a97e1bda The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: b098a310-293a-4cc6-9caa-d351b06e8a79 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Looks like you're still actively working on this PR? Just wanted to touch base and check? |
I made quite a bit progress. I am planning to checkin once I merge my changes. I might have more improvements but I am planning to do them in another PR. |
Okay cool - will hold off doing another review until you give the 👍 Excited to see the improvements! |
Build Failed 😱 Build Id: c0054ed9-6665-452e-a656-daada969ba80 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 85d579e1-c743-470c-b6f7-7eef4dad72a2 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
It will be great if you can review. I will like to check-in this version. I might have further improvements in the second round. |
Build Failed 😱 Build Id: 21145968-e5ff-4848-b565-114b7159c97b To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
@markmandel , should we create GSA crd for failed gs allocations (with UnAllocated state)? |
Build Failed 😱 Build Id: d4460919-1c4c-42ca-88cf-13ce2524f4af To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 1639f513-9474-4664-9213-fca987cfbff1 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Succeeded 👏 Build Id: 7f2a9520-4b9b-4203-b231-7390af1fe7e4 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
#600 should solve this issue, as we totally remove storage. |
Agreed. Just want to make sure. I have already removed the Unallocate GSA creation from the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, 💯 this approach. Is awesome.
Had a few questions and thoughts on a couple of things - but this is great.
} | ||
|
||
// findComparator is a comparator function specifically for the | ||
// findReadyGameServerForAllocation method for determining | ||
// scheduling strategy | ||
type findComparator func(bestCount, currentCount NodeCount) bool | ||
|
||
var allocationRetry = wait.Backoff{ | ||
Steps: 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the number of retries be configurable? Just curious how we came up with the number 5. (we have 30s to work in -- although with the webhook, it is serial, so maybe less is better until we get #600 or related) 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure making this configurable will be helpful without some guidance. Eventually it can be.
Once everything is stable, we can try different values and find a good number.
I get this from here. I was trying different numbers. I agree that it should not be too high. I can set it to 3 for now.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fair - more than anything else, I just wanted to ask the question. I'm happy with 5 - seems like a legitimate number. I feel like I'd rather aim higher, and then maybe pull it down if need be,
@Kuqd curious - at some point, should we add a metric for this somewhere? How many retries?
Build Failed 😱 Build Id: 02b7541b-7f66-4399-8672-0609fbc433ef To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: cb4d5660-4bfa-48fd-878d-8d7724c5cfe2 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 32745520-1d35-4000-9dff-cc259698c542 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: 62388e8f-e589-4e3d-8380-355d2fabe185 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: a17b55dd-5944-46ab-9896-88a1681e1f6c To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: a74c7057-ad35-42aa-a18d-f5fb15fcfb60 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit things, and a thought on documentation to add.
Build Failed 😱 Build Id: d8a6822e-a514-4c0e-afff-a07596c6a611 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 3fba764a-f0cd-47f2-b265-7b1c50e67986 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 281bdccd-4de1-4dfc-b51d-0c2da4e47049 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
LGTM. Needs to be squashed to a single commit, and then it's good to go! |
…h local cache. Fixes googleforgames#535
b74731e
to
e183529
Compare
Build Succeeded 👏 Build Id: d04ad053-5358-42a5-9433-4747a9aa389f The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Remove the call to sync the gameserver cache during each allocation request and replaced it with using local (ready gameservers) cache.