How to preserve currently running ECS Instances during update to the latest AMI #672
Comments
I solved this problem using task-placement-constraints.
@vmogilev can you please tell us how you managed to do this using task-placement-constraints? The main problems I am facing while updating the cluster are:
@hridyeshpant - It took me a day to come up with the solution, so I decided to document it before I forget. Hopefully it'll help someone else in the future. Here's the process I came up with: Blue/Green ECS-optimized AMI Update For ECS Instances. Hope this helps! If you have any questions - don't hesitate to ask.
@vmogilev Thank you so much for writing this up! Please let us know if you need any more assistance here. I'm closing this issue for now.
@vmogilev @hridyeshpant Today we launched Container Instance Draining to address this use case. Once you launch your new instances with the new AMI, you can set the old instances to DRAINING and the ECS service scheduler will move your tasks off of them for you.
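For anyone scripting this, a minimal sketch of the drain call via boto3; the cluster name and instance ARN are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder cluster and container-instance ARNs; substitute your own.
CLUSTER = "my-cluster"
OLD_INSTANCE_ARNS = [
    "arn:aws:ecs:us-east-1:123456789012:container-instance/abc123",
]

# Setting the status to DRAINING stops new task placement on these
# instances and lets the service scheduler migrate running tasks away.
ecs.update_container_instances_state(
    cluster=CLUSTER,
    containerInstances=OLD_INSTANCE_ARNS,
    status="DRAINING",
)
```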
Wow @samuelkarp - you guys rock! Thank you |
Great @samuelkarp :)
@samuelkarp is there a way we can make existing tasks from a DRAINING instance always move to the new instances (created by the new AMI)? We can't use task-placement-constraints, as that would require creating a new version of every task def and service. We also can't put all old AMI instances (60 at this time) into the DRAINING state at once, as that would cause throttling in the Docker registry.
@hridyeshpant what are the constraints that are preventing you from creating a new task definition?
@vmogilev we are running 20-30 services per instance, so if I need to use constraints, I would first have to update all 30 task defs and then update 30 services to use the new task defs during every cluster update.
@hridyeshpant I'm not sure I understand your question. When you place an instance into the DRAINING state, ECS stops placing new tasks on it and the service scheduler moves the running tasks onto other instances in the cluster.
@samuelkarp my use case is: I always want services (running on a DRAINING instance) to move to the new instances (created by the new AMI), not to just any instance attached to the cluster. Let's say I am running 20-30 tasks per instance with placementConstraints that pin tasks to the AMI ID.
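For illustration only, a guess at what such an AMI-pinned constraint might look like in a task definition, using the built-in ecs.ami-id instance attribute; the family, image, and AMI ID below are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical: pin placement to instances launched from a specific AMI
# via the built-in ecs.ami-id attribute. Every AMI rollout then forces a
# new task definition revision, which is the pain point described above.
ecs.register_task_definition(
    family="my-service",  # placeholder
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
            "memory": 256,
        }
    ],
    placementConstraints=[
        {
            "type": "memberOf",
            "expression": "attribute:ecs.ami-id == ami-5678",
        }
    ],
)
```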
Is there a particular reason you have the AMI ID in the task definition? If you were just using that as a stop-gap before DRAINING came out, do you still need it?
@samuelkarp
Mark your old instances as DRAINING and the ECS service scheduler will move your tasks according to the service's deployment configuration parameters.
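Those deployment configuration parameters live on the service itself; here's a sketch of tuning them with boto3, with assumed values and placeholder names:

```python
import boto3

ecs = boto3.client("ecs")

# Assumed values: allow the scheduler to start replacement tasks first
# (up to 200% of desiredCount) while never dropping below the full
# desired count while instances drain.
ecs.update_service(
    cluster="my-cluster",  # placeholder
    service="my-service",  # placeholder
    deploymentConfiguration={
        "maximumPercent": 200,
        "minimumHealthyPercent": 100,
    },
)
```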
@samuelkarp but by just marking old instances as DRAINING, ECS can put those services on any running instance, which could be an old instance with the old AMI.
Maybe we're talking past each other?
We have a blog post about how to do this.
@samuelkarp so when we say tasks move off the DRAINING instances, which instances do they move to?
Any of the non-DRAINING instances. |
That is not my use case; I always want to deploy onto the newly launched instances (ami-id ami-5678). That's what I mentioned earlier about my use case: using a combination of the DRAINING feature with task-placement-constraints.
I think I'm misunderstanding your use-case then. It's probably worth opening a new issue if you want to go into this in more depth.
Yeah, maybe I was not clear.
@hridyeshpant If I understand your use-case: you have a large number of instances in a cluster. You'd like to replace all the instances in the cluster over a period of time rather than all at once, and you'd like to ensure that tasks only get moved once rather than potentially moving more than once due to starting on instances that you're going to get rid of anyway. I think you can accomplish this with a combination of custom attributes, placement constraints, and DRAINING; see the sketch after these steps.

Step 1: Add a custom constraint to all task definitions so that tasks are not placed on instances carrying a custom attribute state set to pre-drain:

```json
"placementConstraints": [
    {
        "expression": "attribute:state!=pre-drain",
        "type": "memberOf"
    }
]
```

Step 2: Launch the instances with the new AMI in your cluster.

Step 3: Set the custom attribute state=pre-drain on the old instances so that no new tasks start on them.

Step 4: Set X% of the old instances to the DRAINING state; their tasks will be rescheduled onto instances without the pre-drain attribute, i.e. the new ones.

Step 5: Continue to increase the percentage of instances in the DRAINING state until all of the old instances have been drained and can be terminated.
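A minimal boto3 sketch of Steps 3 and 4, assuming the old instances can be selected by AMI via the built-in ecs.ami-id attribute; the cluster name and AMI ID are placeholders:

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER = "my-cluster"  # placeholder name

# Step 3: find the old instances by AMI (placeholder ID) and tag them
# with state=pre-drain so the placement constraint keeps new tasks off.
old = ecs.list_container_instances(
    cluster=CLUSTER,
    filter="attribute:ecs.ami-id == ami-1234",
)["containerInstanceArns"]

attributes = [
    {
        "name": "state",
        "value": "pre-drain",
        "targetType": "container-instance",
        "targetId": arn,
    }
    for arn in old
]
# put_attributes accepts at most 10 attributes per call, so batch them.
for i in range(0, len(attributes), 10):
    ecs.put_attributes(cluster=CLUSTER, attributes=attributes[i : i + 10])

# Step 4: drain a slice of the old instances (here roughly 10%).
batch = old[: max(1, len(old) // 10)]
ecs.update_container_instances_state(
    cluster=CLUSTER,
    containerInstances=batch,
    status="DRAINING",
)
```

Re-running the last block with a growing slice implements Step 5; once an old instance reports zero running tasks it can be terminated.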
@samuelkarp thanks a lot, this will definitely work. |
Updating this thread to say this approach does work well.
|
I am in the process of testing the update to the latest Amazon ECS-optimized AMI (amzn-ami-2016.09.d-amazon-ecs-optimized).
Our current ECS Instances are running amzn-ami-2015.09.g-amazon-ecs-optimized, which at the time of launch pulled the following stack:
Docker: 1.9.1
ECS Agent: 1.8.2
I don't think it's a good idea to simply update the launch configuration with the new AMI and hope for the best. What if things fail under load? What if we discover a bug with the new AMI/Docker/Agent combo running our containers? These are all possibilities, and we need to mitigate the risks by preserving our old instances while the new instances burn in under production load. Once we feel solid, we can terminate the old instances.
I can't figure out how to do this. Here's what I tried:
I updated the launch configuration for the Auto Scaling Group and doubled the number of instances in it. End result: I have 4 instances with the OLD AMI and 4 instances with the NEW AMI. Good!
I then updated the ECS Service and increased its number of tasks from 4 to 8. End result: 4 new tasks were started on the NEW AMI instances and the 4 original tasks are still running on the OLD AMI instances. Good!
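Roughly, those two steps look like this via boto3; the ASG, launch configuration, cluster, and service names are all placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")
ecs = boto3.client("ecs")

# Point the ASG at a launch configuration built on the new AMI and
# double capacity so old and new instances run side by side.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="ecs-asg",            # placeholder
    LaunchConfigurationName="ecs-lc-new-ami",  # placeholder, new AMI
    MinSize=8,
    MaxSize=8,
    DesiredCapacity=8,
)

# Double the service's desired task count; in this case the scheduler
# started the four extra tasks on the freshly launched instances.
ecs.update_service(
    cluster="my-cluster",  # placeholder
    service="my-service",  # placeholder
    desiredCount=8,
)
```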
All good at this point. Next I need to stop the tasks on the 4 OLD AMI Instances and somehow keep these OLD AMI Instances in reserve while we burn in the 4 NEW AMI Instances. Here's what I tried:
I set the status of the 4 OLD AMI instances to "Standby" (in the ASG). I was expecting ECS to terminate all ECS tasks running on these OLD AMI instances. No dice!
I then reduced the ECS Service task count from 8 to 4, hoping that ECS would terminate the tasks on the OLD AMI instances. No dice! It terminated tasks on random instances, mixing NEW and OLD in the process.
I then decided to help ECS along and manually (one at a time) stopped the running tasks on the OLD AMI instances, hoping that ECS would NOT re-launch tasks on the OLD AMI instances. No dice -- it still managed to launch some tasks on the OLD AMI instances.
At this point I am lost. Is this even possible?
One option I am considering is using task-placement-constraints, but I am hoping someone here has gone through this basic need and can share their ideas with me.
I feel we should have a way to mark ECS instances as StandBy and have ECS not schedule any tasks on them for as long as that status is active. I don't think the "Deregister" functionality is sufficient here, because there is no way that I know of to bring deregistered instances back into service.
I also don't like that a specific version of Docker/ECS Agent is not pinned to a specific version of the Amazon ECS-optimized AMI. If it were, this would not be an issue: I could always bring a known-good set of versions back into service. But as it is now, even an older AMI will pull in the most recent versions of the ECS Agent and Docker on launch.
Thank you!