
Issues with "software version consistency" feature #2394

Closed
gilad-yadgar opened this issue Jul 2, 2024 · 54 comments
Labels
ECS (Amazon Elastic Container Service), Shipped (This feature request was delivered.)

Comments

@gilad-yadgar

gilad-yadgar commented Jul 2, 2024

EDIT: this is related to the "software version consistency" feature launch, see What's New post: https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-ecs-software-version-consistency-containerized-applications/

Summary

Since our EC2 instances upgraded to ecs-agent v1.83.0, the images used for containers are referenced by SHA digest rather than by image tag.

Description

We started getting a different image value for the '{{.Config.Image}}' property when running docker inspect on our ECS EC2 instances.
We now get the SHA digest as .Config.Image instead of the image tag.
The task definition contains the correct image tag (and not the digest).

We need the image tag since we rely on that custom tag to understand what was deployed. What can be done?
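
For illustration, a minimal sketch of the check we rely on (container ID, registry, and tag are placeholders):

$ docker inspect --format '{{.Config.Image}}' <container-id>
# expected (pre-1.83.0): 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:my-custom-tag
# observed (1.83.0+):    123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app@sha256:0123abcd...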

Expected Behavior

We expect to see the image tag used for the container.

Observed Behavior

We get the image digest used for the container.

Environment Details

@dg-nvm

dg-nvm commented Jul 2, 2024

same

@scott-vh

scott-vh commented Jul 2, 2024

FWIW, today I encountered a production incident, roughly 2 weeks after updating to ecs-agent 1.83.0, where I saw a subset of our ECS tasks fail to start with:

CannotPullContainerError: failed to tag image '012345678910.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>@sha256:<digest>' 

This was a surprising error to see, given that the only change on our side we can attribute it to is the agent version upgrade 🤷. It feels similar enough to this issue to be worth a mention, given the digest in the error message.

This seemed to be isolated to a small fraction of our cluster instances (all running 1.83.0), and tasks from the same task revisions that yielded the error eventually phased in without intervention.


I've also happened to notice that aws/amazon-ecs-agent#4181 intends to help augment these kinds of errors with some more useful context and made it into agent release 1.84.0 so I'll report back if/when we upgrade and whether or not that yields anything of use 👍

EDIT: didn't touch the 1.84.0 upgrade after seeing this comment

@tomdaly

tomdaly commented Jul 3, 2024

This has also caused production issues for my org. We use the ImageName value available in the ECS container metadata file at runtime, as we tag our ECR images with the Git commit SHA. This is then used for a variety of things in different services, such as sourcing assets, tracking deploys, etc.

Since 1.83.0, ImageName sometimes contains the SHA digest rather than the tagged image name; we expected the digest to appear in ImageID, not ImageName.
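
For context, a minimal sketch of how we read it at runtime (requires ECS_ENABLE_CONTAINER_METADATA=true on the instance; the jq filter and values are illustrative):

$ jq -r '.ImageName, .ImageID' "$ECS_CONTAINER_METADATA_FILE"
123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:<git-commit-sha>   # ImageName - where we expect the tag
sha256:0123abcd...                                                     # ImageID - where we expect the digest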

@panny-P

panny-P commented Jul 4, 2024

I still found this error on ecs-agent 1.84.0.

@mvanholsteijn

mvanholsteijn commented Jul 5, 2024

We have production issues with the change too, when the tag is re-used for a new image and the old image is deleted.

@timdarbydotnet

timdarbydotnet commented Jul 8, 2024

I'm also seeing the issue where a newly pushed and tagged "latest" image is ignored and the agent will only use the older, untagged image. This needs to be fixed ASAP, or at least we need a workaround. I'm seeing this behavior on agent 1.83.0; it was not happening on 1.82.1.

@turacma

turacma commented Jul 8, 2024

We are also seeing this issue in our environment. It doesn't seem to happen with all images. FWIW, on the same container instance, we can see some containers with tags and others without, and the containers that do have tags are the first ones launched.

@turacma

turacma commented Jul 8, 2024

FWIW, this also impacts the ECS APIs, specifically describe-tasks

https://www.reddit.com/r/aws/comments/1dtgc4b/mismatching_image_uris_tag_vs_sha256_in_listtasks/

Unclear if the source of truth (and the root cause) is the agent or the APIs themselves, but I just thought it was worth noting.

@joelcox22

Found this issue after an internal investigation of an incident that seems likely related to this. If it helps anyone else, here's my analysis of how this impacted a service that referenced an ECR image by a persistent image tag that we regularly rebuilt and overwrote, with automation in place for deleting the older untagged images.

I have an open support case with AWS to confirm this behaviour, and have included a link to this GitHub issue.

sequenceDiagram
participant jenkins as Jenkins
participant cloudformation as Cloudformation
participant ecs-service as ECS Service
participant ec2-instances as EC2 Instances
participant ecr-registry as ECR Registry
participant docker-base-images as Docker Base Images<br />firelens sidecar image
participant ecr-lifecycle-policy as ECR Lifecycle Policy
jenkins ->> cloudformation: regular deployment
cloudformation ->> ecs-service: creates a new "deployment" for the service
activate ecs-service
note right of ecs-service: ECS resolves the image hash<br />at time of "deployment" creation
ecs-service ->> ec2-instances: starts tasks with resolved image hashes
ec2-instances ->> ecr-registry: pulls latest image from ECR
docker-base-images ->> ecr-registry: rebuild and push image regularly
ecr-lifecycle-policy ->> ecr-registry: deletes older images periodically
note right of ecs-service: periodically, new tasks need to start
ecs-service ->> ec2-instances: starts tasks with previously resolved image hashes
ec2-instances ->> ecr-registry: attempts to run the same image hash from earlier<br />if the image already exists on the instance, it's fine<br />otherwise, it needs to pull from ECR again and may fail
ec2-instances ->> ecs-service: tasks fail to launch due to missing image
note right of ecs-service: at this point, the service is unstable<br />might have existing running tasks<br /> but it can't launch new ones
create actor incident as Incident responders
ecs-service ->> incident: begin investigation
note left of incident: "didn't this happen the other day<br />for another service?" *checks slack*
note left of incident: Yeah, it did happen, and the outcome<br />was that we disabled the ECR lifecycle<br />policy, but services were left with<br />the potential to fail when tasks cycle
incident ->> jenkins: trigger replay of latest production deployment early and hope that fixes the issue
jenkins ->> cloudformation: deploy
cloudformation ->> incident: "there are no changes in the template"
incident ->> jenkins: disable the sidecar to get the service up and running again quickly and buy more time for investigation
jenkins ->> cloudformation: deploy with sidecar disabled
deactivate ecs-service
cloudformation ->> ecs-service: create new deployment without sidecar
activate ecs-service
note right of ecs-service: no longer cares about firelens sidecar image
ecs-service ->> ec2-instances: starts new tasks
ec2-instances ->> ecs-service: success
ecs-service ->> incident: service is up and running again, everyone is happy
note left of incident: "but we're not done yet"
incident ->> jenkins: re-enable the sidecar
jenkins ->> cloudformation: deploy with sidecar enabled
deactivate ecs-service
cloudformation ->> ecs-service: create new deployment with sidecar
activate ecs-service
note right of ecs-service: ECS resolves the image hash<br />at time of "deployment" creation
ecs-service ->> ec2-instances: start new tasks
ec2-instances ->> ecr-registry: pulls new images with updated hash
ec2-instances ->> ecs-service: success
ecs-service ->> incident: service is stable again
note left of incident: This service looks good again now<br />but other services might still have a problem
deactivate ecs-service
incident ->> ecs-service: work through "Force New Deployment" for all services in all ecs clusters & accounts
note left of incident: all services are now expected to be<br />stable, as everything should be<br />referencing the latest firelens image<br />hash, and the lifecycle policy<br />to delete older ones is disabled

@L3n41c

L3n41c commented Jul 11, 2024

This issue most probably comes from aws/amazon-ecs-agent#4177 merged in 1.83.0:

Expedited reporting of container image manifest digests to ECS backend. This change makes Agent resolve container image manifest digests for container images prior to image pulls by either calling image registries or inspecting local images depending on the host state and Agent configuration. Resolved digests will be reported to ECS backend using an additional SubmitTaskStateChange API call

@sjmisterm

Downgrading to 1.82.4 in our case does not make the issue go away, indicating that, even if it was related to the agent, the digest information is now somehow cached by ECS. We are currently using a DAEMON ECS service.

According to a recent case opened with AWS support, "ECS now tracks the digest of each image for every service deployment of an ECS service revision. This allows ECS to ensure that for every task used in the service, either in the initial deployment, or later as part of a scale-up operation, the exact same set of container images are used." They added this is part of a rollout that started in the last few days of June and is supposed to complete by Monday.

Their suggested solution is to update the ECS service with "Force new deployment" to "invalidate" the cache. If you have AWS support, try to open a case including this information to see how they evaluate your issue.
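
For anyone applying that workaround, the suggested "Force new deployment" is a single CLI call (cluster and service names are placeholders):

$ aws ecs update-service \
    --cluster my-cluster \
    --service my-service \
    --force-new-deployment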

@joelcox22

joelcox22 commented Jul 11, 2024

I got a similar response to @sjmisterm in my support case, confirming the new behaviour is expected, and stating that we should no longer delete the images from ECR until we're certain that the images are no longer in use by any deployment.

This change effectively means that ECR lifecycle policies which delete untagged images can now cause outages unless, every time an image is deleted, additional steps are taken immediately to redeploy every deployment that was referencing a mutable tag. This is particularly problematic for my specific use case, where we reference a mutable tag for a sidecar container that we include in many services.

I've asked if there are any future roadmap plans to make this use case easier to manage, and requested a comment from AWS on this GitHub issue 😄

... https://xkcd.com/1172/

@sjmisterm

AWS has confirmed this is definitely caused by them, and they think this is a good feature, as the links below (made available yesterday) show:

https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-ecs-software-version-consistency-containerized-applications/
https://aws.amazon.com/blogs/containers/announcing-software-version-consistency-for-amazon-ecs-services/
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-ecs.html#deployment-container-image-stability

There's no way to turn off this new behaviour, which completely breaks the easiest workflow for blue-green deployments - I'm sure tons of people have other cases that need or benefit from the old one.

I suggest that everyone who has AWS support file a case and request an API to turn this off by service / cluster / account.

@amogh09

amogh09 commented Jul 12, 2024

Hello. I am from AWS ECS Agent team.

As shared by @sjmisterm above, the behavior change that customers are seeing is because of the recently released Software Version Consistency feature. The feature guarantees that the same images are used for a service deployment by recording the image manifest digests reported by the first launched task and then overriding tags with digests for all subsequent tasks of the service deployment.

Currently there is no way to turn off this feature. ECS Agent v1.83.0 included a change to expedite the reporting of image manifest digests but older Agent versions also report digests and ECS backend will override tags with digests in both cases. We are actively working on solutions to fix the regressions our customers are facing due to this feature.

@amogh09

amogh09 commented Jul 12, 2024

One of the patches we are considering: instead of overriding :tag with @sha256:digest, we would override it with :tag@sha256:digest so that the lost information is added back to the image references.

@sjmisterm

@amogh09 , I can't see how this would address the blue-green scenario. Could you explain it, please?

@amogh09

amogh09 commented Jul 12, 2024

There's no way to turn off this new behaviour, which completely breaks the easiest workflow for blue-green deployments

@sjmisterm Can you please share more details on how this change is breaking blue-green deployments for you?

@sjmisterm

@amogh09 , sure.

Our blue-green deployments work by deploying a new image to the ECR repo tagged with latest and then launching a new EC2 instance (from the ECS-optimized image, properly configured for the cluster) while we make sure the new version works as expected in production. Then, we start to progressively drain the old tasks until only new tasks are available.
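
For context, the draining step in that workflow is the standard container-instance state change (cluster name and ARN are placeholders):

$ aws ecs update-container-instances-state \
    --cluster my-cluster \
    --container-instances arn:aws:ecs:us-east-1:123456789012:container-instance/my-cluster/abc123 \
    --status DRAINING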

@sjmisterm

@amogh09 in summary: the software version "inconsistency" is what makes blue-green a breeze with ECS. Should we want consistency, we'd use a digest or a version tag.

@amogh09

amogh09 commented Jul 12, 2024

@sjmisterm The deployment unit for an ECS service is a TaskSet. The software version consistency feature guarantees image consistency at the TaskSet level. In your case, how do you get a new task to be placed on the new EC2 instance? The new task needs to be a part of a new TaskSet to get the newer image version. If it belongs to the existing TaskSet, then it will use the same image version as its TaskSet.

ECS supports blue-green deployments natively at the service level if the service is behind an Application Load Balancer. You can also use the External deployment type for even greater control over the deployment process. The Software Version Consistency feature is compatible with both of these.

@timdarbydotnet

timdarbydotnet commented Jul 12, 2024

@amogh09 I use a network load balancer and the LDAP container instances I'm running will not respond well to this new model. If I can't maintain the ability to pull the tagged latest image, I will have to stop using ECS and manage my own EC2s, which would be painful frankly.

Looking at the ECS API, what would happen if I called DeregisterTaskDefinition and then RegisterTaskDefinition? Would that have the effect of forcing ECS to resolve the digest from the new latest image without killing the running tasks?

@sjmisterm

@amogh09, I think we're talking about different things. Until the ECS change, properly launching a new ECS instance configured for an ECS daemon service whose task definition is tagged with :latest would launch the new task with, well, the image tagged latest. Now it launches using the digest resolved by the first task, unless you force a new deployment of your service.

Our deployment scripts pre-date CodeDeploy and the other features, so all your suggestions require rewriting deployment code because of a feature we can't simply opt out of.

@amogh09

amogh09 commented Jul 12, 2024

I understand the frustrations you all are sharing regarding this change. I request you to contact AWS Support for your issues. Our support team will be able to assist you with workarounds relevant to your specific setups.

@sjmisterm

@amogh09, a simple API flag at the service / cluster / region / account level would solve the problem. That's what we're trying to get across, because this disturbs your customer base: not everyone pays for support, and the old behaviour, as you can see, is used by many of them.

@mpoindexter

I'll chime in that we were negatively impacted by this change as well, and I don't think it helps anything for most scenarios.

Before, customers effectively had a choice: they could either enforce software version consistency by using immutable tags (https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-tag-mutability.html), or if they wanted to allow for a rolling release (most useful for daemon services as @sjmisterm alluded to) they could achieve that as well by using a mutable tag.

Now, this option is gone with nothing to replace it, and very poor notification that it was going to happen to boot.
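
For reference, enforcing that first option is a one-line repository setting (repository name is a placeholder):

$ aws ecr put-image-tag-mutability \
    --repository-name my-repo \
    --image-tag-mutability IMMUTABLE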

@timdarbydotnet

I'm very disappointed with AWS on two counts:

  • From a technical standpoint, it appears that no consideration was given to how customers are actually using ECS.
  • The lack of prior communication for a change like this is shocking. I've seen AWS announce long lead times for far less impactful changes than this.

@scott-vh

I know that the circumstances around how we all got notified about this change aren't ideal, but is there anywhere where we can be proactive and follow along for similar updates that may affect us in the future? Did folks get a mention from their AWS technical account managers or similar?

I lurk around the containers roadmap fairly often, but don't see an issue/mention there or in any other publicly-facing aws github project around this feature release.

@dg-nvm

dg-nvm commented Jul 16, 2024

@scott-vh the problem is that this is an internal API change; the ECS backend behaves differently now. This has nothing to do with ecs-agent itself; regardless of version, you will get the same behaviour. No one could see it coming.

@scott-vh

@dg-nvm Yep I got that 👍 I was just curious if there was any breadcrumb anywhere else from which we could've seen this coming (sounds like no, but I wanted to see if anyone who interfaces with TAMs or ECS engineers through higher tiers of support got some notice).

@dg-nvm

dg-nvm commented Jul 16, 2024

@scott-vh our TAM was informed about the problem, but idk if there was any proposal. Given that I see ideas for workarounds accumulating, I would say no :D Luckily our CD was not impacted by this. I can see a scenario where daemon deployments are easier using mutable tags, especially since ECS does not play nicely when replacing daemons. Sometimes they get stuck because they were removed from the host and something else was put in their place in the meantime :)

@sparrc

sparrc commented Jul 19, 2024

Hi all, I have transferred this issue into the containers-roadmap repo. As far as I understand it, people are experiencing issues with this feature as a whole, rather than an issue with the ECS agent behavior specifically. For reference, see what's new post: https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-ecs-software-version-consistency-containerized-applications/

Please feel free to continue adding your +1 and providing feedback :)

@simonulbrich

This issue is affecting us as well. We utilize an initialization container that runs before the app container. This init container sets up monitoring integrations and settings that are not critical to the app itself, but with a limited team we rely on the mutable tags to handle the "rolling" update as tasks are restarted. Forcing an application deployment for each application that my team manages, just for these small config updates, would be an impossible task. Is there any way at all to prevent this "consistency" feature for a single container, or to disable it entirely at the task level?

It seems like this problem was already solved with tag immutability, which gave us the option to use mutable tags if we actually needed that behavior.

@acdha

acdha commented Aug 1, 2024

This regression caused a minor production outage for us because AWS monitoring tools like X-Ray recommend using mutable tags, which means that if any of those has a release outside of your deployment cycle, you are now set up to have all future tasks fail to deploy because you followed the AWS documentation.

I think this feature was a mistake and should be reverted; there are better ways to accomplish that goal which do not break previously-stable systems, and immutable tags are not suitable for every situation, as evidenced by the way the AWS teams above are using them. But if the goal is to get rid of mutable tags, it should follow a responsible deprecation cycle with customer notification, warnings when creating new task definitions, some period where new clusters cannot be created with the option to use mutable tags in tasks, etc., since this is a disruptive change which breaks systems that have been stable for years, and there isn't a justification for breaking compatibility so rapidly.

@vat-gatepost-BARQUE

We are also having an issue with this. Our development environment is set up to have all the services on a certain tag, which keeps us from having to redeploy. They can simply stop the service and it comes back up with the most current image for that tag. Now they have to update the service, which is more steps than needed. This also seems to be a problem with our Lambdas that spin up Fargate tasks: those tasks are no longer the most current version of the tag. Updating the service is not an option for these, so we are still trying to work that out.

@mvanholsteijn

The strangest thing is that the feature was already available for those who wanted it: you can specify the container image with a digest, and that pins the image explicitly. No code changes to ECS required.

floating (potentially inconsistent) -> my-cool-image:latest
fixed (pinned) -> my-cool-image:latest@sha256:fdcfbed7e0a7beb8738e00fe8961c8e33e17bdeee94eab52cb8b85de1d04d024
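
For illustration, the pinned form dropped into a task definition would look roughly like this (container name is a placeholder; the digest is the one from the example above):

"containerDefinitions": [
  {
    "name": "app",
    "image": "my-cool-image:latest@sha256:fdcfbed7e0a7beb8738e00fe8961c8e33e17bdeee94eab52cb8b85de1d04d024",
    "essential": true
  }
]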

@tomkins

tomkins commented Aug 2, 2024

Also had an issue with one of our sites which I believe is related to this: a container pulling from an ECR repository with a lifecycle policy, an EC2 instance restart, and ECS wanting to pull the non-existent old image because there hasn't been a fresh deploy of the container for weeks.

Version consistency is a fantastic feature, but there are situations where I want the tag to be used rather than the image digest from the last deploy.

@vibhav-ag

Sorry for the late response on this thread. We're aware of the impact this change has had and apologize for the churn this rollout has created. We've been actively working through the set of issues that have been highlighted on this thread and have two updates to share:

  1. For customers who have been impacted by the lack of ability to see image tag information, we're working on a change that will bring back image tag information in the describe-tasks response, in the same format as was available prior to the release of version consistency (i.e. image:tag). An important thing to keep in mind here is that if you run docker ps on the host, you will see the image in the format image:tag, but docker inspect will return image:tag@digest.
  2. We're also working on adding a configuration in the container definition that will allow you to opt out of digest resolution for specific containers within your task. This should address both customers who want to completely opt out of digest resolution and customers who want to disable resolution for specific sidecar containers.

I'll be using this issue to share updates on the change to bring back image tag information in describe-tasks, and issue #2393 for the change to disable digest resolution for specific containers. We're tracking both changes at high priority.

Once again, we regret the churn this change has caused you all. While we still believe version consistency is the right behavior for the majority of applications hosted on ECS, we fully acknowledge that we could have done a better job socializing these changes and addressing these issues before, rather than after, making the change.

@nitrotm

nitrotm commented Aug 20, 2024

I can concur that this "software version consistency" change to ECS renders the concept of services totally useless for us. We may have to fall back to manually deployed tasks (without services), but then we'll lose the watchdog aspects, which we would have to re-implement ourselves.

In short, we need to guarantee a few properties on our services running background jobs:

  1. A task cannot be stopped automatically within a deterministic time-frame. Therefore, we internally flag the task to stop accepting new jobs and let it complete its currently assigned job. Only then, when the task is idle, do we stop it, and we relied on a nice property of ECS: that it would automatically fetch the latest container image associated with the tag (e.g. 'latest').
  2. Some of our services need to have at least N tasks up and running at all times (guaranteeing some kind of always-on property).
  3. Some services are dynamically adjusted via auto-scaling groups due to the highly variable nature of the demand.

These requirements, combined with the new constraint that all of the tasks within a service need to have the same image digest, mean that we cannot roll out any update to our containers without breaking at least one property.

Tbh this feels like we may want to switch to a plain k8s solution where we can set up and manage our workloads with some degree of flexibility. Hopefully an opt-out will be available soon as mentioned above, but we are stuck with our deployments atm and need a solution asap.

@peterjroberts

The forced addition of this feature also caused a significant production incident for us. We deliberately used mutable tags as part of our deployments, and an ECR lifecycle policy to remove the old untagged images after a period.

This should absolutely have been an opt-in feature, or opt-out but disabled for existing services. I'm glad to see that's now been identified and raised, but should this feature not be reverted until that option is available, to prevent everyone affected from having to redesign workflows or implement workarounds?

As has been pointed out by others, those that want consistency by container digest can already achieve that through either tag immutability, or referring to the digest explicitly in the task definition.

@ollypom

ollypom commented Sep 19, 2024

A quick update on @vibhav-ag's post. We have now completed the first action in his comment.

Amazon ECS no longer modifies the containerImage field found in the DescribeTasks or Task Metadata output. As part of the initial ECS software version consistency release, the container image tag (imageUri:tag) was replaced with the container image digest (imageUri@digest); this is no longer the case.
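
A quick way to spot-check this on a running task (names and output shape are illustrative placeholders):

$ aws ecs describe-tasks \
    --cluster my-cluster \
    --tasks arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123 \
    --query 'tasks[].containers[].image'
[
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:my-tag"
]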

@danbf

danbf commented Oct 10, 2024

This almost knocked down our production environment (it did knock down stage), because we had been treating our ECR images as mutable, and since we were running some services on Fargate, our task instances were rotated in the background. This prevented those Fargate-hosted services from re-provisioning new task instances.

It turns out it was not hard for us to switch to only immutable ECR images. We had already been taking all public images and enforcing that our services only pull from a pinned copy in our ECR repos, so we were not exposed to the issues with using a latest tag, which we basically don't even do internally.

But wow, this breaking change hit us out of left field, and it should probably have been listed as breaking in the changelog.

We got hit during a Fargate platform maintenance event, so we ask: is there some way for AWS to mitigate similar issues for Fargate services, like waiting for the newer container tasks to be running before terminating the existing container tasks? More like a proper rolling update.

github-project-automation bot moved this to Researching in containers-roadmap on Oct 21, 2024
vibhav-ag moved this from Researching to Coming Soon in containers-roadmap on Oct 21, 2024
@gregtws

gregtws commented Oct 29, 2024

This has impacted us as well. We use the equivalent of a mutable 'latest' tag and perform rolling service upgrades when we move the 'latest' tag. This lets us slowly do blue/green deployments (as our service can be told to recycle itself over time).

Instead, we weren't actually progressing our blue/green deployment, as AWS kept deploying the old revision of the service rather than the one pointed to by our mutable tag. Even replacing the EC2 instance didn't fix it; only re-running the service deployment did the trick. This is a massive behavior change and should never have been released without requiring customers to opt in.

@acdha

acdha commented Oct 30, 2024

We got hit during a FarGate platform maintenance event, so we ask, is there some way for AWS to mitigate similar issues for FarGate services, like waiting for the newer container tasks to be running before terminating the existing container tasks. More like a proper rolling update.

Fargate normally does that health-based deployment, but that won't help you if the old containers can't continue running due to a failure in the container or host. That's one of the reasons this mistake was so dangerous: unless you monitor the ECS service events, you will have a service which works normally until a previously-recoverable error occurs, and then you learn that the ECS team broke your deployment in July only when something is completely down.

What I ended up with is an EventBridge rule which listens to the ECR Image Action event for our source repositories, and a Lambda listener which creates a new ECS service deployment, to ensure that there's never a situation where our ECR tags are updated but ECS is still looking for the old version (we use environment-tracking branches & tags, so the latest version is something like “testing” or “staging”). That isn't enough to avoid problems with Amazon's own containers, however, so our deployment pipeline now does a lookup for CloudWatch and X-Ray to get the versioned tag that latest currently corresponds to, so it can use that instead.
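
A minimal sketch of the EventBridge rule pattern for that setup (repository and tag names are placeholders); the rule's target is the Lambda, which calls ecs:UpdateService with forceNewDeployment for the affected services:

{
  "source": ["aws.ecr"],
  "detail-type": ["ECR Image Action"],
  "detail": {
    "action-type": ["PUSH"],
    "result": ["SUCCESS"],
    "repository-name": ["my-service"],
    "image-tag": ["testing", "staging"]
  }
}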

@rs-garrick

Just to add another use-case...

I have an app with very long-running background processes. This app is not deployed with --force-new-deployment because of other issues with ECS deploys related to long-running processes that take days to exit.

Instead, all instances are marked DRAINING so that new instances are created with the updated container image. Because the service revision is never updated with the new sha, the new instances pull down an old container image.

Oddly, there's no way to update the service revision with the new sha without triggering an actual deploy. This is the missing piece for me.

I need a way to update the sha stored in the service revision without triggering a deployment. Something like aws ecs update-service --new-service-revision or --update-container-sha (instead of --force-new-deployment) would be fine.

$ aws ecs describe-service-revisions --service-revision-arns arn:aws:ecs...
...
            "containerImages": [
                {
                    "containerName": "foo",
                    "imageDigest": "sha256:483...76d",   <-- let me update this!!
                    "image": "3622.../foo:prd"
                }
            ],

@isker

isker commented Nov 18, 2024

2/ We're also working on adding a configuration in the container definition that will allow you to opt-out of digest resolution for specific containers within your task- this should address both customers who want to completely opt out of digest resolution as well as customers who want to disable resolution for specific sidecar containers.

If all containers in the task are opted out, will this remove the latency impact associated with this feature as well? I have a service that's updated with great frequency and whose deployments are latency sensitive. Two of the three containers in this service are already deployed with digests, but the third uses a tag because its image is built/published by CDK, and it's not possible to get access to the digest of such images to use in task definitions. So I believe we are stuck with the latency impact of this feature for mostly no reason: the image tags produced by CDK already approximate the behavior of digests, in that the image should not change if the tag is not changing.

I'm specifically referring to this line from the documentation:

To avoid potential latency altogether, specify container image digests in your task definition.

@vibhav-ag

vibhav-ag commented Nov 19, 2024

Update 2: you now have the ability to disable consistency for specific containers in your task by configuring the new versionConsistency field for each container in the task definition. Any changes to this property are applied after a deployment. Once again, we regret the churn this change has caused you all.

What’s New Post
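
For illustration, a minimal container-definition sketch of the opt-out (names and image are placeholders; versionConsistency defaults to enabled):

"containerDefinitions": [
  {
    "name": "my-sidecar",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-sidecar:latest",
    "versionConsistency": "disabled",
    "essential": false
  }
]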

github-project-automation bot moved this from Coming Soon to Shipped in containers-roadmap on Nov 19, 2024
@vibhav-ag

2/ We're also working on adding a configuration in the container definition that will allow you to opt-out of digest resolution for specific containers within your task- this should address both customers who want to completely opt out of digest resolution as well as customers who want to disable resolution for specific sidecar containers.

If all containers in the task are opted out, will this remove the latency impact associated with this feature as well? I have a service that's updated with great frequency and whose deployments are latency sensitive. Two of the three containers in this service are already deployed with digests, but the third uses a tag because its image is built/published by CDK, and it's not possible to get access to the digest of such images to use in task definitions. So I believe we are stuck with the latency impact of this feature for mostly no reason: the image tags produced by CDK already approximate the behavior of digests, in that the image should not change if the tag is not changing.

I'm specifically referring to this line from the documentation:

To avoid potential latency altogether, specify container image digests in your task definition.

Yes, if you opt out every container, you will see no impact to deployment latency because of digest resolution.

vibhav-ag added the Shipped (This feature request was delivered.) label and removed the Coming Soon label on Nov 19, 2024