-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement transition function for MANIFEST_PULLED state #4152
Merged
amogh09
merged 4 commits into
aws:feature/digest-resolution
from
amogh09:manifest-transition
Apr 29, 2024
Merged
Implement transition function for MANIFEST_PULLED state #4152
amogh09
merged 4 commits into
aws:feature/digest-resolution
from
amogh09:manifest-transition
Apr 29, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
amogh09
force-pushed
the
manifest-transition
branch
2 times, most recently
from
April 23, 2024 04:06
3bb91b8
to
3d299fb
Compare
amogh09
force-pushed
the
manifest-transition
branch
from
April 23, 2024 17:09
b723a97
to
13462f7
Compare
amogh09
changed the title
Manifest transition
Implement transiton function for MANIFEST_PULLED state
Apr 23, 2024
amogh09
changed the title
Implement transiton function for MANIFEST_PULLED state
Implement transition function for MANIFEST_PULLED state
Apr 23, 2024
sparrc
reviewed
Apr 24, 2024
sparrc
reviewed
Apr 24, 2024
sparrc
reviewed
Apr 24, 2024
sparrc
reviewed
Apr 24, 2024
amogh09
force-pushed
the
manifest-transition
branch
from
April 24, 2024 17:54
8b20d49
to
6ae338a
Compare
…ove comments and logging
yinyic
reviewed
Apr 25, 2024
amogh09
force-pushed
the
manifest-transition
branch
from
April 25, 2024 19:12
8f8e50d
to
13f8938
Compare
yinyic
approved these changes
Apr 25, 2024
sparrc
approved these changes
Apr 29, 2024
amogh09
added a commit
that referenced
this pull request
Apr 29, 2024
amogh09
added a commit
that referenced
this pull request
May 7, 2024
amogh09
added a commit
that referenced
this pull request
May 7, 2024
amogh09
added a commit
that referenced
this pull request
May 13, 2024
amogh09
added a commit
that referenced
this pull request
May 21, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR implements a transition function for container MANIFEST_PULLED internal state. This state was added in #4137 but the transition function was left unimplemented. This PR fills that gap.
The transition function for MANIFEST_PULLED state tries to resolve the image manifest digest for the container image using the following logic and sets the
container.ImageDigest
field with it.ECS_MANIFEST_PULL_TIMEOUT
is used when calling the registry.In case of a failure during transition to MANIFEST_PULLED state a suitable error is generated which will become the container's
ApplyingError
as is the case with any other state transition failures. Except for timeout and authentication errors,CannotPullImageManifestError
error is used to wrap errors during transition toMANIFEST_PULLED
state. This aligns with the use ofCannotPullContainerError
to wrap errors during transition toPULLED
state. For example, "CannotPullImageManifestError: Error response from daemon: manifest unknown: Requested image not found" would be the reason for container transition failure if transition toMANIFEST_PULLED
state failed due to the image reference in the task payload being invalid.As noted above, a new configuration setting
ECS_MANIFEST_PULL_TIMEOUT
is introduced in this PR that allows users to configure a timeout to be used when calling image registries to pull image manifests. The default value for this timeout is 1 minute and the minimum configurable value is 30 seconds.This change adds additional time to task starts as transition to MANIFEST_PULLED state can make an additional blocking call to image registry. In my test environment the additional time ranges from 300ms (ECR) to 900ms (Dockerhub). This is expected. In an upcoming PR we will change image pulls to use the resolved digest which will make image pulls a bit faster and the long-term impact on task start will but cut to 150ms (ECR) to 500ms (Dockerhub). For this reason, this PR is being merged to a short-lived feature branch so that we merge both changes to dev branch as a single commit.
Implementation details
*DockerTaskEngine.pullContainerManifest
which is the transition function forMANIFEST_PULLED
state as described above. A new helper method*DockerTaskEngine.setRegistryCredentials
is added which is used to resolve private registry credentials. Code for this method is taken from the existingpullAndUpdateContainerReference
method that is also refactored to callsetRegistryCredentials
method.*dockerImageManager.RecordContainerReference
method that is responsible for settingImageDigest
on containers. Now the method setsImageDigest
only if it's not pre-populated by the transition to MANIFEST_PULLED state.ManifestPullTimeout
setting inConfig
and parse its value fromECS_MANIFEST_PULL_TIMEOUT
environment variable if it is available. Update relevant tests.MANIFEST_PULLED
state.PutASMDockerAuthConfig
method is added to*ASMAuthResource
so that the resource may be set up with fake docker auth config for testing purposes. The method is not used outside of testing.Testing
Many new unit and integration tests are added.
Unit tests -
TestPullContainerManifest
- Tests*DockerTaskEngine.pullContainerManifest
method for all image pull behavior and container type cases. Checks all interactions with docker.TestManifestPullTaskShouldContinue
- Tests task engine's handling of tasks with a focus on transition toMANIFEST_PULLED
state. This test covers success and failure cases for the transition when the task should continue its lifecycle. If image pull behavior isdefault
orprefer-cached
then the task should continue its lifecycle even if there was an error during transition toMANIFEST_PULLED
state which is the same behavior as transition toPULLED
state.TestManifestPullFailuresTaskShouldStop
- Similar toTestManifestPullTaskShouldContinue
above but covers failure cases for the transition when the task should be stopped due to the transition failure. If image pull behavior isalways
oronce
then the task should be stopped if there was an error during transition toMANIFEST_PULLED
state which is the same behavior as transition toPULLED
state.TestImagePullRequired
- Tests*DockerTaskEngine.imagePullRequired
method which is missing unit tests.TestSetRegistryCredentials
- Tests the newly added*DockerTaskEngine.setRegistryCredentials
method. Ensures that the method sets registry auth credentials on the container and returns a function that can be called to clear the credentials when they are no longer needed.Integration tests -
TestManifestPulledDoesNotDependOnContainerOrdering
- This existing test forMANIFEST_PULLED
state is updated to check thatImageDigest
field is set for containers after they have transitioned toMANIFEST_PULLED
state.TestPullContainerManifestInteg
- This new test checks that*DockerTaskEngine.pullContainerManifest
method works as expected for all image pull behaviors. That is, the method setsImageDigest
if success is expected and returns the right error if failure is expected.I also performed the following manual tests.
MANIFEST_PULLED
state.once
,default
,always
, andprefer-cached
image pull behaviors. Verified in the logs that Agent gets contacts the registry fordefault
andalways
image pull behaviors and gets the digest from local image foronce
andprefer-cached
image pull behaviors if the image is cached.always
image pull behavior. Observed an increase in task start time ranging from 300ms (for ECR) to 900ms (for Dockerhub) owing to the new image registry call during transition toMANIFEST_PULLED
state. This increase in task start time is expected given that a new blocking network call is made to fetch image manifest before the container can be started.Failed to find a supported API version that supports manifest pulls. Skipping digest resolution.
warning message printed in the logs as expected.New tests cover the changes: yes
Description for the changelog
Feature - Resolve image manifest digest for task containers during transition to
MANIFEST_PULLED
state before container image pulls. In future this early resolution of manifest digest will be used to expedite the reporting of manifest digests to ECS backend.Does this PR include breaking model changes? If so, Have you added transformation functions?
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.