Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues pushing images to Amazon ECR #279

Merged
merged 3 commits into from
Jan 10, 2023

Conversation

normj
Copy link
Contributor

@normj normj commented Jan 5, 2023

Note: I'm on the AWS .NET SDK team not the ECR team. I did some group debugging with the ECR team to understand the failures with this tooling pushing to ECR.

This PR address the idiosyncrasy when pushing images to ECR.

  • If the repository being pushed does not exist ECR abruptly ends the request. I believe that is why in the RegistryAuthentication.md says the upload was cancelled for unknown reasons. I added special ECR error logic to look for the abrupt end to the request and inform the user they probably haven't created the repository yet.
  • When doing the patch chunk uploads of the parts ECR returns back status code 201 (Created) instead of the standard 202 (Accepted). I know this is non standard but at this point if ECR changed the status code that would be a breaking change.
  • The chunk size needs to be a minimum of 5 MB except for the last chunk. This PR changes the buffer size to 5 MB when pushing to ECR. Not sure if the original 64 KB had a specific meaning but the logic could be simplified by always using 5 MB buffer size.

I also found that parallelizing uploading all of the layers at the same time especially with the 5 MB buffer size caused to many socket retries and ultimately failures. I changed the code for ECR to upload the layers serially. Probably more should be done there in general to better handle throttling and error reporting since currently a messy AggregateException is thrown up the chain but wanted to get feedback before doing anything more in that space.

DotnetECRPublish

@rainersigwald
Copy link
Member

Awesome, thanks! I'm working on something else today and probably can't review immediately but:

  • The chunk size needs to be a minimum of 5 MB except for the last chunk. This PR changes the buffer size to 5 MB when pushing to ECR. Not sure if the original 64 KB had a specific meaning but the logic could be simplified by always using 5 MB buffer size.

IIRC that was not a principled number but something that "seemed to work in practice", so I have no strong objection to changing it always . . . unless that causes problems for some other registry.

Probably more should be done there in general to better handle throttling and error reporting since currently a messy AggregateException is thrown up the chain

Yes, that type of error handling + reporting is something that needs more work in this project in general.

@baronfel
Copy link
Member

baronfel commented Jan 5, 2023

This is amazing, thank you for your time and effort here @normj!

Copy link

@Th2023c Th2023c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There normj:support-amazon-ecr if there is a problem that is a bit more complicated then you need to consider it not being able to process

Copy link
Member

@baronfel baronfel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @normj - I'm going to merge now, and a new prerelease version of the package should be available on Github Packages shortly afterwards.

I'll update my test harness at baronfel/sdk-container-demo and check that the AWS scenario lights up (and that other scenarios aren't impacted).

@baronfel baronfel merged commit 3621378 into dotnet:main Jan 10, 2023
@normj
Copy link
Contributor Author

normj commented Jan 10, 2023

Great to see this merged into. Feel free to reach out to me if there are other issues with ECR.

@baronfel
Copy link
Member

I screwed up the byte calculation, for one thing. Going to patch that in momentarily.

@baronfel
Copy link
Member

baronfel commented Jan 10, 2023

Looking great @normj: https://github.com/baronfel/sdk-container-demo/actions/runs/3886025610/jobs/6630637915, hard to mess with 36s from zero-to-published :)

@normj
Copy link
Contributor Author

normj commented Jan 10, 2023

@baronfel Awesome!

@normj normj deleted the support-amazon-ecr branch January 10, 2023 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants