Migrate AWS Verifier to aws-sdk-go-v2 #16483

Merged
merged 2 commits into from
May 5, 2024

Conversation

Member

@rifelpet rifelpet commented Apr 21, 2024

Closes #16424

From this comment:

While presigned requests are still supported in V2, the presign methods and types no longer provide access to the request body, only their url and headers. See aws/aws-sdk-go-v2#1137. Kops-controller currently reads the request body to perform some validation:

requestBytes, _ := io.ReadAll(stsRequest.Body)
_, _ = stsRequest.Body.Seek(0, io.SeekStart)
if stsRequest.HTTPRequest.Header.Get("Content-Length") != strconv.Itoa(len(requestBytes)) {
	return nil, fmt.Errorf("incorrect content-length")
}

In V1 the presigned request is a POST; in V2, however, it is converted to a GET request, and the normal Action=GetCallerIdentity&Version=2011-06-15 body is moved to URL query parameters:
https://github.com/aws/aws-sdk-go-v2/blob/bc2a669d3241023e20194cdfe042b8c275887e51/service/sts/api_client.go#L641-L645

I've removed the validation that checks the Content-Length signed header matches the size of the request body, given the request body is now empty.
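
As a rough illustration of what a verifier can still check once the body is empty, here is a minimal sketch using only the Go standard library. The helper name checkPresignedGetCallerIdentity and the example URL are hypothetical and are not taken from the kops code; the sketch only assumes, as described above, that a V2 presigned request is a GET carrying Action and Version as query parameters.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// checkPresignedGetCallerIdentity is a hypothetical helper (not from kops).
// It checks that a presigned request has the V2 shape described above:
// a GET whose query string carries Action=GetCallerIdentity&Version=2011-06-15.
func checkPresignedGetCallerIdentity(method, rawURL string) error {
	if method != http.MethodGet {
		return fmt.Errorf("unexpected method %q", method)
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parsing url: %w", err)
	}
	q := u.Query()
	if got := q.Get("Action"); got != "GetCallerIdentity" {
		return fmt.Errorf("unexpected action %q", got)
	}
	if got := q.Get("Version"); got != "2011-06-15" {
		return fmt.Errorf("unexpected version %q", got)
	}
	return nil
}

func main() {
	// Illustrative URL only; a real presigned URL also carries signature parameters.
	err := checkPresignedGetCallerIdentity(
		"GET",
		"https://sts.us-east-1.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15",
	)
	fmt.Println("validation error:", err)
}
```

This check does not verify the signature itself; it only confirms that the request is the kind of call the verifier expects to forward to STS.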

This thread on the original kops-controller PR discusses potential upgrade challenges. In this case I believe we'll have the normal race with this type of change:

  1. Old node is launched
  2. New kops update cluster --yes and kops rolling-update --yes are run
  3. New control plane node is launched, new kops-controller is running
  4. Old node attempts to join and fails because the GetCallerIdentity requests don't match.

In reality I don't anticipate this being a problem, because the old node would have to remain unjoined for the entire time it takes the control plane to bootstrap and launch the kops-controller daemonset pod (a significantly longer process than normal k8s node bootstrap). Eventually the failed node will be cleaned up by cluster-autoscaler or karpenter.

My only concern would be disruption to kops rolling-update if the node never joins, which would cause cluster validation to fail until the node is cleaned up.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 21, 2024
@rifelpet
Member Author

/cc @hakman @justinsb

/hold for discussion

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2024
WithSTSRegionalEndpoint(endpoints.RegionalSTSEndpoint)
sess, err := session.NewSession(config)
func NewAWSAuthenticator(ctx context.Context, region string) (bootstrap.Authenticator, error) {
config, err := awsconfig.LoadDefaultConfig(ctx, awsconfig.WithRegion(region))
Member

We added WithSTSRegionalEndpoint in #12043 . Is it still supported? (Is it automatic now?)

@justinsb
Member

We don't have great unit test coverage here, so I figured it might be good to add some: #16487

TBH I'm struggling to get the test to pass with v2, trying to figure it out...

I think we were already very sensitive to the aws-sdk version, so I don't think this is a huge regression version-wise.

@justinsb
Member

I think I've figured it out ... the V2 API puts a lot more in the query URL. So we likely need to pass along the url/method/headers. The good thing is that eliminates the dependency on the aws-sdk-go version (going forwards!)

Here's my WIP / PoC branch ... let me know if you want me e.g. to send a PR to your branch or something like that. I'm happy to fix up the tests, the code change itself here is pretty simple (we encode an awsV2Token, including the url & method)

https://github.com/kubernetes/kops/compare/master...justinsb:kops:aws-sdk-go-v2-verifier?expand=1

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2024
@rifelpet
Member Author

I think I've figured it out ... the V2 API puts a lot more in the query URL. So we likely need to pass along the url/method/headers. The good thing is that eliminates the dependency on the aws-sdk-go version (going forwards!)

Here's my WIP / PoC branch ... let me know if you want me e.g. to send a PR to your branch or something like that. I'm happy to fix up the tests, the code change itself here is pretty simple (we encode an awsV2Token, including the url & method)

master...justinsb:kops:aws-sdk-go-v2-verifier?expand=1 (compare)

Yes, feel free to PR to my branch or just push directly to it. I just rebased to resolve conflicts. Adding the other data in to the token makes sense to me 👍🏻

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits labels Apr 27, 2024
@k8s-ci-robot k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits labels Apr 27, 2024
@hakman
Member

hakman commented Apr 27, 2024

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2024
rifelpet and others added 2 commits May 5, 2024 08:39
We pass the full request details, it's less dependent on client
versions.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2024
@rifelpet
Member Author

rifelpet commented May 5, 2024

/unhold

/cc @hakman

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2024
@hakman
Member

hakman commented May 5, 2024

/lgtm
/assign @justinsb

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2024
@justinsb
Member

justinsb commented May 5, 2024

Thanks @rifelpet

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit 9582763 into kubernetes:master May 5, 2024
22 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone May 5, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kops-controller area/nodeup area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to aws-sdk-go-v2
4 participants