AWS Pod identities need to be reviewed #4134

JorTurFer · 2023-01-18T07:57:44Z

Report

It's mandatory to define serviceAccountName in the scaled workload because some code reads awsRoleArn even though it's not always necessary for this authentication (the SDK reads the AWS role from the KEDA service account).

My proposal is to deprecate identityOwner in favor of a new option in TriggerAuthentication (as we have with Azure pod identity).

aws-kiam should be removed, as it has been deprecated in favor of aws-eks. We should deprecate it and remove it in KEDA v2.12.

blakepettersson · 2023-01-18T13:14:48Z

It would still be useful if we could assume other roles from the KEDA operator. Could we have a roleArn attribute somewhere in TriggerAuthentication, which the KEDA operator could then assume?

JorTurFer · 2023-01-18T13:51:36Z

Do you mean something like this in Azure?

We did it to prevent cases where the same identity (a role, in AWS terms) accumulates a lot of permissions. There you can federate multiple identities with the same k8s service account, and KEDA uses one or another based on the TriggerAuthentication.

It should be doable if that's what you meant. I'm not 100% sure, but it's just a matter of giving it a try.

Are you willing to contribute?

blakepettersson · 2023-01-19T12:38:59Z

I'm not familiar with Azure but I think that's the same concept. The KEDA service account would have an IAM role with a policy which would allow it to take on (assume) other roles, like the example below.

{
    "Version": "2012-10-17",
    "Statement": [{
        "Action": "sts:AssumeRole",
        "Effect": "Allow",
        "Resource": "arn:aws:iam::*:role/grant-keda-access"
    }]
}

If we were to have an awsRoleArn in the TriggerAuthentication CRD, or use the awsRoleArn that's available in the various scalers, the KEDA operator would then assume the given role, as long as the appropriate assume-role policy has been granted by the awsRoleArn (something like this in Terraform syntax):

resource "aws_iam_role" "give-keda-access" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::$my-account-id:role/my-keda-operator-role"
        }
      },
    ]
  })
  name = "grant-keda-access"
}
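
The two grants described above (the operator's policy allowing sts:AssumeRole on the target, and the target's trust policy naming the operator) can be modelled roughly as follows. This is only a simplified sketch: canAssume is a hypothetical helper, the account ID is a placeholder, and path.Match merely approximates how IAM evaluates wildcards.

```go
package main

import (
	"fmt"
	"path"
)

// canAssume is a hypothetical, simplified model of the two grants:
// the caller's policy must allow sts:AssumeRole on the target ARN, and
// the target role's trust policy must name the caller as a principal.
// Wildcards are matched with path.Match, which only approximates IAM.
func canAssume(callerArn string, allowedResources []string, targetArn string, trustedPrincipals []string) bool {
	allowed := false
	for _, pattern := range allowedResources {
		if ok, err := path.Match(pattern, targetArn); err == nil && ok {
			allowed = true
			break
		}
	}
	if !allowed {
		return false
	}
	for _, principal := range trustedPrincipals {
		if principal == callerArn {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder ARNs; the account ID is not taken from the thread.
	operator := "arn:aws:iam::111122223333:role/my-keda-operator-role"
	target := "arn:aws:iam::111122223333:role/grant-keda-access"

	ok := canAssume(operator,
		[]string{"arn:aws:iam::*:role/grant-keda-access"},
		target,
		[]string{operator})
	fmt.Println("assume allowed:", ok)
}
```

If either side is missing (the resource pattern does not cover the target, or the trust policy does not list the operator role), the sketch reports that the role cannot be assumed.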

I took a look at the code and it seems like this should work, but I never really got it to work without doing what you were describing (defining a service account name etc., which seems unnecessary). It seems like it should be enough to remove all references to identityOwner and to modify getAwsConfig to be something like this:

func getAwsConfig(awsRegion string, awsEndpoint string, awsAuthorization awsAuthorizationMetadata) (*session.Session, *aws.Config) {
	metadata := &awsConfigMetadata{
		awsRegion:        awsRegion,
		awsEndpoint:      awsEndpoint,
		awsAuthorization: awsAuthorization}

	sess := session.Must(session.NewSession(&aws.Config{
		Region:   aws.String(metadata.awsRegion),
		Endpoint: aws.String(metadata.awsEndpoint),
	}))

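	// Leave creds nil when nothing is configured so the SDK falls back to
	// its default credential chain (for example, the KEDA pod's own role).
	var creds *credentials.Credentials

	if metadata.awsAuthorization.awsRoleArn != "" {
		// Assume the requested role using the session's base credentials.
		creds = stscreds.NewCredentials(sess, metadata.awsAuthorization.awsRoleArn)
	} else if metadata.awsAuthorization.awsAccessKeyID != "" && metadata.awsAuthorization.awsSecretAccessKey != "" {
		// Otherwise use the static access key pair, if one was provided.
		creds = credentials.NewStaticCredentials(metadata.awsAuthorization.awsAccessKeyID, metadata.awsAuthorization.awsSecretAccessKey, "")
	}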

	return sess, &aws.Config{
		Region:      aws.String(metadata.awsRegion),
		Endpoint:    aws.String(metadata.awsEndpoint),
		Credentials: creds,
	}
}

I'd be willing to take a deeper look into this and send a PR if I get this to work 😄

JorTurFer · 2023-01-19T13:05:49Z

Let's wait before removing identityOwner, because maybe I'm wrong and it works somehow (I'm not an expert in AWS); there is a user in Slack who has it working: https://kubernetes.slack.com/archives/CKZJ36A5D/p1674015373425479

JorTurFer · 2023-01-19T20:38:00Z

Okay, I have discovered how it works:
When identityOwner is empty or identityOwner=pod, it uses the ARN from the workload service account instead of the environment variables from the KEDA pod. The implementation is quite confusing and forces some things, but it works.
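
The rule described above can be written down as a small decision function. This is only an illustrative sketch: the roleSource type and its constants are made-up names rather than KEDA identifiers, and treating every other value (such as operator) as "use the KEDA pod's environment" is an assumption.

```go
package main

import "fmt"

// roleSource names where the role ARN comes from. These names are
// illustrative only and do not exist in KEDA.
type roleSource string

const (
	fromWorkloadServiceAccount roleSource = "workload-service-account"
	fromOperatorEnvironment    roleSource = "keda-pod-environment"
)

// resolveRoleSource mirrors the behaviour described in the comment:
// an empty identityOwner or "pod" selects the ARN from the workload
// service account; any other value is assumed to mean the KEDA pod's
// own environment variables.
func resolveRoleSource(identityOwner string) roleSource {
	switch identityOwner {
	case "", "pod":
		return fromWorkloadServiceAccount
	default:
		return fromOperatorEnvironment
	}
}

func main() {
	for _, owner := range []string{"", "pod", "operator"} {
		fmt.Printf("identityOwner=%q -> %s\n", owner, resolveRoleSource(owner))
	}
}
```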

stale · 2023-03-21T01:09:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

JorTurFer · 2023-04-25T18:59:21Z

I have been thinking about this, and I believe that creating a new authentication provider for AWS, with a good design, providing the same capabilities that Azure has, is a good option. Once this new one is ready, I'd deprecate the other 2 (they are a bit chaotic and complex to maintain).
WDYT @kedacore/keda-contributors?

tomkerkhove · 2023-04-26T07:19:31Z

I'm fine if we are sure that the potentially new provider works perfectly and has full parity

JorTurFer · 2023-04-26T07:30:24Z

I'm fine if we are sure that the potentially new provider works perfectly and has full parity

The problem is that the current providers are really poorly documented, and they definitely have some weird configurations (adding e2e tests for them was horrible because of that). That's why I'd redesign the AWS identity in KEDA based on the Azure identity (which is really nice IMO), documenting it properly.

jeevanragula · 2023-08-25T06:37:38Z

We are also facing some issues, and authentication using PodIdentity is quite confusing.
We noticed that even after specifying podIdentity.provider as aws-eks, KEDA tries to assume the role using the node role. Ideally it should use the WebIdentity associated with the deployment pod.

  Warning  KEDAScalerFailed  24s (x6 over 2m54s)  keda-operator  (combined from similar events): AccessDenied: User: arn:aws:sts::1111122222:assumed-role/dev_node_group_role/i-0d8345ca0020 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::1111122222:role/sqs-event-processor
           status code: 403, request id: 2b1a303f-95c4-4ad0-874c-16c705f3ae0c

JorTurFer · 2023-08-25T06:44:32Z

We noticed that even after specifying podIdentity.provider as aws-eks, KEDA tries to assume the role using the node role. Ideally it should use the WebIdentity associated with the deployment pod.

Have you updated the KEDA deployment with the needed information? https://github.com/kedacore/charts/blob/main/keda/values.yaml#L210
If you enable pod identity without assigning an identity to the deployment (you also have to restart it so the new pod gets mutated), it will try to use the node role. Could you check whether the KEDA pod has been correctly mutated with the required envs?

What KEDA version are you using?
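
One way to check whether a pod was mutated is to look for the web identity variables from inside the container. The sketch below assumes the required envs are AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE (the pair the EKS pod identity webhook injects and the AWS SDK reads); missingIRSAEnv is a hypothetical helper, not part of KEDA.

```go
package main

import (
	"fmt"
	"os"
)

// irsaEnvVars lists the variables assumed to be injected by the EKS pod
// identity webhook and read by the AWS SDK's web identity provider.
var irsaEnvVars = []string{"AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"}

// missingIRSAEnv returns the variables that are unset or empty according
// to lookup, which has the same shape as os.LookupEnv.
func missingIRSAEnv(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range irsaEnvVars {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingIRSAEnv(os.LookupEnv); len(missing) > 0 {
		fmt.Println("pod does not look mutated; missing:", missing)
		return
	}
	fmt.Println("web identity variables are present")
}
```

If either variable is missing, the SDK has no web identity to use and falls through to other credential sources, which would be consistent with the node role appearing in the AccessDenied message above.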

ghost · 2023-10-06T12:39:32Z

Using podIdentity.provider: aws-eks means the KEDA service account will have the following type of annotation:

apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/audience: sts.amazonaws.com
    eks.amazonaws.com/role-arn: arn:aws:iam::214272731565:role/example-async-dev-ew1-queuescaling-role

That means that, in a cluster using aws-eks, it will only work with one IAM role. This becomes a problem for shared clusters where multiple tenants use different IAM roles, since it's currently not possible to attach multiple IAM roles as annotations to the KEDA service account.

Do we have any workaround for this?

JorTurFer · 2023-10-09T13:41:29Z

Do we have any workaround for this?

Currently no (or at least not a well-documented one), that's why this issue is open :)
In the Slack threads there is a workaround for that given by a user.

JorTurFer · 2023-10-09T13:42:16Z

I think that it's time to tackle this issue from our side because there haven't been willing contributors xD
Let me make a quick proposal for the new podIdentity (hoping it will be clearer and easier). The proposal is:

introduce a new podIdentity
if the TriggerAuthentication doesn't override the role, KEDA will use its own role for the scaler
if the TriggerAuthentication does override the role, KEDA will assume the workload role, and use it for the scaler
this management/decision path will be done based on having or not another role within TriggerAuthentication (no other fields will be required on scaler metadata)
this behaviour will be covered (IDK how yet) with e2e tests
deprecate and eventually remove aws-eks and aws-kiam
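
The decision path in this proposal could look roughly like the sketch below. Everything here is hypothetical: triggerAuthentication is a stripped-down stand-in for the real CRD type, roleArn is an assumed field name, and the ARNs are placeholders.

```go
package main

import "fmt"

// triggerAuthentication is a hypothetical stand-in for the
// TriggerAuthentication spec; roleArn is an assumed field name.
type triggerAuthentication struct {
	roleArn string
}

// effectiveRole returns the role the scaler should use and whether KEDA
// has to assume it first. With no override, KEDA keeps its own role.
func effectiveRole(ta triggerAuthentication, operatorRole string) (role string, assume bool) {
	if ta.roleArn == "" {
		return operatorRole, false
	}
	return ta.roleArn, true
}

func main() {
	// Placeholder ARNs used only for illustration.
	operatorRole := "arn:aws:iam::111122223333:role/keda-operator"

	role, assume := effectiveRole(triggerAuthentication{}, operatorRole)
	fmt.Println(role, assume)

	override := triggerAuthentication{roleArn: "arn:aws:iam::111122223333:role/workload"}
	role, assume = effectiveRole(override, operatorRole)
	fmt.Println(role, assume)
}
```

The point of the design is that the only input is whether the TriggerAuthentication carries another role, so nothing extra is needed in the scaler metadata.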

@tomkerkhove @zroubalik ?

tomkerkhove · 2023-10-10T05:46:37Z

Why do we need to deprecate aws-eks and introduce a new one? I'm not sure I get that.

JorTurFer · 2023-10-10T07:01:56Z

aws-kiam has to be deprecated/removed because the product itself has been deprecated and abandoned in favor of aws-eks.

IMHO, aws-eks should be deprecated too because there are some important problems:

Missing documentation and e2e tests
The feature is coupled between different "layers". You have to specify it in TriggerAuthentication but also in scalers. This is too different from the approach we follow in, for example, Azure podIdentity. Consistency is always important.
The code is (at least to me) confusing and complex to follow (and maintain).
The name suggests that it only works on EKS, but it actually works on any cluster

But! After working on this a bit, I think that I have discovered how it works (I'm writing e2e tests for the new podIdentity draft), and maybe we could just properly document how aws-eks works, mark it as deprecated, and keep it there until the next major release.
But in any case, I'd add the new podIdentity, unifying the behaviour with Azure podIdentities and moving the podIdentity logic from scalers to the TriggerAuthentication.

zroubalik · 2023-10-10T10:06:20Z

I agree with the approach proposed by @JorTurFer. If there's a problem with deprecation, we can keep the current one around as long as needed.

Great job!

tomkerkhove · 2023-10-10T11:36:36Z

aws-kiam has to be deprecated/removed because the product itself has been deprecated and abandoned in favor of aws-eks.

Definitely fine for me!

IMHO, aws-eks should be deprecated too because there are some important problems:

Missing documentation and e2e tests

The feature is coupled between different "layers". You have to specify it in TriggerAuthentication but also in scalers. This is too different from the approach we follow in, for example, Azure podIdentity. Consistency is always important.

The code is (at least to me) confusing and complex to follow (and maintain).

The name suggests that it only works on EKS, but it actually works on any cluster

But these are not really reasons why we should deprecate it though? It's just in a bad state.

But! After working on this a bit, I think that I have discovered how it works (I'm writing e2e tests for the new podIdentity draft), and maybe we could just properly document how aws-eks works, mark it as deprecated, and keep it there until the next major release. But in any case, I'd add the new podIdentity, unifying the behaviour with Azure podIdentities and moving the podIdentity logic from scalers to the TriggerAuthentication.

What is this new pod identity then? I'm not sure I get the full proposal here :)

JorTurFer · 2023-10-10T12:17:14Z

But these are not really reasons why we should deprecate it though? It's just in a bad state.

I don't agree with that. Based on our deprecation policies, we have few options for removing things, and that generates technical debt. The current approach is totally coupled to the trigger metadata, and we can't change it under our deprecation policies. It's not just a single reason, it's the whole list together: poor docs, e2e test gaps, overly complex code, the alignment gap with Azure podIdentity, and a confusing name (EKS is not IRSA; AWS IRSA means roles, AWS EKS means k8s; aws-eks suggests something for EKS and not for IRSA).

We can't address all of them; that's why, IMHO, redesigning it totally, testing it, and documenting it is better than adding poor patches and increasing technical debt.

We could introduce the new one, and deprecate the others. In this case, I agree with maintaining aws-eks there as deprecated without new features. This helps us because instead of introducing it as a breaking change at some point (no matter whether that's in a major or minor version), we will have warned users and they will probably have migrated to the new one.

JorTurFer · 2023-10-10T12:22:46Z

I'm worried about keeping any kind of code just to not break something, because the technical debt grows and it can be a problem... Not documented + not tested code is potentially technical debt, even more in our case, where random contributors add the code and we don't see them anymore.

For example: In gcp podIdentity, the contributor was going to introduce the same coupled approach (not being needed because in gcp we don't allow using other identities and their code didn't do it), just because they saw it in already existing code.

JorTurFer · 2023-10-10T12:33:38Z

That said, if I'm the only one who thinks that this needs to be improved in general (code, standardization, etc.), we can just add some e2e tests to the current approach and close this issue (deprecating and removing the deprecated aws-kiam; I think we've agreed on this).

I may be missing some important point, or maybe my opinion is influenced by my knowledge gap in AWS.

zroubalik · 2023-10-10T12:45:27Z

That said, if I'm the only one who thinks that this needs to be improved in general (code, standardization, etc.), we can just add some e2e tests to the current approach and close this issue (deprecating and removing the deprecated aws-kiam; I think we've agreed on this).

I agree with you, improvements are needed for sure.

tomkerkhove · 2023-10-10T13:32:34Z

What I'm not fully getting is that we want to deprecate 2 trigger auth types, but introduce a new one. So how will this new one be better? Did they announce a new authentication type, or is it simply because we want to re-work it so that it's better maintained and we just don't like aws-eks as a name?

tomkerkhove · 2023-10-10T13:33:14Z

I'm not saying we can't deprecate them both, just trying to understand the reasoning for aws-eks and what this new thing is

JorTurFer · 2023-10-10T14:55:59Z

Honestly, the only reason to deprecate the old one (let's talk about aws-eks only, because the cause behind the aws-kiam deprecation is just the upstream deprecation in favor of IRSA, nothing to do here) is that I find it easier to introduce all the required changes with no impact on users xD

I mean, I'd agree with extending the current aws-eks with the new changes in TriggerAuthentication if we were going to deprecate and remove the trigger configuration part in a reasonable time (2-3 versions). If we just extend the current aws-eks with the new options, we will create more confusion for users (more options for doing the same thing), and we will introduce more complexity into the code, because we will need to take precedence into account to avoid breaking something.

On the other hand, by creating a new authentication, we have the freedom to design it as we want, without having to be fully compatible, because it's a different authentication (based on the same thing? Yes, but as a totally different "entity"). From a coding POV, this will be much better for adding the new code, but also for removing the old code. From the users' POV, this is a migration that they have to do explicitly, and that's why I also agree with maintaining aws-eks as deprecated alongside the new one until the next major version, helping users with the migration path.

Summarizing, the only reason for adding a new auth and deprecating the previous one is to avoid trouble with breaking changes and deprecation policies, as removing the field from the scaler auth section and adding it as a first-class property of TA is a breaking change, and personally I'm against supporting both places in the same auth to make our lives happier xD

tomkerkhove · 2023-10-10T17:33:03Z

I think I'm still not fully grasping it - sorry :)

Does that mean we currently have it defined on the scalers instead of trigger authentication then? If not, what would the new trigger authentication look like compared to what we have today?

I think deprecating an existing trigger authentication just to change things might be a bit aggressive and cause end-user frustration we can avoid; but I am probably missing something. I want to avoid cognitive load on end-users

tomkerkhove · 2023-10-12T12:38:07Z

Just synced up with Zbynek and if we're doing this to move it from trigger metadata to auth then I'm fine with it

JorTurFer added the bug Something isn't working label Jan 18, 2023

JorTurFer self-assigned this Jan 18, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Mar 21, 2023

JorTurFer added stale-bot-ignore All issues that should not be automatically closed by our stale bot and removed stale All issues that are marked as stale due to inactivity labels Mar 21, 2023

JorTurFer removed their assignment Mar 21, 2023

JorTurFer mentioned this issue Jul 3, 2023

[AWS SQS Queue]: AccessDenied, cannot access sqs to get queue information #4764

Closed

This was referenced Oct 15, 2023

Removing support for AWS-EKS Identity authentication from KEDA on v3 #5086

Open

Add new AWS podIdentity #5061

Merged

Add documentation about new AWS auth kedacore/keda-docs#1251

Merged

JorTurFer mentioned this issue Dec 11, 2023

Release: 2.13.0 #5275

Closed

25 tasks

JorTurFer closed this as completed in #5061 Jan 9, 2024
