Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(codepipeline): large cross-region pipelines exceed IAM policy size limit #16350

Closed

Conversation

skinny85
Copy link
Contributor

@skinny85 skinny85 commented Sep 2, 2021

When we generate CodePipelines, we need to add an sts:AssumeRole statement for each Action in the pipeline,
and a Bucket.grantReadWrite() statement for each region the pipeline is in,
to the policy statement of the pipeline's Role.
For pipelines with many Actions and/or regions,
this makes the policy exceed the IAM limit of 10240 bytes.

Extract a new class from the CodePipeline CloudFormation Actions that caches the statements added to a given Principal by the 'Action' field,
and groups the statements with the same 'Actions' by adding elements to the 'Resource' field.
This dramatically reduces the duplication in the statement,
and increases the chances of it being smaller than the limit.
Use this new class in the Pipeline construct.

Fixes #16244


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@skinny85 skinny85 requested review from rix0rrr and njlynch September 2, 2021 18:30
@skinny85 skinny85 self-assigned this Sep 2, 2021
@gitpod-io
Copy link

gitpod-io bot commented Sep 2, 2021

@skinny85 skinny85 changed the title fix(codepipeline): large cross-region CodePipeline exceed IAM policy size limit fix(codepipeline): large cross-region CodePipelines exceed IAM policy size limit Sep 2, 2021
@skinny85 skinny85 changed the title fix(codepipeline): large cross-region CodePipelines exceed IAM policy size limit fix(codepipeline): large cross-region pipelines exceed IAM policy size limit Sep 2, 2021
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Sep 2, 2021
@skinny85 skinny85 force-pushed the fix/codepipeline-too-big-policy branch from 0416e68 to d12193d Compare September 2, 2021 18:59
@skinny85
Copy link
Contributor Author

skinny85 commented Sep 2, 2021

Hey @rix0rrr @njlynch! This is a different approach to the "too large IAM policies" problem that we're aware of. I understand we're hesitant to explore this area by doing too much of statement deduplication (see issue #14713) because of previous bad experiences, but I hope this PR presents an "80/20" solution: solving 80% of the cases we have of too large policies (that we don't de-duplicate based on the Action field, which is the one that's repeated the most often, because of how we structure grant*() methods in CDK), with only 20% of the effort (and thus 20% of the risks as well).

This new GroupingByActionsPrincipal class also allows us to do the migration gradually; as can been seen, messing with this fundamental layer of the CDK results in quite a bit of changes required to our tests. This way, we can do the migration gradually, which means we can be selective when do we cause a diff in the existing tests (and thus customer templates!).

Let me know what you think - I'm eager to get your opinion on this solution!

…size limit

When we generate CodePipelines, we need to add an `sts:AssumeRole` statement for each Action in the pipeline,
and a `Bucket.grantReadWrite()` statement for each region the pipeline is in,
to the policy statement of the pipeline's Role.
For pipelines with many Actions and/or regions,
this makes the policy exceed IAM limit of 10240 bytes.

Extract a new class from the CodePipeline CloudFormation Actions that caches the statements added to a given Principal by the 'Action' field,
and groups the statements with the same 'Actions' by adding elements to the 'Resource' field.
This dramatically reduces the duplication in the statement,
and increases the chances of it being smaller than the limit.
Use this new class in the `Pipeline` construct.

Fixes aws#16244
@skinny85 skinny85 force-pushed the fix/codepipeline-too-big-policy branch from d12193d to f6b47a5 Compare September 2, 2021 19:40
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: f6b47a5
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@njlynch njlynch added the pr/requires-two-approvers This PR is critical (e.g., security, broadly-impacting) and requires 2 approvers to be merged. label Sep 3, 2021
@njlynch
Copy link
Contributor

njlynch commented Sep 3, 2021

I've labeled this as requires-two-approvers as I know we've been super-hesitant to touch policy optimization in the past, and I'd like to ensure we have multiple sets of eyes on this.

For what it's worth, I did some archeology, as the taboo around policy optimization predates my time on the team and I wanted to know more. Working from most recent to the original issue:

So it appears the Sun to our folklore Icarus was collapsing actions, and potentially also combining non-identical (but overlapping) statements. This more constrained approach -- only combining 100% identical statements that only differ by resources -- seems much less likely to cause unintentional bugs.

Copy link
Contributor

@njlynch njlynch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, largely just missing tests.

In general, tests for the GroupingByActionsPrincipal seems necessary. Additionally, I'm a bit surprised to see zero updates to any integ or unit tests for the updates to the Bucket permissions. Given that we're largely de-duping two sets of actions (sts:AssumeRole and the bucket.grantReadWrite calls), I expect to see some test somewhere showing that behavior. I would think (just based on filename) the integ.pipeline-cfn-cross-region.expected.json test would have seen this update. Even a unit tests verifying the permissions would be useful.

"Arn"
]
}
"Resource": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised by this change. What about the implementation is overriding the _norm behavior of PolicyStatement.toStatementJson?

public toStatementJson(): any {
return noUndef({
Action: _norm(this.action, { unique: true }),
NotAction: _norm(this.notAction, { unique: true }),
Condition: _norm(this.condition),
Effect: _norm(this.effect),
Principal: _normPrincipal(this.principal),
NotPrincipal: _normPrincipal(this.notPrincipal),
Resource: _norm(this.resource, { unique: true }),
NotResource: _norm(this.notResource, { unique: true }),
Sid: _norm(this.sid),
});
function _norm(values: any, { unique }: { unique: boolean } = { unique: false }) {
if (typeof(values) === 'undefined') {
return undefined;
}
if (cdk.Token.isUnresolved(values)) {
return values;
}
if (Array.isArray(values)) {
if (!values || values.length === 0) {
return undefined;
}
if (values.length === 1) {
return values[0];
}

My understanding is that any time we have a single resource, we remove the array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Yeah, this is surprising behavior.

Let me dig in a little bit more to see what happens here.

* thus compressing the overall size of the Policy attached the given Principal,
* and making it less likely it hits the 10240 bytes IAM limit.
*/
export class GroupingByActionsPrincipal extends Construct implements IPrincipal, IGrantable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! Where are the tests for this new class?

const key = this.keyFor(statement);
const statementCache = this.statements[key];
if (!statementCache) {
const grantResult = this.wrappedIdentity.addToPrincipalPolicy(statement);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that there are existing statements on the wrappedIdentity that would be duplicated here, correct? If so, we simply wouldn't de-dupe those statements, but would any future statements that matched. If so, I think that's fair for now, but could be a future improvement.

@skinny85
Copy link
Contributor Author

skinny85 commented Sep 3, 2021

Thanks for the review @njlynch, and for the archeological digs! I'd be happy to add more tests, I'm glad the approach looks sensible in your estimation 🙂.

@rix0rrr could I get your opinion here?

}

private keyFor(statement: PolicyStatement): string {
const hashBuilder = crypto.createHash('sha256');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why even create a hash in the first place? We can just use the stringified JSON as the key

private readonly statements: { [key: string]: { statement: PolicyStatement, grantResult: AddToPrincipalPolicyResult } };

constructor(grantPrincipal: IIdentity, id: string) {
super(grantPrincipal as unknown as Construct, id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast looks icky. Why this?

Does GroupingByActionsPrincipal need to be a Construct in the first place?

this.statements[key] = { statement, grantResult };
return grantResult;
} else {
statementCache.statement.addResources(...statement.resources);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! Since statementCache.statement is the same object as the original PolicyStatement given to you by the consumer, mutation will have unintended side effects:

const stmt = new PolicyStatement({ actions: ['s3:GetBanana'], resources: ['mybucket'] });

groupingByActionsPrincipal.addToPrincipalPolicy(stmt);
groupingByActionsPrincipal.addToPrincipalPolicy({ actions: ['s3:GetBanana'], resources: ['*'] });

stmt.addAction('aws:StealCredentials');  // Oopsepoops! End result is 'aws:StealCredentials' on '*' !

Only safe way around this is by making a copy of the incoming PolicyStatement to prevent the mutable aliasing.

That may break the legitimate use cases of mutating the original policy after the fact...

So probably the only way to do this correctly is to do the merging at render time.

@AhmadDaoud
Copy link

This is impacting us at NVIDIA too. Besides refactoring the pipeline role policy, will you be supporting importing an existing role or allowing us to pass an immutable policy?

@skinny85
Copy link
Contributor Author

This is impacting us at NVIDIA too. Besides refactoring the pipeline role policy, will you be supporting importing an existing role or allowing us to pass an immutable policy?

Yes, we already support that - you can simply pass your own immutable Role when creating the Pipeline construct.

@akashv-builder
Copy link

Hey, what is the workaround for CodePipeline (construct)

How to override the role property in this case as it does not expose any role property?

// Modern API
const modernPipeline = new pipelines.CodePipeline(this, 'Pipeline', {
selfMutation: false,
synth: new pipelines.ShellStep('Synth', {
input: pipelines.CodePipelineSource.connection('my-org/my-app', 'main', {
connectionArn: 'arn:aws:codestar-connections:us-east-1:222222222222:connection/7d2469ff-514a-4e4f-9003-5ca4a43cdc41', // Created using the AWS console * });',
}),
commands: [
'npm ci',
'npm run build',
'npx cdk synth',
],
}),
});

rix0rrr added a commit that referenced this pull request Feb 23, 2022
The policies we generate sometimes have a lot of duplication between
statements. This duplication can lead to the policy going over the size
limit an IAM policy (either 2k, 6k or 10k bytes, depending on the resource
type).

This change combines multiple statements together, as long as it
doesn't change the meaning of the final policy.

Because doing so for all existing stacks will probably provoke minor
heart attacks in operators everywhere, the new behavior is gated
behind a feature flag. It can be retroactively switched on by
people currently being bit by the size issues:

```
@aws-cdk/aws-iam:minimizePolicies
```

Fixes #18774, fixes #16350, fixes #18457.
@rix0rrr rix0rrr removed contribution/core This is a PR that came from AWS. pr/requires-two-approvers This PR is critical (e.g., security, broadly-impacting) and requires 2 approvers to be merged. labels Mar 4, 2022
@rix0rrr rix0rrr added bug This issue is a bug. p1 labels Mar 4, 2022
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Mar 4, 2022
@blimmer
Copy link
Contributor

blimmer commented Mar 11, 2022

It seems like #19114 might supersede this PR. It's being actively worked on. Just in case others are looking through the various issues related to this.

@mergify mergify bot closed this in #19114 Mar 18, 2022
mergify bot pushed a commit that referenced this pull request Mar 18, 2022
The policies we generate sometimes have a lot of duplication between
statements. This duplication can lead to the policy going over the size
limit an IAM policy (either 2k, 6k or 10k bytes, depending on the resource
type).

This change combines multiple statements together, as long as it
doesn't change the meaning of the final policy.

Because doing so for all existing stacks will probably provoke minor
heart attacks in operators everywhere, the new behavior is gated
behind a feature flag. It can be retroactively switched on by
people currently being bit by the size issues:

```
@aws-cdk/aws-iam:minimizePolicies
```

We will merge 2 statements if their effects are the same, and they are otherwise exactly the same apart from their `Action`, `Resource` or `Principal` declarations. We will not merge `NotXxx` statements, because doing so will change the meaning of the statement (`not A or not B ≠ not (A or B)`). There may be multiple possible merges that apply and we are not guaranteed to find the smallest merging, nor do we take effort to find all possible merges and do simplifications like `*`-subsumption. This is a starting point that should help out in the common case.

Fixes #18774, fixes #16350, fixes #18457, fixes #18564, fixes #19276.


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
iliapolo pushed a commit that referenced this pull request Mar 20, 2022
The policies we generate sometimes have a lot of duplication between
statements. This duplication can lead to the policy going over the size
limit an IAM policy (either 2k, 6k or 10k bytes, depending on the resource
type).

This change combines multiple statements together, as long as it
doesn't change the meaning of the final policy.

Because doing so for all existing stacks will probably provoke minor
heart attacks in operators everywhere, the new behavior is gated
behind a feature flag. It can be retroactively switched on by
people currently being bit by the size issues:

```
@aws-cdk/aws-iam:minimizePolicies
```

We will merge 2 statements if their effects are the same, and they are otherwise exactly the same apart from their `Action`, `Resource` or `Principal` declarations. We will not merge `NotXxx` statements, because doing so will change the meaning of the statement (`not A or not B ≠ not (A or B)`). There may be multiple possible merges that apply and we are not guaranteed to find the smallest merging, nor do we take effort to find all possible merges and do simplifications like `*`-subsumption. This is a starting point that should help out in the common case.

Fixes #18774, fixes #16350, fixes #18457, fixes #18564, fixes #19276.


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@sozonnyk
Copy link

sozonnyk commented May 16, 2022

@blimmer @rix0rrr @skinny85
Why this was closed?

This issue was not fixed by #19114

We are awaiting for this to fix #19939

@sozonnyk
Copy link

Apologies guys, it seems I confused this PR with #20189 which is still open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. contribution/core This is a PR that came from AWS. p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aws-codepipeline: cross region support creates huge inline policy document for the pipeline role
8 participants