Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stepfunctions): distributed map construct #28821

Merged
merged 14 commits into from
Feb 9, 2024
Merged

Conversation

abdelnn
Copy link
Contributor

@abdelnn abdelnn commented Jan 23, 2024

Adds support for Step Functions Map state in Distributed mode. Currently, in order to create a Distributed Map in CDK, CDK users have to define a Custom State containing their Amazon States Language definition.

This solution consists of the creation of a new L2 construct, DistributedMap. This design decision was made due to the fact that some fields are exclusive to Distributed Maps, such as ItemReader. Adding support for it through the existing Map L2 construct would lead to some fields being conditionally available.

Some design decisions that were made:

  • I created an abstract class MapBase that encapsulates all fields currently supported by both inline and distributed maps. This includes all currently supported fields in the CDK except for iterator and parameters (deprecated fields). Those are now part of the Map subclass which extends MapBase. All new Distributed Maps fields are part of the new DistributedMap construct (also a subclass of MapBase)
  • Permissions specific to Distributed Maps are added as part of this new construct

Thanks to @beck3905 and their PR #24331 for inspiration. A lot of the ideas here are re-used from the PR cited.

Closes #23216


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 labels Jan 23, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team January 23, 2024 01:48
@github-actions github-actions bot added the beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK label Jan 23, 2024
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Jan 23, 2024
@dominhhai
Copy link

IMO, the ItemBatcher should be configurable

abdelnn and others added 3 commits January 24, 2024 09:31
Co-authored-by: Roman <491247+moltar@users.noreply.github.com>
@abdelnn
Copy link
Contributor Author

abdelnn commented Jan 24, 2024

IMO, the ItemBatcher should be configurable

I agree, I've changed it so that it is. Thanks for the feedback!

@kaizencc kaizencc changed the title feat(stepfunctions): add distributed map construct feat(stepfunctions): distributed map construct Feb 1, 2024
@kaizencc
Copy link
Contributor

kaizencc commented Feb 1, 2024

I'm going to continue looking at this tomorrow. I haven't finished my review but there were some style changes that I wanted to get in first.

Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I started on a preliminary review. This is going to take a while though, since it is quite involved.

*
* @see https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html
*/
export abstract class MapBase extends State implements INextable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the motivation behind moving MapBase out of Map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, I wanted to do two things:

  • Implement a new Distributed Maps construct
  • Ensure that this new construct does not support deprecated ASL fields.

The way I decided to go about this was to abstract the common properties of both distributed and inline maps, and only support deprecated ASL fields (parameters and iterator) in the existing Map construct for backwards compatibility concerns.

Two other alternatives that I briefly considered but ultimately rejected:

  • Adding to the existing Map construct the required Distributed Map fields. This was rejected as having fields be conditionally available based on what kind of map it is seemed unnecessarily convoluted
  • Extending the existing Map construct to create the Distributed Map one. This was rejected as the new construct would contain deprecated ASL fields

packages/aws-cdk-lib/aws-stepfunctions/lib/states/state.ts Outdated Show resolved Hide resolved
@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Feb 2, 2024
Co-authored-by: Kaizen Conroy <36202692+kaizencc@users.noreply.github.com>
@mergify mergify bot dismissed kaizencc’s stale review February 2, 2024 19:05

Pull request has been modified.

@abdelnn
Copy link
Contributor Author

abdelnn commented Feb 2, 2024

Yes doing this will have some backwards compatibility concerns we have to skirt

That shouldn't be the case anymore though, right? We're now correctly returning the Map type just like we were before

@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Feb 2, 2024
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for the hefty PR @abdelnn. lets give it a try...

Copy link
Contributor

mergify bot commented Feb 9, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Feb 9, 2024
Copy link
Contributor

mergify bot commented Feb 9, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 169c6dd
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 97e3827 into aws:main Feb 9, 2024
9 checks passed
Copy link
Contributor

mergify bot commented Feb 9, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

TheRealAmazonKendra pushed a commit that referenced this pull request Feb 9, 2024
Adds support for Step Functions Map state in Distributed mode. Currently, in order to create a Distributed Map in CDK, CDK users have to define a Custom State containing their Amazon States Language definition. 

This solution consists of the creation of a new L2 construct, `DistributedMap`. This design decision was made due to the fact that some fields are exclusive to Distributed Maps, such as `ItemReader`. Adding support for it through the existing `Map` L2 construct would lead to some fields being conditionally available.

Some design decisions that were made:
- I created an abstract class `MapBase` that encapsulates all fields currently supported by both `inline` and `distributed` maps. This includes all currently supported fields in the CDK except for `iterator` and `parameters` (deprecated fields). Those are now part of the Map subclass which extends `MapBase`. All new Distributed Maps fields are part of the new `DistributedMap` construct (also a subclass of `MapBase`)
- Permissions specific to Distributed Maps are added as part of this new construct

Thanks to @beck3905 and their PR #24331 for inspiration. A lot of the ideas here are re-used from the PR cited.

Closes #23216 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
graph = new StateGraph(definitionBody.chainable.startState, 'State Machine definition');
graph.timeout = props.timeout;
for (const statement of graph.policyStatements) {
this.addToRolePolicy(statement);
Copy link
Contributor

@orekav orekav Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdelnn is it possible that this is adding the role policies on the wrong place?

#29203

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as I remove this for and put it back where it was before (Only the for).
The bug disappears.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're right, this should be

this.role.addToPrincipalPolicy(statement)

@abdelnn abdelnn deleted the distributedmap branch March 7, 2024 18:04
mergify bot pushed a commit that referenced this pull request Jun 13, 2024
… executionType in the ProcessorConfig (#30301)

### Issue # (if applicable)

Closes #30194 

### Reason for this change
In #27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in #28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
sarangarav pushed a commit to sarangarav/aws-cdk that referenced this pull request Jun 21, 2024
… executionType in the ProcessorConfig (aws#30301)

### Issue # (if applicable)

Closes aws#30194 

### Reason for this change
In aws#27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in aws#28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
mazyu36 added a commit to mazyu36/aws-cdk that referenced this pull request Jun 22, 2024
… executionType in the ProcessorConfig (aws#30301)

### Issue # (if applicable)

Closes aws#30194 

### Reason for this change
In aws#27913, the ItemProcessor was introduced for use with the Map Class. With the executionType in the ProcessorConfig, it was possible to specify the executionType for the Map.
On the other hand, in aws#28821, the DistributedMap Class was introduced. The mapExecutionType of the DistributedMap class always overwrites the executionType of the ProcessorConfig.
Therefore, when using the DistributedMap class, the implementation ignores the executionType of the ProcessorConfig. However, this behavior cannot be inferred from the documentation.

### Description of changes
* Added to the docs that when using the DistributedMap Class, the executionType in the ProcessorConfig is ignored.
* Also added a warning.


### Description of how you validated changes
Add unit tests and integ tests



### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(stepfunctions): Add native support for DistributedMap
6 participants