Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-kinesisanalyticsv2: L2 construct for Flink applications #12407

Closed
1 of 2 tasks
mitchlloyd opened this issue Jan 7, 2021 · 3 comments
Closed
1 of 2 tasks

aws-kinesisanalyticsv2: L2 construct for Flink applications #12407

mitchlloyd opened this issue Jan 7, 2021 · 3 comments
Assignees
Labels
@aws-cdk/aws-kinesisanalytics Related to Amazon Kinesis Data Analytics effort/large Large work item – several weeks of effort feature-request A feature should be added or improved. in-progress This issue is being actively worked on. p1

Comments

@mitchlloyd
Copy link
Contributor

mitchlloyd commented Jan 7, 2021

Following #7535, add an L2 construct for building AWS:KinesisAnalyticsV2::Applications that use Flink. A README follows to start the design discussion:


The @aws-cdk/aws-kinesisanalyticsv2 package provides constructs for creating Kinesis Data Analytics applications.

For more information, see the AWS documentation for Kinesis Data Analytics.

Creating a Flink Kinesis Analytics Application

To create a new Flink application, use the FlinkApplication construct.

import * as ka from '@aws-cdk/aws-kinesisanalyticsv2';

const flinkApp = new ka.FlinkApplication(this, 'Application', {
  code: ka.ApplicationCode.fromBucket(bucket, 'my-app.jar'),
  runtime: ka.FlinkRuntime.FLINK_1_11,
});

The code property can use fromBucket as shown above to reference a jar file in s3 or fromAsset to reference a local file.

import * as ka from '@aws-cdk/aws-kinesisanalyticsv2';
import * as path from 'path';

const flinkApp = new ka.FlinkApplication(this, 'Application', {
  code: ka.ApplicationCode.fromAsset(path.join(__dirname, 'my-app.jar')),
  // ...
});

The propertyGroups property provides a way of passing arbitrary runtime configuration to your Flink application. You can use the aws-kinesisanalytics-runtime library to retrieve these properties.

import * as ka from '@aws-cdk/aws-kinesisanalyticsv2';
import * as path from 'path';

const flinkApp = new ka.FlinkApplication(this, 'Application', {
  // ...
  propertyGroups: [
    new ka.PropertyGroup('FlinkApplicationProperties', {
      inputStreamName: 'my-input-kinesis-stream',
      outputStreamName: 'my-output-kinesis-stream',
    }),
  ],
});

Flink applications also have specific properties for passing parameters when the Flink job starts. These include parameters for checkpointing, snapshotting, monitoring, and parallelism.

import * as ka from '@aws-cdk/aws-kinesisanalyticsv2';

const flinkApp = new ka.FlinkApplication(this, 'Application', {
  code: ka.ApplicationCode.fromBucket(bucket, 'my-app.jar'),
  runtime: ka.FlinkRuntime.FLINK_1_11,
  checkpointingEnabled: true, // default is true
  checkpointInterval: cdk.Duration.seconds(30), // default is 1 minute
  minPausesBetweenCheckpoints: cdk.Duration.seconds(10), // default is 5 seconds
  logLevel: ka.LogLevel.ERROR, // default is INFO
  metricsLevel: ka.MetricsLevel.PARALLELISM, // default is APPLICATION
  autoScalingEnabled: false, // default is true
  parallelism: 32, // default is 1
  parallelismPerKpu: 2, // default is 1
  snapshotsEnabled: false, // default is true
});

Creating an SQL Kinesis Analytics Application

Constructs for SQL applications are not implemented yet.


Notes for review:

Decisions:

  1. Flink and SQL applications share almost no properties so having separate ka.FlinkApplication and ka.SqlApplication constructs seems correct. I can't see why these would even share an abstract base class.
  2. I'm trying to focus on shipping the Flink construct before SQL since they are so different and I haven't used an SQL application.
  3. I unested lots of the configuration for discoverability. The Cfn naming is verbose (usually with prefixes) so collisions are unlikely.

This is a pretty good example of using CDK today to build a Flink app.

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

cc @iliapolo


This is a 🚀 Feature Request

@mitchlloyd mitchlloyd added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jan 7, 2021
@iliapolo iliapolo self-assigned this Jan 11, 2021
@iliapolo iliapolo added effort/large Large work item – several weeks of effort p1 @aws-cdk/aws-kinesisanalytics Related to Amazon Kinesis Data Analytics and removed needs-triage This issue or PR still needs to be triaged. labels Jan 11, 2021
@iliapolo
Copy link
Contributor

@mitchlloyd Looks cool. Can I ask you though to move this theoretical README to a PR? This way we can have inline discussions.

Thanks!

@mitchlloyd mitchlloyd changed the title aws-kinesisanalyticsv2: L2 construct for Flink applciations aws-kinesisanalyticsv2: L2 construct for Flink applications Jan 11, 2021
@iliapolo iliapolo added the in-progress This issue is being actively worked on. label Jan 17, 2021
mergify bot pushed a commit that referenced this issue Feb 9, 2021
…2464)

Opened for #12407.

Notes for review:

1. Flink and SQL applications share almost no properties so having separate ka.FlinkApplication and ka.SqlApplication constructs seems correct. I can't see why these would even share an abstract base class.
1. I'm trying to focus on shipping the Flink construct before SQL since they are so different and I haven't used an SQL application.
1. I unested lots of the configuration for discoverability. The Cfn naming is verbose (usually with prefixes) so collisions are unlikely. [This](https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl/blob/master/cdk/lib/streaming-etl.ts#L100) is a pretty good example of using CDK today to build a Flink app.

Running List of Open Questions
-------------------------------

1. ~aws-kinesisanalytics exports both `aws-kinesisanalytics` and `aws-kinesisanalyticsv2` generated code. How should we resolve this? Currently I've exported `aws-kinesisanalyticsv2` from `aws-kinesisanalyticsv2` and haven't changed the other package.~ Kept this package isolated from aws-kinesisanalytics at the expense of duplicate generated code. 
2. ~I'm not confident with the use cases for the `fromAttributes` factory. I'd prefer to leave this factory method out in this initial PR if possible, but I'm also open to comments about what use cases this should handle.~
3. ~All logging options could be flattened into FlinkApplicationProps (e.g. logRentention, logGroupName, logStreamName, logEncryptionKey...). My first thought was to provide a new `ka.Logging` type that can be passed to customize the logGroup and logStream. There may be some other middle ground that could let users pass in a logGroup and then a logStream.~ Went with the flat list of props added to FlinkApplicationProps
4. ~When I run `yarn build` I get this error even though I've already tagged the class.~
    Was missing a `gen` yarn script entry.
5. ~Should this module become `aws-flink-application` or `aws-kinesisanalytics-flink` to side step the confusion of having seemingly unrelated constructs in the same module at the expense of clashing with CloudFormation?~ New module name: `aws-kinesisanalytics-flink`.

Todo
----

- [x] Inline documentation
- [x] Add property groups
- [x] checkpointEnabled
- [x] minPauseBetweenCheckpoints
- [x] logLevel
- [x] metricsLevel
- [x] autoscalingEnabled
- [x] parallelism
- [x] parallelismPerKpu
- [x] snapshotsEnabled
- [x] Figure out fromAttributes approach
- [x] Add log stream
- [x] Decide on logging options approach
- [x] Add metrics access
- [x] Validate application name
- [x] Integration tests

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@mitchlloyd
Copy link
Contributor Author

Resolved with #12464

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

NovakGu pushed a commit to NovakGu/aws-cdk that referenced this issue Feb 18, 2021
…s#12464)

Opened for aws#12407.

Notes for review:

1. Flink and SQL applications share almost no properties so having separate ka.FlinkApplication and ka.SqlApplication constructs seems correct. I can't see why these would even share an abstract base class.
1. I'm trying to focus on shipping the Flink construct before SQL since they are so different and I haven't used an SQL application.
1. I unested lots of the configuration for discoverability. The Cfn naming is verbose (usually with prefixes) so collisions are unlikely. [This](https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl/blob/master/cdk/lib/streaming-etl.ts#L100) is a pretty good example of using CDK today to build a Flink app.

Running List of Open Questions
-------------------------------

1. ~aws-kinesisanalytics exports both `aws-kinesisanalytics` and `aws-kinesisanalyticsv2` generated code. How should we resolve this? Currently I've exported `aws-kinesisanalyticsv2` from `aws-kinesisanalyticsv2` and haven't changed the other package.~ Kept this package isolated from aws-kinesisanalytics at the expense of duplicate generated code. 
2. ~I'm not confident with the use cases for the `fromAttributes` factory. I'd prefer to leave this factory method out in this initial PR if possible, but I'm also open to comments about what use cases this should handle.~
3. ~All logging options could be flattened into FlinkApplicationProps (e.g. logRentention, logGroupName, logStreamName, logEncryptionKey...). My first thought was to provide a new `ka.Logging` type that can be passed to customize the logGroup and logStream. There may be some other middle ground that could let users pass in a logGroup and then a logStream.~ Went with the flat list of props added to FlinkApplicationProps
4. ~When I run `yarn build` I get this error even though I've already tagged the class.~
    Was missing a `gen` yarn script entry.
5. ~Should this module become `aws-flink-application` or `aws-kinesisanalytics-flink` to side step the confusion of having seemingly unrelated constructs in the same module at the expense of clashing with CloudFormation?~ New module name: `aws-kinesisanalytics-flink`.

Todo
----

- [x] Inline documentation
- [x] Add property groups
- [x] checkpointEnabled
- [x] minPauseBetweenCheckpoints
- [x] logLevel
- [x] metricsLevel
- [x] autoscalingEnabled
- [x] parallelism
- [x] parallelismPerKpu
- [x] snapshotsEnabled
- [x] Figure out fromAttributes approach
- [x] Add log stream
- [x] Decide on logging options approach
- [x] Add metrics access
- [x] Validate application name
- [x] Integration tests

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
eladb pushed a commit to cdklabs/decdk that referenced this issue Jan 18, 2022
…2464)

Opened for aws/aws-cdk#12407.

Notes for review:

1. Flink and SQL applications share almost no properties so having separate ka.FlinkApplication and ka.SqlApplication constructs seems correct. I can't see why these would even share an abstract base class.
1. I'm trying to focus on shipping the Flink construct before SQL since they are so different and I haven't used an SQL application.
1. I unested lots of the configuration for discoverability. The Cfn naming is verbose (usually with prefixes) so collisions are unlikely. [This](https://github.com/aws-samples/amazon-kinesis-analytics-streaming-etl/blob/master/cdk/lib/streaming-etl.ts#L100) is a pretty good example of using CDK today to build a Flink app.

Running List of Open Questions
-------------------------------

1. ~aws-kinesisanalytics exports both `aws-kinesisanalytics` and `aws-kinesisanalyticsv2` generated code. How should we resolve this? Currently I've exported `aws-kinesisanalyticsv2` from `aws-kinesisanalyticsv2` and haven't changed the other package.~ Kept this package isolated from aws-kinesisanalytics at the expense of duplicate generated code. 
2. ~I'm not confident with the use cases for the `fromAttributes` factory. I'd prefer to leave this factory method out in this initial PR if possible, but I'm also open to comments about what use cases this should handle.~
3. ~All logging options could be flattened into FlinkApplicationProps (e.g. logRentention, logGroupName, logStreamName, logEncryptionKey...). My first thought was to provide a new `ka.Logging` type that can be passed to customize the logGroup and logStream. There may be some other middle ground that could let users pass in a logGroup and then a logStream.~ Went with the flat list of props added to FlinkApplicationProps
4. ~When I run `yarn build` I get this error even though I've already tagged the class.~
    Was missing a `gen` yarn script entry.
5. ~Should this module become `aws-flink-application` or `aws-kinesisanalytics-flink` to side step the confusion of having seemingly unrelated constructs in the same module at the expense of clashing with CloudFormation?~ New module name: `aws-kinesisanalytics-flink`.

Todo
----

- [x] Inline documentation
- [x] Add property groups
- [x] checkpointEnabled
- [x] minPauseBetweenCheckpoints
- [x] logLevel
- [x] metricsLevel
- [x] autoscalingEnabled
- [x] parallelism
- [x] parallelismPerKpu
- [x] snapshotsEnabled
- [x] Figure out fromAttributes approach
- [x] Add log stream
- [x] Decide on logging options approach
- [x] Add metrics access
- [x] Validate application name
- [x] Integration tests

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-kinesisanalytics Related to Amazon Kinesis Data Analytics effort/large Large work item – several weeks of effort feature-request A feature should be added or improved. in-progress This issue is being actively worked on. p1
Projects
None yet
Development

No branches or pull requests

2 participants