diff --git a/packages/@aws-cdk/aws-glue-alpha/README.md b/packages/@aws-cdk/aws-glue-alpha/README.md index 051044a74c8ff..7e8a830851c74 100644 --- a/packages/@aws-cdk/aws-glue-alpha/README.md +++ b/packages/@aws-cdk/aws-glue-alpha/README.md @@ -17,119 +17,374 @@ This module is part of the [AWS Cloud Development Kit](https://github.com/aws/aws-cdk) project. -## Job +## README + +[AWS Glue](https://aws.amazon.com/glue/) is a serverless data integration +service that makes it easier to discover, prepare, move, and integrate data +from multiple sources for analytics, machine learning (ML), and application +development. + +Without an L2 construct, developers define Glue data sources, connections, +jobs, and workflows for their data and ETL solutions via the AWS console, +the AWS CLI, and Infrastructure as Code tools like CloudFormation and the +CDK. However, there are several challenges to defining Glue resources at +scale that an L2 construct can resolve. First, developers must reference +documentation to determine the valid combinations of job type, Glue version, +worker type, language versions, and other parameters that are required for specific +job types. Additionally, developers must already know or look up the +networking constraints for data source connections, and there is ambiguity +around how to securely store secrets for JDBC connections. Finally, +developers want prescriptive guidance via best practice defaults for +throughput parameters like number of workers and batching. + +The Glue L2 construct has convenience methods working backwards from common +use cases and sets required parameters to defaults that align with recommended +best practices for each job type. It also provides customers with a balance +between flexibility via optional parameter overrides, and opinionated +interfaces that discourage anti-patterns, resulting in reduced time to develop +and deploy new resources. 
+ +### References + +* [Glue Launch Announcement](https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/) +* [Glue Documentation](https://docs.aws.amazon.com/glue/index.html) +* [Glue L1 (CloudFormation) Constructs](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/AWS_Glue.html) +* Prior version of the [@aws-cdk/aws-glue-alpha module](https://github.com/aws/aws-cdk/blob/v2.51.1/packages/%40aws-cdk/aws-glue/README.md) + +## Create a Glue Job + +A Job encapsulates a script that connects to data sources, processes +them, and then writes output to a data target. There are four types of Glue +Jobs: Spark (ETL and Streaming), Python Shell, Ray, and Flex Jobs. Most +of the required parameters for these jobs are common across all types, +but there are a few differences depending on the languages supported +and features provided by each type. For all job types, the L2 defaults +to AWS best practice recommendations, such as: + +* Use of Secrets Manager for Connection JDBC strings +* Glue job autoscaling +* Default parameter values for Glue job creation + +This iteration of the L2 construct introduces breaking changes to +the existing glue-alpha-module, but these changes streamline the developer +experience, introduce new constants for defaults, and replace synth-time +validations with interface contracts for enforcement of the parameter combinations +that Glue supports. As an opinionated construct, the Glue L2 construct does +not allow developers to create resources that use non-current versions +of Glue or deprecated language dependencies (e.g. deprecated versions of Python). +As always, L1s allow you to specify a wider range of parameters if you need +or want to use alternative configurations. + +Optional and required parameters for each job are enforced via interface +rather than validation; see [Glue's public documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html) +for more granular details. 
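To make "enforced via interface rather than validation" concrete, here is a minimal sketch, assuming simplified stand-in types. Every name in it (`SketchSparkEtlProps`, `resolveSparkEtl`, and so on) is hypothetical and is not the module's real API. The default values are the ones this README states for Spark ETL jobs: Glue 4.0 and the G.2X worker type.

```ts
// Hypothetical, simplified shapes for illustration only. These are NOT the
// module's real interfaces; every name here is an assumption.
type SketchSparkWorkerType = 'G.1X' | 'G.2X' | 'G.4X' | 'G.8X';
type SketchSparkGlueVersion = '3.0' | '4.0';

interface SketchSparkEtlProps {
  readonly glueVersion?: SketchSparkGlueVersion;
  readonly workerType?: SketchSparkWorkerType;
}

interface SketchResolvedSparkEtl {
  readonly glueVersion: SketchSparkGlueVersion;
  readonly workerType: SketchSparkWorkerType;
}

// Fill in the defaults this README describes for Spark ETL jobs.
function resolveSparkEtl(props: SketchSparkEtlProps = {}): SketchResolvedSparkEtl {
  return {
    glueVersion: props.glueVersion ?? '4.0',
    workerType: props.workerType ?? 'G.2X',
  };
}

const sketchDefaults = resolveSparkEtl();
const sketchOverridden = resolveSparkEtl({ glueVersion: '3.0', workerType: 'G.8X' });

// An unsupported combination never reaches a runtime check, because the
// compiler rejects it. Uncommenting the next line fails type checking:
// resolveSparkEtl({ workerType: 'Z.2X' });

console.log(sketchDefaults.glueVersion, sketchDefaults.workerType);
console.log(sketchOverridden.glueVersion, sketchOverridden.workerType);
```

The trade-off in this style is that invalid values are caught while the code is being written instead of at synth time, at the cost of one props type per job flavor.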
-A `Job` encapsulates a script that connects to data sources, processes them, and then writes output to a data target. +### Spark Jobs -There are 3 types of jobs supported by AWS Glue: Spark ETL, Spark Streaming, and Python Shell jobs. +1. **ETL Jobs** -The `glue.JobExecutable` allows you to specify the type of job, the language to use and the code assets required by the job. +ETL jobs support PySpark and Scala, for which there are separate but +similar constructors. ETL jobs default to the G.2X worker type, but you can +override this default with other supported worker type values (G.1X, G.2X, G.4X, +and G.8X). ETL jobs default to Glue version 4.0, which you can override to 3.0. +The following ETL features are enabled by default: +`--enable-metrics`, `--enable-spark-ui`, and `--enable-continuous-cloudwatch-log`. +You can find more details about version, worker type and other features in +[Glue's public documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html). -`glue.Code` allows you to refer to the different code assets required by the job, either from an existing S3 location or from a local file path. +Reference the pyspark-etl-jobs.test.ts and scalaspark-etl-jobs.test.ts unit tests +for examples of required-only and optional job parameters when creating these +types of jobs. -`glue.ExecutionClass` allows you to specify `FLEX` or `STANDARD`. `FLEX` is appropriate for non-urgent jobs such as pre-production jobs, testing, and one-time data loads. +For the sake of brevity, examples are shown using the PySpark job variety. -### Spark Jobs +Example with only required parameters: +```ts +new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', +}); +``` -These jobs run in an Apache Spark environment managed by AWS Glue. 
+Example with optional override parameters: +```ts +new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, +}); +``` -#### ETL Jobs +**Streaming Jobs** -An ETL job processes data in batches using Apache Spark. +Streaming jobs are similar to ETL jobs, except that they perform ETL on data +streams using the Apache Spark Structured Streaming framework. Some Spark +job features are not available to Streaming ETL jobs. They support Scala +and PySpark. PySpark streaming jobs default to Python 3.9, +which you can override with any non-deprecated version of Python. Streaming jobs +default to the G.2X worker type and Glue 4.0, both of which you can override. +The following best practice features are enabled by default: +`--enable-metrics`, `--enable-spark-ui`, and `--enable-continuous-cloudwatch-log`. +Reference the pyspark-streaming-jobs.test.ts and scalaspark-streaming-jobs.test.ts +unit tests for examples of required-only and optional job parameters when creating +these types of jobs. 
+ +Example with only required parameters: ```ts -declare const bucket: s3.Bucket; -new glue.Job(this, 'ScalaSparkEtlJob', { - executable: glue.JobExecutable.scalaEtl({ - glueVersion: glue.GlueVersion.V4_0, - script: glue.Code.fromBucket(bucket, 'src/com/example/HelloWorld.scala'), - className: 'com.example.HelloWorld', - extraJars: [glue.Code.fromBucket(bucket, 'jars/HelloWorld.jar')], - }), - workerType: glue.WorkerType.G_8X, - description: 'an example Scala ETL job', +new glue.PySparkStreamingJob(stack, 'ImportedJob', { role, script }); +``` + +Example with optional override parameters: +```ts +new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + jobName: 'PySparkStreamingJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, }); ``` -#### Streaming Jobs +**Flex Jobs** -A Streaming job is similar to an ETL job, except that it performs ETL on data streams. It uses the Apache Spark Structured Streaming framework. Some Spark job features are not available to streaming ETL jobs. +The flexible execution class is appropriate for non-urgent jobs such as +pre-production jobs, testing, and one-time data loads. Flexible jobs default +to Glue version 3.0 and worker type `G_2X`. 
The following best practice +features are enabled by default: +`--enable-metrics`, `--enable-spark-ui`, and `--enable-continuous-cloudwatch-log`. + +Reference the pyspark-flex-etl-jobs.test.ts and scalaspark-flex-etl-jobs.test.ts +unit tests for examples of required-only and optional job parameters when creating +these types of jobs. + +Example with only required parameters: +```ts +const job = new glue.PySparkFlexEtlJob(stack, 'ImportedJob', { role, script }); +``` +Example with optional override parameters: ```ts -new glue.Job(this, 'PythonSparkStreamingJob', { - executable: glue.JobExecutable.pythonStreaming({ - glueVersion: glue.GlueVersion.V4_0, - pythonVersion: glue.PythonVersion.THREE, - script: glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')), - }), - description: 'an example Python Streaming job', +new glue.PySparkFlexEtlJob(stack, 'PySparkFlexEtlJob', { + jobName: 'PySparkFlexEtlJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, }); ``` ### Python Shell Jobs -A Python shell job runs Python scripts as a shell and supports a Python version that depends on the AWS Glue version you are using. -This can be used to schedule and run tasks that don't require an Apache Spark environment. Currently, three flavors are supported: +Python shell jobs support a Python version that depends on the AWS Glue +version you use. These can be used to schedule and run tasks that don't +require an Apache Spark environment. 
Python shell jobs default to +Python 3.9 and a MaxCapacity of `0.0625`. Python 3.9 supports pre-loaded +analytics libraries using the `library-set=analytics` flag, which is +enabled by default. + +Reference the pyspark-shell-job.test.ts unit tests for examples of +required-only and optional job parameters when creating these types of jobs. -* PythonVersion.TWO (2.7; EOL) -* PythonVersion.THREE (3.6) -* PythonVersion.THREE_NINE (3.9) +Example with only required parameters: +```ts +job = new glue.PythonShellJob(stack, 'ImportedJob', { role, script }); +``` +Example with optional override parameters: ```ts -declare const bucket: s3.Bucket; -new glue.Job(this, 'PythonShellJob', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script: glue.Code.fromBucket(bucket, 'script.py'), - }), - description: 'an example Python Shell job', +new glue.PythonShellJob(stack, 'PythonShellJob', { + jobName: 'PythonShellJobCustomName', + description: 'This is a description', + pythonVersion: glue.PythonVersion.TWO, + maxCapacity: glue.MaxCapacity.DPU_1, + role, + script, + glueVersion: glue.GlueVersion.V2_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, }); ``` ### Ray Jobs -These jobs run in a Ray environment managed by AWS Glue. +Glue Ray jobs use worker type Z.2X and Glue version 4.0. These are not +overrideable since these are the only configuration that Glue Ray jobs +currently support. The runtime defaults to Ray2.4 and min workers defaults to 3. 
+Reference the ray-job.test.ts unit tests for examples of required-only and +optional job parameters when creating these types of jobs. + +Example with only required parameters: ```ts -new glue.Job(this, 'RayJob', { - executable: glue.JobExecutable.pythonRay({ - glueVersion: glue.GlueVersion.V4_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - script: glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')), - }), +job = new glue.RayJob(stack, 'ImportedJob', { role, script }); +``` + +Example with optional override parameters: +```ts +new glue.RayJob(stack, 'ImportedJob', { + role, + script, + jobName: 'RayCustomJobName', + description: 'This is a description', workerType: glue.WorkerType.Z_2X, - workerCount: 2, - description: 'an example Ray job' + numberOfWorkers: 5, + runtime: glue.Runtime.RAY_TWO_FOUR, + maxRetries: 3, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, }); ``` -### Enable Spark UI +### Enable Job Run Queuing -Enable Spark UI setting the `sparkUI` property. +AWS Glue job queuing monitors your account level quotas and limits. If quotas or limits are insufficient to start a Glue job run, AWS Glue will automatically queue the job and wait for limits to free up. Once limits become available, AWS Glue will retry the job run. Glue jobs will queue for limits like max concurrent job runs per account, max concurrent Data Processing Units (DPU), and resource unavailable due to IP address exhaustion in Amazon Virtual Private Cloud (Amazon VPC). + +Enable job run queuing by setting the `jobRunQueuingEnabled` property to `true`. 
```ts -new glue.Job(this, 'EnableSparkUI', { - jobName: 'EtlJobWithSparkUIPrefix', - sparkUI: { - enabled: true, - }, - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script: glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')), - }), -}); +new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + jobRunQueuingEnabled: true + }); ``` -The `sparkUI` property also allows the specification of an s3 bucket and a bucket prefix. +### Uploading scripts from the CDK app repository to S3 + +Similar to other L2 constructs, the Glue L2 automates uploading / updating +scripts to S3 via an optional fromAsset parameter pointing to a script +in the local file structure. You provide the existing S3 bucket and +path to which you'd like the script to be uploaded. -See [documentation](https://docs.aws.amazon.com/glue/latest/dg/add-job.html) for more information on adding jobs in Glue. +Reference the unit tests for examples of repo and S3 code target examples. -## Connection +### Workflow Triggers -A `Connection` allows Glue jobs, crawlers and development endpoints to access certain types of data stores. For example, to create a network connection to connect to a data source within a VPC: +You can use Glue workflows to create and visualize complex +extract, transform, and load (ETL) activities involving multiple crawlers, +jobs, and triggers. Standalone triggers are an anti-pattern, so you must +create triggers from within a workflow using the L2 construct. + +Within a workflow object, there are functions to create different +types of triggers with actions and predicates. You then add those triggers +to jobs. + +StartOnCreation defaults to true for all trigger types, but you can +override it if you prefer for your trigger not to start on creation. 
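The `StartOnCreation` default described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the construct's implementation: `SketchTriggerOptions` and `resolveStartOnCreation` are hypothetical names, and the only behavior taken from the text is that the flag defaults to `true` and can be overridden.

```ts
// Hypothetical option shape; the field names are assumptions for illustration
// and are not taken from the module's source.
interface SketchTriggerOptions {
  readonly description?: string;
  readonly startOnCreation?: boolean;
}

// Resolve the flag the way the prose describes: default to true, but respect
// an explicit override.
function resolveStartOnCreation(options: SketchTriggerOptions = {}): boolean {
  // `??` keeps an explicit `false`; `||` would overwrite it with `true`.
  return options.startOnCreation ?? true;
}

console.log(resolveStartOnCreation()); // true
console.log(resolveStartOnCreation({ startOnCreation: false })); // false
```

Nullish coalescing matters for boolean defaults of this kind, because the caller's explicit `false` has to survive the defaulting step.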
+ +Reference the workflow-triggers.test.ts unit tests for examples of creating +workflows and triggers. + +#### **1. On-Demand Triggers** + +On-demand triggers can start Glue jobs or crawlers. This construct provides +convenience functions to create on-demand crawler or job triggers. The constructor +takes an optional description parameter and abstracts the required +actions list behind the job or crawler objects you pass, using conditional types. + +#### **2. Scheduled Triggers** + +You can create scheduled triggers using cron expressions. This construct +provides daily, weekly, and monthly convenience functions, +as well as a custom function that allows you to create your own +custom timing using the [existing event Schedule class](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_events.Schedule.html) +without having to build your own cron expressions. The L2 extracts +the expression that Glue requires from the Schedule object. The constructor +takes an optional description and a list of jobs or crawlers as actions. + +#### **3. Notify Event Triggers** + +There are two types of notify event triggers: batching and non-batching. +For batching triggers, you must specify `BatchSize`. For non-batching +triggers, `BatchSize` defaults to 1. For both triggers, `BatchWindow` +defaults to 900 seconds, but you can override the window to align with +your workload's requirements. + +#### **4. Conditional Triggers** + +Conditional triggers have a predicate and actions associated with them. +The trigger actions are executed when the predicateCondition is true. + +### Connection Properties + +A `Connection` allows Glue jobs, crawlers and development endpoints to access +certain types of data stores. + +* **Secrets Management** + You must specify JDBC connection credentials in Secrets Manager and + provide the Secrets Manager Key name as a property to the job connection. 
+ +* **Networking - the CDK determines the best fit subnet for Glue connection +configuration + **The prior version of the glue-alpha-module requires the developer to + specify the subnet of the Connection when it’s defined. Now, you can still + specify the specific subnet you want to use, but are no longer required + to. You are only required to provide a VPC and either a public or private + subnet selection. Without a specific subnet provided, the L2 leverages the + existing [EC2 Subnet Selection](https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_ec2/SubnetSelection.html) + library to make the best choice selection for the subnet. ```ts declare const securityGroup: ec2.SecurityGroup; @@ -531,66 +786,40 @@ new glue.S3Table(this, 'MyTable', { }); ``` -### Primitives - -#### Numeric - -| Name | Type | Comments | -|----------- |---------- |------------------------------------------------------------------------------------------------------------------ | -| FLOAT | Constant | A 32-bit single-precision floating point number | -| INTEGER | Constant | A 32-bit signed value in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31-1 | -| DOUBLE | Constant | A 64-bit double-precision floating point number | -| BIG_INT | Constant | A 64-bit signed INTEGER in two’s complement format, with a minimum value of -2^63 and a maximum value of 2^63 -1 | -| SMALL_INT | Constant | A 16-bit signed INTEGER in two’s complement format, with a minimum value of -2^15 and a maximum value of 2^15-1 | -| TINY_INT | Constant | A 8-bit signed INTEGER in two’s complement format, with a minimum value of -2^7 and a maximum value of 2^7-1 | - -#### Date and time - -| Name | Type | Comments | -|----------- |---------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| DATE | Constant | A date in UNIX format, such as YYYY-MM-DD. 
| -| TIMESTAMP | Constant | Date and time instant in the UNiX format, such as yyyy-mm-dd hh:mm:ss[.f...]. For example, TIMESTAMP '2008-09-15 03:04:05.324'. This format uses the session time zone. | - -#### String - -| Name | Type | Comments | -|-------------------------------------------- |---------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| STRING | Constant | A string literal enclosed in single or double quotes | -| decimal(precision: number, scale?: number) | Function | `precision` is the total number of digits. `scale` (optional) is the number of digits in fractional part with a default of 0. For example, use these type definitions: decimal(11,5), decimal(15) | -| char(length: number) | Function | Fixed length character data, with a specified length between 1 and 255, such as char(10) | -| varchar(length: number) | Function | Variable length character data, with a specified length between 1 and 65535, such as varchar(10) | - -#### Miscellaneous - -| Name | Type | Comments | -|--------- |---------- |------------------------------- | -| BOOLEAN | Constant | Values are `true` and `false` | -| BINARY | Constant | Value is in binary | - -### Complex - -| Name | Type | Comments | -|------------------------------------- |---------- |------------------------------------------------------------------- | -| array(itemType: Type) | Function | An array of some other type | -| map(keyType: Type, valueType: Type) | Function | A map of some primitive key type to any value type | -| struct(collumns: Column[]) | Function | Nested structure containing individually named and typed collumns | - -## Data Quality Ruleset - -A `DataQualityRuleset` specifies a data quality ruleset with DQDL rules applied to a specified AWS Glue table. 
For example, to create a data quality ruleset for a given table: - -```ts -new glue.DataQualityRuleset(this, 'MyDataQualityRuleset', { - clientToken: 'client_token', - description: 'description', - rulesetName: 'ruleset_name', - rulesetDqdl: 'ruleset_dqdl', - tags: { - key1: 'value1', - key2: 'value2', - }, - targetTable: new glue.DataQualityTargetTable('database_name', 'table_name'), -}); -``` - -For more information, see [AWS Glue Data Quality](https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html). +## Public FAQ + +### What are we launching today? + +We’re launching new features for an AWS CDK Glue L2 Construct to provide +best-practice defaults and convenience methods to create Glue Jobs, Connections, +Triggers, Workflows, and the underlying permissions and configuration. + +### Why should I use this Construct? + +Developers should use this Construct to reduce the amount of boilerplate +code and complexity each individual has to navigate, and make it easier to +create best-practice Glue resources. + +### What’s not in scope? + +Glue Crawlers and other resources that are now managed by the AWS Lake Formation +team are not in scope for this effort. Developers should use existing methods +to create these resources, and the new Glue L2 construct assumes they already +exist as inputs. While best practice is for application and infrastructure code +to be as close as possible for teams using fully-implemented DevOps mechanisms, +in practice these ETL scripts are likely managed by a data science team who +know Python or Scala and don’t necessarily own or manage their own +infrastructure deployments. We want to meet developers where they are, and not +assume that all of the code resides in the same repository. However, developers +who do own both can automate this themselves via the CDK. + +Validating Glue version and feature use per AWS region at synth time is also +not in scope. 
AWS’ intention is for all features to eventually be propagated to +all Global regions, so the complexity involved in creating and updating +region-specific configuration to match shifting feature sets is not justified by +the likelihood that a developer will use this construct to deploy a +particular new feature to a region that doesn’t yet support it without first +researching or manually attempting to use that feature before +developing it via IaC. The developer will, of course, still get feedback from +the underlying Glue APIs as CloudFormation deploys the resources, similar to the +current CDK L1 Glue experience. diff --git a/packages/@aws-cdk/aws-glue-alpha/awslint.json b/packages/@aws-cdk/aws-glue-alpha/awslint.json index 9555a7aea8590..10a65be504fe6 100644 --- a/packages/@aws-cdk/aws-glue-alpha/awslint.json +++ b/packages/@aws-cdk/aws-glue-alpha/awslint.json @@ -51,15 +51,15 @@ "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable", "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableArn", "docs-public-apis:@aws-cdk/aws-glue-alpha.ITable.tableName", - "props-default-doc:@aws-cdk/aws-glue-alpha.PythonRayExecutableProps.runtime", - "props-default-doc:@aws-cdk/aws-glue-alpha.PythonShellExecutableProps.runtime", - "props-default-doc:@aws-cdk/aws-glue-alpha.PythonSparkJobExecutableProps.runtime", "docs-public-apis:@aws-cdk/aws-glue-alpha.S3TableProps", - "props-default-doc:@aws-cdk/aws-glue-alpha.ScalaJobExecutableProps.runtime", "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes", "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableArn", "docs-public-apis:@aws-cdk/aws-glue-alpha.TableAttributes.tableName", "docs-public-apis:@aws-cdk/aws-glue-alpha.TableBaseProps", - "docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps" + "docs-public-apis:@aws-cdk/aws-glue-alpha.TableProps", + "docs-public-apis:@aws-cdk/aws-glue-alpha.PredicateLogical", + "no-unused-type:@aws-cdk/aws-glue-alpha.ExecutionClass", + 
"no-unused-type:@aws-cdk/aws-glue-alpha.JobLanguage", + "no-unused-type:@aws-cdk/aws-glue-alpha.JobType" ] } diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/constants.ts b/packages/@aws-cdk/aws-glue-alpha/lib/constants.ts new file mode 100644 index 0000000000000..efeb5d5f09001 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/constants.ts @@ -0,0 +1,308 @@ +/** + * The type of predefined worker that is allocated when a job runs. + * + * If you need to use a WorkerType that doesn't exist as a static member, you + * can instantiate a `WorkerType` object, e.g: `WorkerType.of('other type')` + */ +export enum WorkerType { + /** + * Standard Worker Type + * 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker. + */ + STANDARD = 'Standard', + + /** + * G.1X Worker Type + * 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs. + */ + G_1X = 'G.1X', + + /** + * G.2X Worker Type + * 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs. + */ + G_2X = 'G.2X', + + /** + * G.4X Worker Type + * 4 DPU (16 vCPU, 64 GB of memory, 256 GB disk), and provides 1 executor per worker. + * We recommend this worker type for jobs whose workloads contain your most demanding transforms, + * aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later jobs. + */ + G_4X = 'G.4X', + + /** + * G.8X Worker Type + * 8 DPU (32 vCPU, 128 GB of memory, 512 GB disk), and provides 1 executor per worker. We recommend this worker + * type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. + * This worker type is available only for AWS Glue version 3.0 or later jobs. + */ + G_8X = 'G.8X', + + /** + * G.025X Worker Type + * 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for low volume streaming jobs. 
+ */ + G_025X = 'G.025X', + + /** + * Z.2X Worker Type + */ + Z_2X = 'Z.2X', +} + +/** + * The number of workers of a defined workerType that are allocated when a job runs. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html + */ + +/** + * Job states emitted by Glue to CloudWatch Events. + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types for more information. + */ +export enum JobState { + /** + * State indicating job run succeeded + */ + SUCCEEDED = 'SUCCEEDED', + + /** + * State indicating job run failed + */ + FAILED = 'FAILED', + + /** + * State indicating job run timed out + */ + TIMEOUT = 'TIMEOUT', + + /** + * State indicating job is starting + */ + STARTING = 'STARTING', + + /** + * State indicating job is running + */ + RUNNING = 'RUNNING', + + /** + * State indicating job is stopping + */ + STOPPING = 'STOPPING', + + /** + * State indicating job stopped + */ + STOPPED = 'STOPPED', +} + +/** + * The Glue CloudWatch metric type. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html + */ +export enum MetricType { + /** + * A value at a point in time. + */ + GAUGE = 'gauge', + + /** + * An aggregate number. + */ + COUNT = 'count', +} + +/** + * The ExecutionClass whether the job is run with a standard or flexible execution class. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job + * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html + */ +export enum ExecutionClass { + /** + * The flexible execution class is appropriate for time-insensitive jobs whose start + * and completion times may vary. + */ + FLEX = 'FLEX', + + /** + * The standard execution class is ideal for time-sensitive workloads that require fast job + * startup and dedicated resources. 
+ */ + STANDARD = 'STANDARD', +} + +/** + * AWS Glue version determines the versions of Apache Spark and Python that are available to the job. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html. + */ +export enum GlueVersion { + /** + * Glue version using Spark 2.2.1 and Python 2.7 + */ + V0_9 = '0.9', + + /** + * Glue version using Spark 2.4.3, Python 2.7 and Python 3.6 + */ + V1_0 = '1.0', + + /** + * Glue version using Spark 2.4.3 and Python 3.7 + */ + V2_0 = '2.0', + + /** + * Glue version using Spark 3.1.1 and Python 3.7 + */ + V3_0 = '3.0', + + /** + * Glue version using Spark 3.3.0 and Python 3.10 + */ + V4_0 = '4.0', + +} + +/** + * Runtime language of the Glue job + */ +export enum JobLanguage { + /** + * Scala + */ + SCALA = 'scala', + + /** + * Python + */ + PYTHON = 'python', +} + +/** + * Python version + */ +export enum PythonVersion { + /** + * Python 2 (the exact version depends on GlueVersion and JobCommand used) + */ + TWO = '2', + + /** + * Python 3 (the exact version depends on GlueVersion and JobCommand used) + */ + THREE = '3', + + /** + * Python 3.9 (the exact version depends on GlueVersion and JobCommand used) + */ + THREE_NINE = '3.9', + +} + +/** + * AWS Glue runtime determines the runtime engine of the job. + * + */ +export enum Runtime { + /** + * Runtime for a Glue for Ray 2.4. + */ + RAY_TWO_FOUR = 'Ray2.4', +} + +/** + * The job type. + * + * If you need to use a JobType that doesn't exist as a static member, you + * can instantiate a `JobType` object, e.g: `JobType.of('other name')`. + */ +export enum JobType { + /** + * Command for running a Glue Spark job. + */ + ETL = 'glueetl', + + /** + * Command for running a Glue Spark streaming job. + */ + STREAMING = 'gluestreaming', + + /** + * Command for running a Glue python shell job. + */ + PYTHON_SHELL = 'pythonshell', + + /** + * Command for running a Glue Ray job. 
+ */ + RAY = 'glueray', + +} + +/** + * The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. + */ +export enum MaxCapacity { + + /** + * DPU value of 1/16th + */ + DPU_1_16TH = 0.0625, + + /** + * DPU value of 1 + */ + DPU_1 = 1, +} + +/** + * Represents the logical operator for combining multiple conditions in the Glue Trigger API. + */ +export enum PredicateLogical { + /** + * All conditions must be true for the predicate to be true. + */ + AND = 'AND', + + /** + * At least one condition must be true for the predicate to be true. + */ + ANY = 'ANY', +} + +/** + * Represents the logical operator for evaluating a single condition in the Glue Trigger API. + */ +export enum ConditionLogicalOperator { + /** The condition is true if the values are equal. */ + EQUALS = 'EQUALS', +} + +/** + * Represents the state of a crawler for a condition in the Glue Trigger API. + */ +export enum CrawlerState { + /** The crawler is currently running. */ + RUNNING = 'RUNNING', + + /** The crawler is in the process of being cancelled. */ + CANCELLING = 'CANCELLING', + + /** The crawler has been cancelled. */ + CANCELLED = 'CANCELLED', + + /** The crawler has completed its operation successfully. */ + SUCCEEDED = 'SUCCEEDED', + + /** The crawler has failed to complete its operation. */ + FAILED = 'FAILED', + + /** The crawler encountered an error during its operation.
*/ + ERROR = 'ERROR', +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/index.ts b/packages/@aws-cdk/aws-glue-alpha/lib/index.ts index 1b9514c14625e..ec13e6d7d0697 100644 --- a/packages/@aws-cdk/aws-glue-alpha/lib/index.ts +++ b/packages/@aws-cdk/aws-glue-alpha/lib/index.ts @@ -6,11 +6,22 @@ export * from './data-format'; export * from './data-quality-ruleset'; export * from './database'; export * from './external-table'; -export * from './job'; -export * from './job-executable'; export * from './s3-table'; export * from './schema'; export * from './security-configuration'; export * from './storage-parameter'; +export * from './constants'; +export * from './jobs/job'; +export * from './jobs/pyspark-etl-job'; +export * from './jobs/pyspark-flex-etl-job'; +export * from './jobs/pyspark-streaming-job'; +export * from './jobs/python-shell-job'; +export * from './jobs/ray-job'; +export * from './jobs/scala-spark-etl-job'; +export * from './jobs/scala-spark-flex-etl-job'; +export * from './jobs/scala-spark-streaming-job'; +export * from './jobs/spark-ui-utils'; export * from './table-base'; export * from './table-deprecated'; +export * from './triggers/workflow'; +export * from './triggers/trigger-options'; \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/job-executable.ts b/packages/@aws-cdk/aws-glue-alpha/lib/job-executable.ts deleted file mode 100644 index 4bee0a054bcd8..0000000000000 --- a/packages/@aws-cdk/aws-glue-alpha/lib/job-executable.ts +++ /dev/null @@ -1,527 +0,0 @@ -import { Code } from './code'; - -/** - * AWS Glue version determines the versions of Apache Spark and Python that are available to the job. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html. - * - * If you need to use a GlueVersion that doesn't exist as a static member, you - * can instantiate a `GlueVersion` object, e.g: `GlueVersion.of('1.5')`. 
- */ -export class GlueVersion { - /** - * Glue version using Spark 2.2.1 and Python 2.7 - */ - public static readonly V0_9 = new GlueVersion('0.9'); - - /** - * Glue version using Spark 2.4.3, Python 2.7 and Python 3.6 - */ - public static readonly V1_0 = new GlueVersion('1.0'); - - /** - * Glue version using Spark 2.4.3 and Python 3.7 - */ - public static readonly V2_0 = new GlueVersion('2.0'); - - /** - * Glue version using Spark 3.1.1 and Python 3.7 - */ - public static readonly V3_0 = new GlueVersion('3.0'); - - /** - * Glue version using Spark 3.3.0 and Python 3.10 - */ - public static readonly V4_0 = new GlueVersion('4.0'); - - /** - * Custom Glue version - * @param version custom version - */ - public static of(version: string): GlueVersion { - return new GlueVersion(version); - } - - /** - * The name of this GlueVersion, as expected by Job resource. - */ - public readonly name: string; - - private constructor(name: string) { - this.name = name; - } -} - -/** - * Runtime language of the Glue job - */ -export enum JobLanguage { - /** - * Scala - */ - SCALA = 'scala', - - /** - * Python - */ - PYTHON = 'python', -} - -/** - * Python version - */ -export enum PythonVersion { - /** - * Python 2 (the exact version depends on GlueVersion and JobCommand used) - */ - TWO = '2', - - /** - * Python 3 (the exact version depends on GlueVersion and JobCommand used) - */ - THREE = '3', - - /** - * Python 3.9 (the exact version depends on GlueVersion and JobCommand used) - */ - THREE_NINE = '3.9', -} - -/** - * AWS Glue runtime determines the runtime engine of the job. - * - */ -export class Runtime { - /** - * Runtime for a Glue for Ray 2.4. - */ - public static readonly RAY_TWO_FOUR = new Runtime('Ray2.4'); - - /** - * Custom runtime - * @param runtime custom runtime - */ - public static of(runtime: string): Runtime { - return new Runtime(runtime); - } - - /** - * The name of this Runtime. 
- */ - public readonly name: string; - - private constructor(name: string) { - this.name = name; - } -} - -/** - * The job type. - * - * If you need to use a JobType that doesn't exist as a static member, you - * can instantiate a `JobType` object, e.g: `JobType.of('other name')`. - */ -export class JobType { - /** - * Command for running a Glue Spark job. - */ - public static readonly ETL = new JobType('glueetl'); - - /** - * Command for running a Glue Spark streaming job. - */ - public static readonly STREAMING = new JobType('gluestreaming'); - - /** - * Command for running a Glue python shell job. - */ - public static readonly PYTHON_SHELL = new JobType('pythonshell'); - - /** - * Command for running a Glue Ray job. - */ - public static readonly RAY = new JobType('glueray'); - - /** - * Custom type name - * @param name type name - */ - public static of(name: string): JobType { - return new JobType(name); - } - - /** - * The name of this JobType, as expected by Job resource. - */ - public readonly name: string; - - private constructor(name: string) { - this.name = name; - } -} - -interface PythonExecutableProps { - /** - * The Python version to use. - */ - readonly pythonVersion: PythonVersion; - - /** - * Additional Python files that AWS Glue adds to the Python path before executing your script. - * Only individual files are supported, directories are not supported. - * Equivalent to a job parameter `--extra-py-files`. - * - * @default - no extra python files and argument is not set - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraPythonFiles?: Code[]; -} - -interface RayExecutableProps { - /** - * The Python version to use. - */ - readonly pythonVersion: PythonVersion; - - /** - * Additional Python modules that AWS Glue adds to the Python path before executing your script. - * Equivalent to a job parameter `--s3-py-modules`. 
- * - * @default - no extra python files and argument is not set - * - * @see https://docs.aws.amazon.com/glue/latest/dg/author-job-ray-job-parameters.html - */ - readonly s3PythonModules?: Code[]; -} - -interface SharedJobExecutableProps { - /** - * Runtime. It is required for Ray jobs. - * - */ - readonly runtime?: Runtime; - - /** - * Glue version. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/release-notes.html - */ - readonly glueVersion: GlueVersion; - - /** - * The script that executes a job. - */ - readonly script: Code; - - /** - * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. - * Only individual files are supported, directories are not supported. - * Equivalent to a job parameter `--extra-files`. - * - * @default [] - no extra files are copied to the working directory - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraFiles?: Code[]; -} - -interface SharedSparkJobExecutableProps extends SharedJobExecutableProps { - /** - * Additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. - * Only individual files are supported, directories are not supported. - * Equivalent to a job parameter `--extra-jars`. - * - * @default [] - no extra jars are added to the classpath - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraJars?: Code[]; - - /** - * Setting this value to true prioritizes the customer's extra JAR files in the classpath. - * Equivalent to a job parameter `--user-jars-first`. 
- * - * @default false - priority is not given to user-provided jars - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraJarsFirst?: boolean; -} - -/** - * Props for creating a Scala Spark (ETL or Streaming) job executable - */ -export interface ScalaJobExecutableProps extends SharedSparkJobExecutableProps { - /** - * The fully qualified Scala class name that serves as the entry point for the job. - * Equivalent to a job parameter `--class`. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly className: string; -} - -/** - * Props for creating a Python Spark (ETL or Streaming) job executable - */ -export interface PythonSparkJobExecutableProps extends SharedSparkJobExecutableProps, PythonExecutableProps {} - -/** - * Props for creating a Python shell job executable - */ -export interface PythonShellExecutableProps extends SharedJobExecutableProps, PythonExecutableProps {} - -/** - * Props for creating a Python Ray job executable - */ -export interface PythonRayExecutableProps extends SharedJobExecutableProps, RayExecutableProps {} - -/** - * The executable properties related to the Glue job's GlueVersion, JobType and code - */ -export class JobExecutable { - - /** - * Create Scala executable props for Apache Spark ETL job. - * - * @param props Scala Apache Spark Job props - */ - public static scalaEtl(props: ScalaJobExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.ETL, - language: JobLanguage.SCALA, - }); - } - - /** - * Create Scala executable props for Apache Spark Streaming job. - * - * @param props Scala Apache Spark Job props - */ - public static scalaStreaming(props: ScalaJobExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.STREAMING, - language: JobLanguage.SCALA, - }); - } - - /** - * Create Python executable props for Apache Spark ETL job. 
- * - * @param props Python Apache Spark Job props - */ - public static pythonEtl(props: PythonSparkJobExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.ETL, - language: JobLanguage.PYTHON, - }); - } - - /** - * Create Python executable props for Apache Spark Streaming job. - * - * @param props Python Apache Spark Job props - */ - public static pythonStreaming(props: PythonSparkJobExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.STREAMING, - language: JobLanguage.PYTHON, - }); - } - - /** - * Create Python executable props for python shell jobs. - * - * @param props Python Shell Job props. - */ - public static pythonShell(props: PythonShellExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.PYTHON_SHELL, - language: JobLanguage.PYTHON, - }); - } - - /** - * Create Python executable props for Ray jobs. - * - * @param props Ray Job props. - */ - public static pythonRay(props: PythonRayExecutableProps): JobExecutable { - return new JobExecutable({ - ...props, - type: JobType.RAY, - language: JobLanguage.PYTHON, - }); - } - - /** - * Create a custom JobExecutable. - * - * @param config custom job executable configuration. 
- */ - public static of(config: JobExecutableConfig): JobExecutable { - return new JobExecutable(config); - } - - private config: JobExecutableConfig; - - private constructor(config: JobExecutableConfig) { - const glueVersion = config.glueVersion.name; - const type = config.type.name; - if (JobType.PYTHON_SHELL.name === type) { - if (config.language !== JobLanguage.PYTHON) { - throw new Error('Python shell requires the language to be set to Python'); - } - if ([GlueVersion.V0_9.name, GlueVersion.V4_0.name].includes(glueVersion)) { - throw new Error(`Specified GlueVersion ${glueVersion} does not support Python Shell`); - } - } - if (JobType.RAY.name === type) { - if (config.language !== JobLanguage.PYTHON) { - throw new Error('Ray requires the language to be set to Python'); - } - if ([GlueVersion.V0_9.name, GlueVersion.V1_0.name, GlueVersion.V2_0.name, GlueVersion.V3_0.name].includes(glueVersion)) { - throw new Error(`Specified GlueVersion ${glueVersion} does not support Ray`); - } - } - if (config.extraJarsFirst && [GlueVersion.V0_9.name, GlueVersion.V1_0.name].includes(glueVersion)) { - throw new Error(`Specified GlueVersion ${glueVersion} does not support extraJarsFirst`); - } - if (config.pythonVersion === PythonVersion.TWO && ![GlueVersion.V0_9.name, GlueVersion.V1_0.name].includes(glueVersion)) { - throw new Error(`Specified GlueVersion ${glueVersion} does not support PythonVersion ${config.pythonVersion}`); - } - if (JobLanguage.PYTHON !== config.language && config.extraPythonFiles) { - throw new Error('extraPythonFiles is not supported for languages other than JobLanguage.PYTHON'); - } - if (config.extraPythonFiles && type === JobType.RAY.name) { - throw new Error('extraPythonFiles is not supported for Ray jobs'); - } - if (config.pythonVersion === PythonVersion.THREE_NINE && type !== JobType.PYTHON_SHELL.name && type !== JobType.RAY.name) { - throw new Error('Specified PythonVersion PythonVersion.THREE_NINE is only supported for JobType Python Shell and 
Ray'); - } - if (config.pythonVersion === PythonVersion.THREE && type === JobType.RAY.name) { - throw new Error('Specified PythonVersion PythonVersion.THREE is not supported for Ray'); - } - if (config.runtime === undefined && type === JobType.RAY.name) { - throw new Error('Runtime is required for Ray jobs'); - } - this.config = config; - } - - /** - * Called during Job initialization to get JobExecutableConfig. - */ - public bind(): JobExecutableConfig { - return this.config; - } -} - -/** - * Result of binding a `JobExecutable` into a `Job`. - */ -export interface JobExecutableConfig { - /** - * Glue version. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/release-notes.html - */ - readonly glueVersion: GlueVersion; - - /** - * The language of the job (Scala or Python). - * Equivalent to a job parameter `--job-language`. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly language: JobLanguage; - - /** - * Specify the type of the job whether it's an Apache Spark ETL or streaming one or if it's a Python shell job. - */ - readonly type: JobType; - - /** - * The Python version to use. - * - * @default - no python version specified - */ - readonly pythonVersion?: PythonVersion; - - /** - * The Runtime to use. - * - * @default - no runtime specified - */ - readonly runtime?: Runtime; - - /** - * The script that is executed by a job. - */ - readonly script: Code; - - /** - * The Scala class that serves as the entry point for the job. This applies only if your the job langauage is Scala. - * Equivalent to a job parameter `--class`. - * - * @default - no scala className specified - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly className?: string; - - /** - * Additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. - * Equivalent to a job parameter `--extra-jars`. 
- * - * @default - no extra jars specified. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraJars?: Code[]; - - /** - * Additional Python files that AWS Glue adds to the Python path before executing your script. - * Equivalent to a job parameter `--extra-py-files`. - * - * @default - no extra python files specified. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraPythonFiles?: Code[]; - - /** - * Additional Python modules that AWS Glue adds to the Python path before executing your script. - * Equivalent to a job parameter `--s3-py-modules`. - * - * @default - no extra python files specified. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/author-job-ray-job-parameters.html - */ - readonly s3PythonModules?: Code[]; - - /** - * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. - * Equivalent to a job parameter `--extra-files`. - * - * @default - no extra files specified. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraFiles?: Code[]; - - /** - * Setting this value to true prioritizes the customer's extra JAR files in the classpath. - * Equivalent to a job parameter `--user-jars-first`. - * - * @default - extra jars are not prioritized. 
- * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly extraJarsFirst?: boolean; -} diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/job.ts deleted file mode 100644 index 813894f0b6898..0000000000000 --- a/packages/@aws-cdk/aws-glue-alpha/lib/job.ts +++ /dev/null @@ -1,921 +0,0 @@ -import { EOL } from 'os'; -import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch'; -import * as events from 'aws-cdk-lib/aws-events'; -import * as iam from 'aws-cdk-lib/aws-iam'; -import * as logs from 'aws-cdk-lib/aws-logs'; -import * as s3 from 'aws-cdk-lib/aws-s3'; -import * as cdk from 'aws-cdk-lib/core'; -import * as constructs from 'constructs'; -import { Code, GlueVersion, JobExecutable, JobExecutableConfig, JobType } from '.'; -import { IConnection } from './connection'; -import { CfnJob } from 'aws-cdk-lib/aws-glue'; -import { ISecurityConfiguration } from './security-configuration'; - -/** - * The type of predefined worker that is allocated when a job runs. - * - * If you need to use a WorkerType that doesn't exist as a static member, you - * can instantiate a `WorkerType` object, e.g: `WorkerType.of('other type')`. - */ -export class WorkerType { - /** - * Each worker provides 4 vCPU, 16 GB of memory and a 50GB disk, and 2 executors per worker. - */ - public static readonly STANDARD = new WorkerType('Standard'); - - /** - * Each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs. - */ - public static readonly G_1X = new WorkerType('G.1X'); - - /** - * Each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. Suitable for memory-intensive jobs. - */ - public static readonly G_2X = new WorkerType('G.2X'); - - /** - * Each worker maps to 4 DPU (16 vCPU, 64 GB of memory, 256 GB disk), and provides 1 executor per worker. 
We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later jobs. - */ - public static readonly G_4X = new WorkerType('G.4X'); - - /** - * Each worker maps to 8 DPU (32 vCPU, 128 GB of memory, 512 GB disk), and provides 1 executor per worker. We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later jobs. - */ - public static readonly G_8X = new WorkerType('G.8X'); - - /** - * Each worker maps to 0.25 DPU (2 vCPU, 4 GB of memory, 64 GB disk), and provides 1 executor per worker. Suitable for low volume streaming jobs. - */ - public static readonly G_025X = new WorkerType('G.025X'); - - /** - * Each worker maps to 2 high-memory DPU [M-DPU] (8 vCPU, 64 GB of memory, 128 GB disk). Supported in Ray jobs. - */ - public static readonly Z_2X = new WorkerType('Z.2X'); - - /** - * Custom worker type - * @param workerType custom worker type - */ - public static of(workerType: string): WorkerType { - return new WorkerType(workerType); - } - - /** - * The name of this WorkerType, as expected by Job resource. - */ - public readonly name: string; - - private constructor(name: string) { - this.name = name; - } -} - -/** - * Job states emitted by Glue to CloudWatch Events. - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types for more information. 
- */ -export enum JobState { - /** - * State indicating job run succeeded - */ - SUCCEEDED = 'SUCCEEDED', - - /** - * State indicating job run failed - */ - FAILED = 'FAILED', - - /** - * State indicating job run timed out - */ - TIMEOUT = 'TIMEOUT', - - /** - * State indicating job is starting - */ - STARTING = 'STARTING', - - /** - * State indicating job is running - */ - RUNNING = 'RUNNING', - - /** - * State indicating job is stopping - */ - STOPPING = 'STOPPING', - - /** - * State indicating job stopped - */ - STOPPED = 'STOPPED', -} - -/** - * The Glue CloudWatch metric type. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html - */ -export enum MetricType { - /** - * A value at a point in time. - */ - GAUGE = 'gauge', - - /** - * An aggregate number. - */ - COUNT = 'count', -} - -/** - * The ExecutionClass whether the job is run with a standard or flexible execution class. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job - * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html - */ -export enum ExecutionClass { - /** - * The flexible execution class is appropriate for time-insensitive jobs whose start - * and completion times may vary. - */ - FLEX = 'FLEX', - - /** - * The standard execution class is ideal for time-sensitive workloads that require fast job - * startup and dedicated resources. - */ - STANDARD = 'STANDARD', -} - -/** - * Interface representing a created or an imported `Job`. - */ -export interface IJob extends cdk.IResource, iam.IGrantable { - /** - * The name of the job. - * @attribute - */ - readonly jobName: string; - - /** - * The ARN of the job. - * @attribute - */ - readonly jobArn: string; - - /** - * Defines a CloudWatch event rule triggered when something happens with this job. 
- * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - onEvent(id: string, options?: events.OnEventOptions): events.Rule; - - /** - * Defines a CloudWatch event rule triggered when this job moves to the input jobState. - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - onStateChange(id: string, jobState: JobState, options?: events.OnEventOptions): events.Rule; - - /** - * Defines a CloudWatch event rule triggered when this job moves to the SUCCEEDED state. - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - onSuccess(id: string, options?: events.OnEventOptions): events.Rule; - - /** - * Defines a CloudWatch event rule triggered when this job moves to the FAILED state. - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - onFailure(id: string, options?: events.OnEventOptions): events.Rule; - - /** - * Defines a CloudWatch event rule triggered when this job moves to the TIMEOUT state. - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - onTimeout(id: string, options?: events.OnEventOptions): events.Rule; - - /** - * Create a CloudWatch metric. - * - * @param metricName name of the metric typically prefixed with `glue.driver.`, `glue..` or `glue.ALL.`. - * @param type the metric type. - * @param props metric options. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html - */ - metric(metricName: string, type: MetricType, props?: cloudwatch.MetricOptions): cloudwatch.Metric; - - /** - * Create a CloudWatch Metric indicating job success. - */ - metricSuccess(props?: cloudwatch.MetricOptions): cloudwatch.Metric; - - /** - * Create a CloudWatch Metric indicating job failure. 
- */ - metricFailure(props?: cloudwatch.MetricOptions): cloudwatch.Metric; - - /** - * Create a CloudWatch Metric indicating job timeout. - */ - metricTimeout(props?: cloudwatch.MetricOptions): cloudwatch.Metric; -} - -abstract class JobBase extends cdk.Resource implements IJob { - - public abstract readonly jobArn: string; - public abstract readonly jobName: string; - public abstract readonly grantPrincipal: iam.IPrincipal; - - /** - * Create a CloudWatch Event Rule for this Glue Job when it's in a given state - * - * @param id construct id - * @param options event options. Note that some values are overridden if provided, these are - * - eventPattern.source = ['aws.glue'] - * - eventPattern.detailType = ['Glue Job State Change', 'Glue Job Run Status'] - * - eventPattern.detail.jobName = [this.jobName] - * - * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - */ - public onEvent(id: string, options: events.OnEventOptions = {}): events.Rule { - const rule = new events.Rule(this, id, options); - rule.addTarget(options.target); - rule.addEventPattern({ - source: ['aws.glue'], - detailType: ['Glue Job State Change', 'Glue Job Run Status'], - detail: { - jobName: [this.jobName], - }, - }); - return rule; - } - - /** - * Create a CloudWatch Event Rule for the transition into the input jobState. - * - * @param id construct id. - * @param jobState the job state. - * @param options optional event options. - */ - public onStateChange(id: string, jobState: JobState, options: events.OnEventOptions = {}): events.Rule { - const rule = this.onEvent(id, { - description: `Rule triggered when Glue job ${this.jobName} is in ${jobState} state`, - ...options, - }); - rule.addEventPattern({ - detail: { - state: [jobState], - }, - }); - return rule; - } - - /** - * Create a CloudWatch Event Rule matching JobState.SUCCEEDED. - * - * @param id construct id. - * @param options optional event options. default is {}. 
- */ - public onSuccess(id: string, options: events.OnEventOptions = {}): events.Rule { - return this.onStateChange(id, JobState.SUCCEEDED, options); - } - - /** - * Return a CloudWatch Event Rule matching FAILED state. - * - * @param id construct id. - * @param options optional event options. default is {}. - */ - public onFailure(id: string, options: events.OnEventOptions = {}): events.Rule { - return this.onStateChange(id, JobState.FAILED, options); - } - - /** - * Return a CloudWatch Event Rule matching TIMEOUT state. - * - * @param id construct id. - * @param options optional event options. default is {}. - */ - public onTimeout(id: string, options: events.OnEventOptions = {}): events.Rule { - return this.onStateChange(id, JobState.TIMEOUT, options); - } - - /** - * Create a CloudWatch metric. - * - * @param metricName name of the metric typically prefixed with `glue.driver.`, `glue..` or `glue.ALL.`. - * @param type the metric type. - * @param props metric options. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html - */ - public metric(metricName: string, type: MetricType, props?: cloudwatch.MetricOptions): cloudwatch.Metric { - return new cloudwatch.Metric({ - metricName, - namespace: 'Glue', - dimensionsMap: { - JobName: this.jobName, - JobRunId: 'ALL', - Type: type, - }, - ...props, - }).attachTo(this); - } - - /** - * Return a CloudWatch Metric indicating job success. - * - * This metric is based on the Rule returned by no-args onSuccess() call. - */ - public metricSuccess(props?: cloudwatch.MetricOptions): cloudwatch.Metric { - return metricRule(this.metricJobStateRule('SuccessMetricRule', JobState.SUCCEEDED), props); - } - - /** - * Return a CloudWatch Metric indicating job failure. - * - * This metric is based on the Rule returned by no-args onFailure() call. 
- */ - public metricFailure(props?: cloudwatch.MetricOptions): cloudwatch.Metric { - return metricRule(this.metricJobStateRule('FailureMetricRule', JobState.FAILED), props); - } - - /** - * Return a CloudWatch Metric indicating job timeout. - * - * This metric is based on the Rule returned by no-args onTimeout() call. - */ - public metricTimeout(props?: cloudwatch.MetricOptions): cloudwatch.Metric { - return metricRule(this.metricJobStateRule('TimeoutMetricRule', JobState.TIMEOUT), props); - } - - /** - * Creates or retrieves a singleton event rule for the input job state for use with the metric JobState methods. - * - * @param id construct id. - * @param jobState the job state. - * @private - */ - private metricJobStateRule(id: string, jobState: JobState): events.Rule { - return this.node.tryFindChild(id) as events.Rule ?? this.onStateChange(id, jobState); - } -} - -/** - * Properties for enabling Spark UI monitoring feature for Spark-based Glue jobs. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ -export interface SparkUIProps { - /** - * Enable Spark UI. - */ - readonly enabled: boolean; - - /** - * The bucket where the Glue job stores the logs. - * - * @default - a new bucket will be created. - */ - readonly bucket?: s3.IBucket; - - /** - * The path inside the bucket (objects prefix) where the Glue job stores the logs. - * Use format `'foo/bar/'` - * - * @default - the logs will be written at the root of the bucket - */ - readonly prefix?: string; -} - -/** - * The Spark UI logging location. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ -export interface SparkUILoggingLocation { - /** - * The bucket where the Glue job stores the logs. - * - * @default - a new bucket will be created. 
- */ - readonly bucket: s3.IBucket; - - /** - * The path inside the bucket (objects prefix) where the Glue job stores the logs. - * - * @default - the logs will be written at the root of the bucket - */ - readonly prefix?: string; -} - -/** - * Properties for enabling Continuous Logging for Glue Jobs. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging-enable.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ -export interface ContinuousLoggingProps { - /** - * Enable continouous logging. - */ - readonly enabled: boolean; - - /** - * Specify a custom CloudWatch log group name. - * - * @default - a log group is created with name `/aws-glue/jobs/logs-v2/`. - */ - readonly logGroup?: logs.ILogGroup; - - /** - * Specify a custom CloudWatch log stream prefix. - * - * @default - the job run ID. - */ - readonly logStreamPrefix?: string; - - /** - * Filter out non-useful Apache Spark driver/executor and Apache Hadoop YARN heartbeat log messages. - * - * @default true - */ - readonly quiet?: boolean; - - /** - * Apply the provided conversion pattern. - * - * This is a Log4j Conversion Pattern to customize driver and executor logs. - * - * @default `%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n` - */ - readonly conversionPattern?: string; -} - -/** - * Attributes for importing `Job`. - */ -export interface JobAttributes { - /** - * The name of the job. - */ - readonly jobName: string; - - /** - * The IAM role assumed by Glue to run this job. - * - * @default - undefined - */ - readonly role?: iam.IRole; -} - -/** - * Construction properties for `Job`. - */ -export interface JobProps { - /** - * The job's executable properties. - */ - readonly executable: JobExecutable; - - /** - * The name of the job. - * - * @default - a name is automatically generated - */ - readonly jobName?: string; - - /** - * The description of the job. 
- * - * @default - no value - */ - readonly description?: string; - - /** - * The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. - * Cannot be used for Glue version 2.0 and later - workerType and workerCount should be used instead. - * - * @default - 10 when job type is Apache Spark ETL or streaming, 0.0625 when job type is Python shell - */ - readonly maxCapacity?: number; - - /** - * The maximum number of times to retry this job after a job run fails. - * - * @default 0 - */ - readonly maxRetries?: number; - - /** - * The maximum number of concurrent runs allowed for the job. - * - * An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. - * - * @default 1 - */ - readonly maxConcurrentRuns?: number; - - /** - * The number of minutes to wait after a job run starts, before sending a job run delay notification. - * - * @default - no delay notifications - */ - readonly notifyDelayAfter?: cdk.Duration; - - /** - * The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. - * - * @default cdk.Duration.hours(48) - */ - readonly timeout?: cdk.Duration; - - /** - * The type of predefined worker that is allocated when a job runs. - * - * @default - differs based on specific Glue version - */ - readonly workerType?: WorkerType; - - /** - * The number of workers of a defined `WorkerType` that are allocated when a job runs. - * - * @default - differs based on specific Glue version/worker type - */ - readonly workerCount?: number; - - /** - * The `Connection`s used for this job. - * - * Connections are used to connect to other AWS Service or resources within a VPC. - * - * @default [] - no connections are added to the job - */ - readonly connections?: IConnection[]; - - /** - * The `SecurityConfiguration` to use for this job. - * - * @default - no security configuration. 
- */ - readonly securityConfiguration?: ISecurityConfiguration; - - /** - * The default arguments for this job, specified as name-value pairs. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html for a list of reserved parameters - * @default - no arguments - */ - readonly defaultArguments?: { [key: string]: string }; - - /** - * The tags to add to the resources on which the job runs - * - * @default {} - no tags - */ - readonly tags?: { [key: string]: string }; - - /** - * The IAM role assumed by Glue to run this job. - * - * If providing a custom role, it needs to trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html - * - * @default - a role is automatically generated - */ - readonly role?: iam.IRole; - - /** - * Enables the collection of metrics for job profiling. - * Equivalent to a job parameter `--enable-metrics`. - * - * @default - no profiling metrics emitted. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly enableProfilingMetrics? :boolean; - - /** - * Enables the Spark UI debugging and monitoring with the specified props. - * - * @default - Spark UI debugging and monitoring is disabled. - * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly sparkUI?: SparkUIProps; - - /** - * Enables continuous logging with the specified props. - * - * @default - continuous logging is disabled. 
- * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging-enable.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - readonly continuousLogging?: ContinuousLoggingProps; - - /** - * The ExecutionClass whether the job is run with a standard or flexible execution class. - * - * @default - STANDARD - * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-Job - * @see https://docs.aws.amazon.com/glue/latest/dg/add-job.html - */ - readonly executionClass?: ExecutionClass; -} - -/** - * A Glue Job. - */ -export class Job extends JobBase { - /** - * Creates a Glue Job - * - * @param scope The scope creating construct (usually `this`). - * @param id The construct's id. - * @param attrs Import attributes - */ - public static fromJobAttributes(scope: constructs.Construct, id: string, attrs: JobAttributes): IJob { - class Import extends JobBase { - public readonly jobName = attrs.jobName; - public readonly jobArn = jobArn(scope, attrs.jobName); - public readonly grantPrincipal = attrs.role ?? new iam.UnknownPrincipal({ resource: this }); - } - - return new Import(scope, id); - } - - /** - * The ARN of the job. - */ - public readonly jobArn: string; - - /** - * The name of the job. - */ - public readonly jobName: string; - - /** - * The IAM role Glue assumes to run this job. - */ - public readonly role: iam.IRole; - - /** - * The principal this Glue Job is running as. - */ - public readonly grantPrincipal: iam.IPrincipal; - - /** - * The Spark UI logs location if Spark UI monitoring and debugging is enabled. 
- * - * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - public readonly sparkUILoggingLocation?: SparkUILoggingLocation; - - constructor(scope: constructs.Construct, id: string, props: JobProps) { - super(scope, id, { - physicalName: props.jobName, - }); - - const executable = props.executable.bind(); - - this.role = props.role ?? new iam.Role(this, 'ServiceRole', { - assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), - managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], - }); - this.grantPrincipal = this.role; - - const sparkUI = props.sparkUI?.enabled ? this.setupSparkUI(executable, this.role, props.sparkUI) : undefined; - this.sparkUILoggingLocation = sparkUI?.location; - const continuousLoggingArgs = props.continuousLogging?.enabled ? this.setupContinuousLogging(this.role, props.continuousLogging) : {}; - const profilingMetricsArgs = props.enableProfilingMetrics ? 
{ '--enable-metrics': '' } : {}; - - const defaultArguments = { - ...this.executableArguments(executable), - ...continuousLoggingArgs, - ...profilingMetricsArgs, - ...sparkUI?.args, - ...this.checkNoReservedArgs(props.defaultArguments), - }; - - if (props.executionClass === ExecutionClass.FLEX) { - if (executable.type !== JobType.ETL) { - throw new Error('FLEX ExecutionClass is only available for JobType.ETL jobs'); - } - if ([GlueVersion.V0_9, GlueVersion.V1_0, GlueVersion.V2_0].includes(executable.glueVersion)) { - throw new Error('FLEX ExecutionClass is only available for GlueVersion 3.0 or later'); - } - if (props.workerType && (props.workerType !== WorkerType.G_1X && props.workerType !== WorkerType.G_2X)) { - throw new Error('FLEX ExecutionClass is only available for WorkerType G_1X or G_2X'); - } - } - - let maxCapacity = props.maxCapacity; - if (maxCapacity !== undefined && (props.workerType && props.workerCount !== undefined)) { - throw new Error('maxCapacity cannot be used when setting workerType and workerCount'); - } - if (executable.type !== JobType.PYTHON_SHELL) { - if (maxCapacity !== undefined && ![GlueVersion.V0_9, GlueVersion.V1_0].includes(executable.glueVersion)) { - throw new Error('maxCapacity cannot be used when GlueVersion 2.0 or later'); - } - } else { - // max capacity validation for python shell jobs (defaults to 0.0625) - maxCapacity = maxCapacity ?? 
0.0625; - if (maxCapacity !== 0.0625 && maxCapacity !== 1) { - throw new Error(`maxCapacity value must be either 0.0625 or 1 for JobType.PYTHON_SHELL jobs, received ${maxCapacity}`); - } - } - if ((!props.workerType && props.workerCount !== undefined) || (props.workerType && props.workerCount === undefined)) { - throw new Error('Both workerType and workerCount must be set'); - } - - const jobResource = new CfnJob(this, 'Resource', { - name: props.jobName, - description: props.description, - role: this.role.roleArn, - command: { - name: executable.type.name, - scriptLocation: this.codeS3ObjectUrl(executable.script), - pythonVersion: executable.pythonVersion, - runtime: executable.runtime ? executable.runtime.name : undefined, - }, - glueVersion: executable.glueVersion.name, - workerType: props.workerType?.name, - numberOfWorkers: props.workerCount, - maxCapacity: props.maxCapacity, - maxRetries: props.maxRetries, - executionClass: props.executionClass, - executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, - notificationProperty: props.notifyDelayAfter ? { notifyDelayAfter: props.notifyDelayAfter.toMinutes() } : undefined, - timeout: props.timeout?.toMinutes(), - connections: props.connections ? { connections: props.connections.map((connection) => connection.connectionName) } : undefined, - securityConfiguration: props.securityConfiguration?.securityConfigurationName, - tags: props.tags, - defaultArguments, - }); - - const resourceName = this.getResourceNameAttribute(jobResource.ref); - this.jobArn = jobArn(this, resourceName); - this.jobName = resourceName; - } - - /** - * Check no usage of reserved arguments. 
- * - * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html - */ - private checkNoReservedArgs(defaultArguments?: { [key: string]: string }) { - if (defaultArguments) { - const reservedArgs = new Set(['--debug', '--mode', '--JOB_NAME']); - Object.keys(defaultArguments).forEach((arg) => { - if (reservedArgs.has(arg)) { - throw new Error(`The ${arg} argument is reserved by Glue. Don't set it`); - } - }); - } - return defaultArguments; - } - - private executableArguments(config: JobExecutableConfig) { - const args: { [key: string]: string } = {}; - args['--job-language'] = config.language; - if (config.className) { - args['--class'] = config.className; - } - if (config.extraJars && config.extraJars?.length > 0) { - args['--extra-jars'] = config.extraJars.map(code => this.codeS3ObjectUrl(code)).join(','); - } - if (config.extraPythonFiles && config.extraPythonFiles.length > 0) { - args['--extra-py-files'] = config.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(','); - } - if (config.s3PythonModules && config.s3PythonModules.length > 0) { - args['--s3-py-modules'] = config.s3PythonModules.map(code => this.codeS3ObjectUrl(code)).join(','); - } - if (config.extraFiles && config.extraFiles.length > 0) { - args['--extra-files'] = config.extraFiles.map(code => this.codeS3ObjectUrl(code)).join(','); - } - if (config.extraJarsFirst) { - args['--user-jars-first'] = 'true'; - } - return args; - } - - private setupSparkUI(executable: JobExecutableConfig, role: iam.IRole, props: SparkUIProps) { - if (JobType.PYTHON_SHELL === executable.type) { - throw new Error('Spark UI is not available for JobType.PYTHON_SHELL jobs'); - } else if (JobType.RAY === executable.type) { - throw new Error('Spark UI is not available for JobType.RAY jobs'); - } - - this.validatePrefix(props.prefix); - const bucket = props.bucket ?? 
new s3.Bucket(this, 'SparkUIBucket'); - bucket.grantReadWrite(role, this.cleanPrefixForGrant(props.prefix)); - const args = { - '--enable-spark-ui': 'true', - '--spark-event-logs-path': bucket.s3UrlForObject(props.prefix).replace(/\/?$/, '/'), // path will always end with a slash - }; - - return { - location: { - prefix: props.prefix, - bucket, - }, - args, - }; - } - - private validatePrefix(prefix?: string): void { - if (!prefix || cdk.Token.isUnresolved(prefix)) { - // skip validation if prefix is not specified or is a token - return; - } - - const errors: string[] = []; - - if (prefix.startsWith('/')) { - errors.push('Prefix must not begin with \'/\''); - } - - if (!prefix.endsWith('/')) { - errors.push('Prefix must end with \'/\''); - } - - if (errors.length > 0) { - throw new Error(`Invalid prefix format (value: ${prefix})${EOL}${errors.join(EOL)}`); - } - } - - private cleanPrefixForGrant(prefix?: string): string | undefined { - return prefix !== undefined ? `${prefix}*` : undefined; - } - - private setupContinuousLogging(role: iam.IRole, props: ContinuousLoggingProps) { - const args: {[key: string]: string} = { - '--enable-continuous-cloudwatch-log': 'true', - '--enable-continuous-log-filter': (props.quiet ?? true).toString(), - }; - - if (props.logGroup) { - args['--continuous-log-logGroup'] = props.logGroup.logGroupName; - props.logGroup.grantWrite(role); - } - - if (props.logStreamPrefix) { - args['--continuous-log-logStreamPrefix'] = props.logStreamPrefix; - } - if (props.conversionPattern) { - args['--continuous-log-conversionPattern'] = props.conversionPattern; - } - return args; - } - - private codeS3ObjectUrl(code: Code) { - const s3Location = code.bind(this, this.role).s3Location; - return `s3://${s3Location.bucketName}/${s3Location.objectKey}`; - } -} - -/** - * Create a CloudWatch Metric that's based on Glue Job events. 
- * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types - * The metric has namespace = 'AWS/Events', metricName = 'TriggeredRules' and RuleName = rule.ruleName dimension. - * - * @param rule for use in setting RuleName dimension value - * @param props metric properties - */ -function metricRule(rule: events.IRule, props?: cloudwatch.MetricOptions): cloudwatch.Metric { - return new cloudwatch.Metric({ - namespace: 'AWS/Events', - metricName: 'TriggeredRules', - dimensionsMap: { RuleName: rule.ruleName }, - statistic: cloudwatch.Statistic.SUM, - ...props, - }).attachTo(rule); -} - -/** - * Returns the job arn - * @param scope - * @param jobName - */ -function jobArn(scope: constructs.Construct, jobName: string) : string { - return cdk.Stack.of(scope).formatArn({ - service: 'glue', - resource: 'job', - resourceName: jobName, - }); -} diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/job.ts new file mode 100644 index 0000000000000..57bb1cc636442 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/job.ts @@ -0,0 +1,559 @@ +import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch'; +import * as events from 'aws-cdk-lib/aws-events'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as logs from 'aws-cdk-lib/aws-logs'; +import * as cdk from 'aws-cdk-lib/core'; +import * as constructs from 'constructs'; +import { Code } from '..'; +import { MetricType, JobState, WorkerType, GlueVersion } from '../constants'; +import { IConnection } from '../connection'; +import { ISecurityConfiguration } from '../security-configuration'; + +/** + * Interface representing a new or an imported Glue Job + */ +export interface IJob extends cdk.IResource, iam.IGrantable { + /** + * The name of the job. + * @attribute + */ + readonly jobName: string; + + /** + * The ARN of the job. 
+ * @attribute + */ + readonly jobArn: string; + + /** + * Defines a CloudWatch event rule triggered when something happens with this job. + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types + */ + onEvent(id: string, options?: events.OnEventOptions): events.Rule; + + /** + * Defines a CloudWatch event rule triggered when this job moves to the SUCCEEDED state. + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types + */ + onSuccess(id: string, options?: events.OnEventOptions): events.Rule; + + /** + * Defines a CloudWatch event rule triggered when this job moves to the FAILED state. + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types + */ + onFailure(id: string, options?: events.OnEventOptions): events.Rule; + + /** + * Defines a CloudWatch event rule triggered when this job moves to the TIMEOUT state. + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types + */ + onTimeout(id: string, options?: events.OnEventOptions): events.Rule; + + /** + * Create a CloudWatch metric. + * + * @param metricName name of the metric typically prefixed with `glue.driver.`, `glue..` or `glue.ALL.`. + * @param type the metric type. + * @param props metric options. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html + */ + metric(metricName: string, type: MetricType, props?: cloudwatch.MetricOptions): cloudwatch.Metric; + + /** + * Create a CloudWatch Metric indicating job success. + */ + metricSuccess(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + + /** + * Create a CloudWatch Metric indicating job failure. + */ + metricFailure(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + + /** + * Create a CloudWatch Metric indicating job timeout. 
+ */ + metricTimeout(props?: cloudwatch.MetricOptions): cloudwatch.Metric; +} + +/** + * Properties for enabling Continuous Logging for Glue Jobs. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging-enable.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ +export interface ContinuousLoggingProps { + /** + * Enable continuous logging. + */ + readonly enabled: boolean; + + /** + * Specify a custom CloudWatch log group name. + * + * @default - a log group is created with name `/aws-glue/jobs/logs-v2/`. + */ + readonly logGroup?: logs.ILogGroup; + + /** + * Specify a custom CloudWatch log stream prefix. + * + * @default - the job run ID. + */ + readonly logStreamPrefix?: string; + + /** + * Filter out non-useful Apache Spark driver/executor and Apache Hadoop YARN heartbeat log messages. + * + * @default true + */ + readonly quiet?: boolean; + + /** + * Apply the provided conversion pattern. + * + * This is a Log4j Conversion Pattern to customize driver and executor logs. + * + * @default `%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n` + */ + readonly conversionPattern?: string; +} + +/** + * A base class is needed to be able to import existing Jobs into a CDK app to + * reference as part of a larger stack or construct. JobBase has the subset + * of attributes required to identify and reference an existing Glue Job, + * as well as some CloudWatch metric convenience functions to configure an + * event-driven flow using the job. + */ +export abstract class JobBase extends cdk.Resource implements IJob { + + public abstract readonly jobArn: string; + public abstract readonly jobName: string; + public abstract readonly grantPrincipal: iam.IPrincipal; + + /** + * Create a CloudWatch Event Rule for this Glue Job when it's in a given state + * + * @param id construct id + * @param options event options.
Note that some values are overridden if provided, these are + * - eventPattern.source = ['aws.glue'] + * - eventPattern.detailType = ['Glue Job State Change', 'Glue Job Run Status'] + * - eventPattern.detail.jobName = [this.jobName] + * + * @see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types + */ + public onEvent(id: string, options: events.OnEventOptions = {}): events.Rule { + const rule = new events.Rule(this, id, options); + rule.addTarget(options.target); + rule.addEventPattern({ + source: ['aws.glue'], + detailType: ['Glue Job State Change', 'Glue Job Run Status'], + detail: { + jobName: [this.jobName], + }, + }); + return rule; + } + + /** + * Create a CloudWatch Event Rule for the transition into the input jobState. + * + * @param id construct id. + * @param jobState the job state. + * @param options optional event options. + */ + protected onStateChange(id: string, jobState: JobState, options: events.OnEventOptions = {}): events.Rule { + const rule = this.onEvent(id, { + description: `Rule triggered when Glue job ${this.jobName} is in ${jobState} state`, + ...options, + }); + rule.addEventPattern({ + detail: { + state: [jobState], + }, + }); + return rule; + } + + /** + * Create a CloudWatch Event Rule matching JobState.SUCCEEDED. + * + * @param id construct id. + * @param options optional event options. default is {}. + */ + public onSuccess(id: string, options: events.OnEventOptions = {}): events.Rule { + return this.onStateChange(id, JobState.SUCCEEDED, options); + } + + /** + * Return a CloudWatch Event Rule matching FAILED state. + * + * @param id construct id. + * @param options optional event options. default is {}. + */ + public onFailure(id: string, options: events.OnEventOptions = {}): events.Rule { + return this.onStateChange(id, JobState.FAILED, options); + } + + /** + * Return a CloudWatch Event Rule matching TIMEOUT state. + * + * @param id construct id. + * @param options optional event options. 
default is {}. + */ + public onTimeout(id: string, options: events.OnEventOptions = {}): events.Rule { + return this.onStateChange(id, JobState.TIMEOUT, options); + } + + /** + * Create a CloudWatch metric. + * + * @param metricName name of the metric typically prefixed with `glue.driver.`, `glue..` or `glue.ALL.`. + * @param type the metric type. + * @param props metric options. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html + */ + public metric(metricName: string, type: MetricType, props?: cloudwatch.MetricOptions): cloudwatch.Metric { + return new cloudwatch.Metric({ + metricName, + namespace: 'Glue', + dimensionsMap: { + JobName: this.jobName, + JobRunId: 'ALL', + Type: type, + }, + ...props, + }).attachTo(this); + } + + /** + * Return a CloudWatch Metric indicating job success. + * + * This metric is based on the Rule returned by no-args onSuccess() call. + */ + public metricSuccess(props?: cloudwatch.MetricOptions): cloudwatch.Metric { + return metricRule(this.metricJobStateRule('SuccessMetricRule', JobState.SUCCEEDED), props); + } + + /** + * Return a CloudWatch Metric indicating job failure. + * + * This metric is based on the Rule returned by no-args onFailure() call. + */ + public metricFailure(props?: cloudwatch.MetricOptions): cloudwatch.Metric { + return metricRule(this.metricJobStateRule('FailureMetricRule', JobState.FAILED), props); + } + + /** + * Return a CloudWatch Metric indicating job timeout. + * + * This metric is based on the Rule returned by no-args onTimeout() call. + */ + public metricTimeout(props?: cloudwatch.MetricOptions): cloudwatch.Metric { + return metricRule(this.metricJobStateRule('TimeoutMetricRule', JobState.TIMEOUT), props); + } + + /** + * Creates or retrieves a singleton event rule for the input job state for use with the metric JobState methods. + * + * @param id construct id. + * @param jobState the job state. 
+ * @private + */ + private metricJobStateRule(id: string, jobState: JobState): events.Rule { + return this.node.tryFindChild(id) as events.Rule ?? this.onStateChange(id, jobState); + } + + /** + * Returns the job arn + * @param scope + * @param jobName + */ + protected buildJobArn(scope: constructs.Construct, jobName: string) : string { + return cdk.Stack.of(scope).formatArn({ + service: 'glue', + resource: 'job', + resourceName: jobName, + }); + } +} + +/** + * A subset of Job attributes is required for importing an existing job + * into a CDK project. This is only used with fromJobAttributes + * to identify and reference the existing job. + */ +export interface JobImportAttributes { + /** + * The name of the job. + */ + readonly jobName: string; + + /** + * The IAM role assumed by Glue to run this job. + * + * @default - undefined + */ + readonly role?: iam.IRole; + +} + +/** + * JobProperties will be used to create new Glue Jobs using this L2 Construct. + */ +export interface JobProperties { + + /** + * Script Code Location (required) + * Script to run when the Glue job executes. Can be uploaded + * from the local directory structure using fromAsset + * or referenced via S3 location using fromBucket + **/ + readonly script: Code; + + /** + * IAM Role (required) + * IAM Role to use for Glue job execution + * Must be specified by the developer because the L2 doesn't have visibility + * into the actions the script(s) takes during the job execution + * The role must trust the Glue service principal (glue.amazonaws.com) + * and be granted sufficient permissions.
+ * + * @see https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html + **/ + readonly role: iam.IRole; + + /** + * Name of the Glue job (optional) + * Developer-specified name of the Glue job + * @default - a name is automatically generated + **/ + readonly jobName?: string; + + /** + * Description (optional) + * Developer-specified description of the Glue job + * @default - no value + **/ + readonly description?: string; + + /** + * Number of Workers (optional) + * Number of workers for Glue to use during job execution + * @default 10 + */ + readonly numberOfWorkers?: number; + + /** + * Worker Type (optional) + * Type of Worker for Glue to use during job execution + * Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X + * @default G_1X + **/ + readonly workerType?: WorkerType; + + /** + * Max Concurrent Runs (optional) + * The maximum number of runs this Glue job can concurrently run + * + * An error is returned when this threshold is reached. The maximum value + * you can specify is controlled by a service limit. + * + * @default 1 + **/ + readonly maxConcurrentRuns?: number; + + /** + * Default Arguments (optional) + * The default arguments for every run of this Glue job, + * specified as name-value pairs. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + * for a list of reserved parameters + * @default - no arguments + **/ + readonly defaultArguments?: { [key: string]: string }; + + /** + * Connections (optional) + * List of connections to use for this Glue job + * Connections are used to connect to other AWS services or resources within a VPC.
+ * + * @default [] - no connections are added to the job + **/ + readonly connections?: IConnection[]; + + /** + * Max Retries (optional) + * Maximum number of retry attempts Glue performs if the job fails + * @default 0 + **/ + readonly maxRetries?: number; + + /** + * Timeout (optional) + * The maximum time that a job run can consume resources before it is + * terminated and enters TIMEOUT status. Specified in minutes. + * @default 2880 (2 days for non-streaming) + * + **/ + readonly timeout?: cdk.Duration; + + /** + * Security Configuration (optional) + * Defines the encryption options for the Glue job + * @default - no security configuration. + **/ + readonly securityConfiguration?: ISecurityConfiguration; + + /** + * Tags (optional) + * A list of key:value pairs of tags to apply to this Glue job resource + * @default {} - no tags + **/ + readonly tags?: { [key: string]: string }; + + /** + * Glue Version + * The version of Glue to use to execute this job + * @default 3.0 for ETL + **/ + readonly glueVersion?: GlueVersion; + + /** + * Enables the collection of metrics for job profiling. + * + * @default - no profiling metrics emitted. + * + * @see `--enable-metrics` at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + **/ + readonly enableProfilingMetrics?: boolean; + + /** + * Enables continuous logging with the specified props. + * + * @default - continuous logging is enabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-continuous-logging-enable.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + **/ + readonly continuousLogging?: ContinuousLoggingProps; + +} + +/** + * A Glue Job. + * @resource AWS::Glue::Job + */ +export abstract class Job extends JobBase { + + /** + * Identifies an existing Glue Job from a subset of attributes that can + * be referenced from within another Stack or Construct.
+ * + * @param scope The scope creating construct (usually `this`) + * @param id The construct's id. + * @param attrs Attributes for the Glue Job we want to import + */ + public static fromJobAttributes(scope: constructs.Construct, id: string, attrs: JobImportAttributes): IJob { + class Import extends JobBase { + public readonly jobName = attrs.jobName; + public readonly jobArn = this.buildJobArn(scope, attrs.jobName); + public readonly grantPrincipal = attrs.role ?? new iam.UnknownPrincipal({ resource: this }); + } + + return new Import(scope, id); + } + + /** + * The IAM role Glue assumes to run this job. + */ + public readonly abstract role: iam.IRole; + + /** + * Check no usage of reserved arguments. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + protected checkNoReservedArgs(defaultArguments?: { [key: string]: string }) { + if (defaultArguments) { + const reservedArgs = new Set(['--debug', '--mode', '--JOB_NAME']); + Object.keys(defaultArguments).forEach((arg) => { + if (reservedArgs.has(arg)) { + throw new Error(`The ${arg} argument is reserved by Glue. Don't set it`); + } + }); + } + return defaultArguments; + } + + /** + * Setup Continuous Logging Properties + * @param role The IAM role to use for continuous logging + * @param props The properties for continuous logging configuration + * @returns Map of job argument names to values for continuous logging (empty if explicitly disabled) + */ + public setupContinuousLogging(role: iam.IRole, props: ContinuousLoggingProps | undefined) : any { + + // If the developer has explicitly disabled continuous logging return no args + if (props && !props.enabled) { + return {}; + } + + // Else we turn on continuous logging by default. Determine what log group to use.
+ const args: {[key: string]: string} = { + '--enable-continuous-cloudwatch-log': 'true', + }; + + if (props?.quiet) { + args['--enable-continuous-log-filter'] = 'true'; + }; + + // If the developer provided a log group, add its name to the args and update the role. + if (props?.logGroup) { + args['--continuous-log-logGroup'] = props.logGroup.logGroupName; + props.logGroup.grantWrite(role); + } + + if (props?.logStreamPrefix) { + args['--continuous-log-logStreamPrefix'] = props.logStreamPrefix; + } + + if (props?.conversionPattern) { + args['--continuous-log-conversionPattern'] = props.conversionPattern; + } + + return args; + } + + protected codeS3ObjectUrl(code: Code) { + const s3Location = code.bind(this, this.role).s3Location; + return `s3://${s3Location.bucketName}/${s3Location.objectKey}`; + } + +} + +/** + * Create a CloudWatch Metric that's based on Glue Job events + * {@see https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types} + * The metric has namespace = 'AWS/Events', metricName = 'TriggeredRules' and RuleName = rule.ruleName dimension. 
+ * + * @param rule for use in setting RuleName dimension value + * @param props metric properties + */ +function metricRule(rule: events.IRule, props?: cloudwatch.MetricOptions): cloudwatch.Metric { + return new cloudwatch.Metric({ + namespace: 'AWS/Events', + metricName: 'TriggeredRules', + dimensionsMap: { RuleName: rule.ruleName }, + statistic: cloudwatch.Statistic.SUM, + ...props, + }).attachTo(rule); +} + diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-etl-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-etl-job.ts new file mode 100644 index 0000000000000..b5075cc90a4c2 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-etl-job.ts @@ -0,0 +1,191 @@ +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, PythonVersion, WorkerType } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; +import { Code } from '../code'; + +/** + * PySpark ETL Jobs class + * ETL jobs support PySpark and Scala languages, for which there are separate + * but similar constructors. ETL jobs default to the G2 worker type, but you + * can override this default with other supported worker type values + * (G1, G2, G4 and G8). ETL jobs default to Glue version 4.0, which you can + * override to 3.0. The following ETL features are enabled by default: + * --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log. + * You can find more details about version, worker type and other features + * in Glue's public documentation.
+ */ + +/** + * Properties for creating a Python Spark ETL job + */ +export interface PySparkEtlJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Extra Python Files S3 URL (optional) + * S3 URL where additional python dependencies are located + * @default - no extra files + */ + readonly extraPythonFiles?: Code[]; + + /** + * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. + * + * @default - no extra files specified. + * + * @see `--extra-files` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraFiles?: Code[]; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. + * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * A Python Spark ETL Glue Job + */ +export class PySparkEtlJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. 
+ * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * PySparkEtlJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: PySparkEtlJobProps) { + + super(scope, id, { + physicalName: props.jobName, + }); + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Combine command line arguments into a single line item + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.ETL, + scriptLocation: this.codeS3ObjectUrl(props.script), + pythonVersion: PythonVersion.THREE_NINE, + }, + glueVersion: props.glueVersion ?? 
GlueVersion.V4_0, + workerType: props.workerType ?? WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? { connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: PySparkEtlJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.PYTHON; + + if (props.extraPythonFiles && props.extraPythonFiles.length > 0) { + args['--extra-py-files'] = props.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + if (props.extraFiles && props.extraFiles.length > 0) { + args['--extra-files'] = props.extraFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + + return args; + } + + private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? 
new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-flex-etl-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-flex-etl-job.ts new file mode 100644 index 0000000000000..a1990de68ca89 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-flex-etl-job.ts @@ -0,0 +1,199 @@ +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, PythonVersion, WorkerType, ExecutionClass } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; +import * as cdk from 'aws-cdk-lib/core'; +import { Code } from '../code'; + +/** + * Flex Jobs class + * + * Flex jobs supports Python and Scala language. + * The flexible execution class is appropriate for non-urgent jobs such as + * pre-production jobs, testing, and one-time data loads. + * Flexible job runs are supported for jobs using AWS Glue version 3.0 or later and G.1X or + * G.2X worker types but will default to the latest version of Glue (currently Glue 3.0.) 
+ * + * Similar to ETL jobs, the following features are enabled: --enable-metrics, --enable-spark-ui, + * --enable-continuous-cloudwatch-log. + * + */ + +export interface PySparkFlexEtlJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Specifies configuration properties of a notification (optional). + * After a job run starts, the number of minutes to wait before sending a job run delay notification. + * @default - undefined + */ + readonly notifyDelayAfter?: cdk.Duration; + + /** + * Additional Python files that AWS Glue adds to the Python path before executing your script. + * + * @default - no extra python files specified. + * + * @see `--extra-py-files` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraPythonFiles?: Code[]; + + /** + * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. + * + * @default - no extra files specified. + * + * @see `--extra-files` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraFiles?: Code[]; + +} + +/** + * A Python Spark Flex ETL Glue Job + */ +export class PySparkFlexEtlJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. 
+ * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * PySparkFlexEtlJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: PySparkFlexEtlJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Combine command line arguments into a single line item + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.ETL, + scriptLocation: this.codeS3ObjectUrl(props.script), + pythonVersion: PythonVersion.THREE, + }, + glueVersion: props.glueVersion ? 
props.glueVersion : GlueVersion.V3_0, + workerType: props.workerType ? props.workerType : WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.maxRetries, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + notificationProperty: props.notifyDelayAfter ? { notifyDelayAfter: props.notifyDelayAfter.toMinutes() } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? { connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + executionClass: ExecutionClass.FLEX, + jobRunQueuingEnabled: false, + defaultArguments, + + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + *Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: PySparkFlexEtlJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.PYTHON; + + if (props.extraPythonFiles && props.extraPythonFiles.length > 0) { + args['--extra-py-files'] = props.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + if (props.extraFiles && props.extraFiles.length > 0) { + args['--extra-files'] = props.extraFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + + return args; + } + + /** + * Set the arguments for sparkUI with best practices enabled by default + * + * @param sparkUiProps, role + * @returns An array of arguments for enabling sparkUI + */ + + private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? 
new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-streaming-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-streaming-job.ts new file mode 100644 index 0000000000000..e70fd6dc6d410 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/pyspark-streaming-job.ts @@ -0,0 +1,195 @@ +/** + * Python Spark Streaming Jobs class + * + * A Streaming job is similar to an ETL job, except that it performs ETL on data streams + * using the Apache Spark Structured Streaming framework. + * These jobs default to Python 3.9. + * + * Similar to ETL jobs, streaming jobs support the Scala and Python languages. Like ETL jobs, + * they support the G1 and G2 worker types and Glue versions 2.0, 3.0 and 4.0. They default to the G2 worker type + * and Glue version 4.0, both of which developers can override. + * The following features are enabled: --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log. 
+ * + * RFC: https://github.com/aws/aws-cdk-rfcs/blob/main/text/0497-glue-l2-construct.md + */ + +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, PythonVersion, WorkerType } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; +import { Code } from '../code'; + +/** + * Properties for creating a Python Spark Streaming job + */ +export interface PySparkStreamingJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Extra Python Files S3 URL (optional) + * S3 URL where additional python dependencies are located + * @default - no extra files + */ + readonly extraPythonFiles?: Code[]; + + /** + * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. + * + * @default - no extra files specified. + * + * @see `--extra-files` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraFiles?: Code[]; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. 
This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. + * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; + +} + +/** + * A Python Spark Streaming Glue Job + */ +export class PySparkStreamingJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * pySparkStreamingJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: PySparkStreamingJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? 
this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Combine command line arguments into a single line item + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.STREAMING, + scriptLocation: this.codeS3ObjectUrl(props.script), + pythonVersion: PythonVersion.THREE, + }, + glueVersion: props.glueVersion ? props.glueVersion : GlueVersion.V4_0, + workerType: props.workerType ? props.workerType : WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? 
{ connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: PySparkStreamingJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.PYTHON; + + if (props.extraPythonFiles && props.extraPythonFiles.length > 0) { + args['--extra-py-files'] = props.extraPythonFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + if (props.extraFiles && props.extraFiles.length > 0) { + args['--extra-files'] = props.extraFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + + return args; + } + + private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? 
new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/python-shell-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/python-shell-job.ts new file mode 100644 index 0000000000000..c0249788fad2d --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/python-shell-job.ts @@ -0,0 +1,135 @@ +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, PythonVersion, MaxCapacity, JobLanguage } from '../constants'; + +/** + * Python Shell Jobs class + * + * A Python shell job runs Python scripts as a shell and supports a Python version that + * depends on the AWS Glue version you are using. + * This can be used to schedule and run tasks that don't require an Apache Spark environment. + * + */ + +/** + * Properties for creating a Python Shell job + */ +export interface PythonShellJobProps extends JobProperties { + /** + * Python Version + * The version of Python to use to execute this job + * @default 3.9 for Shell Jobs + **/ + readonly pythonVersion?: PythonVersion; + + /** + * The total number of DPU to assign to the Python Job + * @default 0.0625 + */ + readonly maxCapacity?: MaxCapacity; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. 
+ * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. + * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * A Python Shell Glue Job + */ +export class PythonShellJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * PythonShellJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: PythonShellJobProps) { + super(scope, id, { physicalName: props.jobName }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Combine command line arguments into a single line item + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.PYTHON_SHELL, + scriptLocation: 
this.codeS3ObjectUrl(props.script), + pythonVersion: props.pythonVersion ? props.pythonVersion : PythonVersion.THREE_NINE, + }, + glueVersion: props.glueVersion ? props.glueVersion : GlueVersion.V3_0, + maxCapacity: props.maxCapacity ? props.maxCapacity : MaxCapacity.DPU_1_16TH, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries ? props.maxRetries : 0, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? { connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: PythonShellJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.PYTHON; + + //If no Python version set (default 3.9) or the version is set to 3.9 then set library-set argument + if (!props.pythonVersion || props.pythonVersion == PythonVersion.THREE_NINE) { + //Selecting this option includes common libraries for Python 3.9 + args['library-set'] = 'analytics'; + } + + return args; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/ray-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/ray-job.ts new file mode 100644 index 0000000000000..be3038facb57e --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/ray-job.ts @@ -0,0 +1,117 @@ +import { CfnJob } from 
'aws-cdk-lib/aws-glue'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, WorkerType, Runtime } from '../constants'; + +/** + * Ray Jobs class + * + * Glue Ray jobs use worker type Z.2X and Glue version 4.0. + * These are not overrideable since these are the only configuration that + * Glue Ray jobs currently support. The runtime defaults to Ray2.4 and min + * workers defaults to 3. + */ + +/** + * Properties for creating a Ray Glue job + */ +export interface RayJobProps extends JobProperties { + /** + * Sets the Ray runtime environment version + * + * @default - Runtime version will default to Ray2.4 + */ + readonly runtime?: Runtime; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. + * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * A Ray Glue Job + */ +export class RayJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * RayJob constructor + * + * @param scope + * @param id + * @param props + */ + + constructor(scope: Construct, id: string, props: RayJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + this.jobName = props.jobName ?? 
''; + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Combine command line arguments into a single line item + const defaultArguments = { + ...this.checkNoReservedArgs(props.defaultArguments), + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + }; + + if (props.workerType && props.workerType !== WorkerType.Z_2X) { + throw new Error('Ray jobs only support Z.2X worker type'); + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.RAY, + scriptLocation: this.codeS3ObjectUrl(props.script), + runtime: props.runtime ? props.runtime : Runtime.RAY_TWO_FOUR, + }, + glueVersion: GlueVersion.V4_0, + workerType: props.workerType ? props.workerType : WorkerType.Z_2X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers: 3, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? 
{ connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-etl-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-etl-job.ts new file mode 100644 index 0000000000000..d33cbc172b33b --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-etl-job.ts @@ -0,0 +1,198 @@ +/** + * Spark ETL Jobs class + * ETL jobs support PySpark and Scala languages, for which there are separate + * but similar constructors. ETL jobs default to the G2 worker type, but you + * can override this default with other supported worker type values + * (G1, G2, G4 and G8). ETL jobs default to Glue version 4.0, which you can + * override to 3.0. The following ETL features are enabled by default: + * --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log. + * You can find more details about version, worker type and other features + * in Glue's public documentation. 
+ * + * RFC: https://github.com/aws/aws-cdk-rfcs/blob/main/text/0497-glue-l2-construct.md + * + */ + +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, WorkerType } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; +import { Code } from '../code'; + +/** + * Properties for creating a Scala Spark ETL job + */ +export interface ScalaSparkEtlJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Class name (required for Scala scripts) + * Package and class name for the entry point of Glue job execution for + * Java scripts + **/ + readonly className: string; + + /** + * Extra Jars S3 URL (optional) + * S3 URL where additional jar dependencies are located + * @default - no extra jar files + */ + readonly extraJars?: Code[]; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. 
+ * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * A Scala Spark ETL Glue Job + */ +export class ScalaSparkEtlJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * ScalaSparkEtlJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: ScalaSparkEtlJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? 
this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Mandatory className argument + if (props.className === undefined) { + throw new Error('className must be set for Scala ETL Jobs'); + } + + // Combine command line arguments into a single map + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + if ((!props.workerType && props.numberOfWorkers !== undefined) || (props.workerType && props.numberOfWorkers === undefined)) { + throw new Error('Both workerType and numberOfWorkers must be set'); + } + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.ETL, + scriptLocation: this.codeS3ObjectUrl(props.script), + }, + glueVersion: props.glueVersion ? props.glueVersion : GlueVersion.V4_0, + workerType: props.workerType ? props.workerType : WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? 
{ connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: ScalaSparkEtlJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.SCALA; + args['--class'] = props.className!; + + if (props.extraJars && props.extraJars?.length > 0) { + args['--extra-jars'] = props.extraJars.map(code => this.codeS3ObjectUrl(code)).join(','); + } + + return args; + } + + private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? 
new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-flex-etl-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-flex-etl-job.ts new file mode 100644 index 0000000000000..9a4eae9f9ea37 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-flex-etl-job.ts @@ -0,0 +1,235 @@ +/** + * Spark ETL Jobs class + * ETL jobs support the PySpark and Scala languages, for which there are separate + * but similar constructors. ETL jobs default to the G2 worker type, but you + * can override this default with other supported worker type values + * (G1, G2, G4 and G8). ETL jobs default to Glue version 4.0, which you can + * override to 3.0. The following ETL features are enabled by default: + * --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log. + * You can find more details about version, worker type and other features + * in Glue's public documentation. 
+ */ + +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, WorkerType, ExecutionClass } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; +import * as cdk from 'aws-cdk-lib/core'; +import { Code } from '../code'; + +/** + * Flex Jobs class + * + * Flex jobs support the Python and Scala languages. + * The flexible execution class is appropriate for non-urgent jobs such as + * pre-production jobs, testing, and one-time data loads. + * Flexible job runs are supported for jobs using AWS Glue version 3.0 or later and G.1X or + * G.2X worker types. This construct defaults to Glue version 3.0. + * + * As with ETL jobs, these features are enabled: --enable-metrics, --enable-spark-ui, + * --enable-continuous-cloudwatch-log + * + */ + +export interface ScalaSparkFlexEtlJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Specifies configuration properties of a notification (optional). + * After a job run starts, the number of minutes to wait before sending a job run delay notification. + * @default - undefined + */ + readonly notifyDelayAfter?: cdk.Duration; + + /** + * The fully qualified Scala class name that serves as the entry point for the job. 
+ * + * @see `--class` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly className: string; + + /** + * Additional Java .jar files that AWS Glue adds to the Java classpath before executing your script. + * Only individual files are supported, directories are not supported. + * + * @default [] - no extra jars are added to the classpath + * + * @see `--extra-jars` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraJars?: Code[]; + + /** + * Setting this value to true prioritizes the customer's extra JAR files in the classpath. + * + * @default false - priority is not given to user-provided jars + * + * @see `--user-jars-first` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraJarsFirst?: boolean; + + /** + * Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. + * Only individual files are supported, directories are not supported. + * + * @default [] - no extra files are copied to the working directory + * + * @see `--extra-files` in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly extraFiles?: Code[]; + +} + +/** + * A Scala Spark ETL Glue Job + */ +export class ScalaSparkFlexEtlJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. 
+ * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * ScalaSparkFlexEtlJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: ScalaSparkFlexEtlJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + if (props.className === undefined) { + throw new Error('className must be set for Scala ETL Jobs'); + } + + // Combine command line arguments into a single map + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.ETL, + scriptLocation: 
this.codeS3ObjectUrl(props.script), + }, + glueVersion: props.glueVersion ? props.glueVersion : GlueVersion.V3_0, + workerType: props.workerType ? props.workerType : WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.maxRetries, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + notificationProperty: props.notifyDelayAfter ? { notifyDelayAfter: props.notifyDelayAfter.toMinutes() } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? { connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + executionClass: ExecutionClass.FLEX, + jobRunQueuingEnabled: false, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: ScalaSparkFlexEtlJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.SCALA; + args['--class'] = props.className!; + + if (props.extraJars && props.extraJars?.length > 0) { + args['--extra-jars'] = props.extraJars.map(code => this.codeS3ObjectUrl(code)).join(','); + } + + if (props.extraFiles && props.extraFiles.length > 0) { + args['--extra-files'] = props.extraFiles.map(code => this.codeS3ObjectUrl(code)).join(','); + } + if (props.extraJarsFirst) { + args['--user-jars-first'] = 'true'; + } + return args; + + } + + /** + * Set the arguments for sparkUI with best practices enabled by default + * + * @param sparkUiProps, role + * @returns An array of arguments for enabling sparkUI + */ + private 
setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-streaming-job.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-streaming-job.ts new file mode 100644 index 0000000000000..0467479dc6bd2 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/scala-spark-streaming-job.ts @@ -0,0 +1,193 @@ +/** + * Scala Streaming Jobs class + * + * A Streaming job is similar to an ETL job, except that it performs ETL on data streams + * using the Apache Spark Structured Streaming framework. + * These jobs will default to use Python 3.9. + * + * Like ETL jobs, streaming jobs support the Scala and Python languages. They also + * support the G1 and G2 worker types and Glue versions 2.0, 3.0 and 4.0. They default to the + * G2 worker type and Glue version 4.0, both of which developers can override. + * The following features are enabled: --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log. 
+ * + * RFC: https://github.com/aws/aws-cdk-rfcs/blob/main/text/0497-glue-l2-construct.md + */ + +import { CfnJob } from 'aws-cdk-lib/aws-glue'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import { Bucket, BucketEncryption } from 'aws-cdk-lib/aws-s3'; +import { Job, JobProperties } from './job'; +import { Construct } from 'constructs'; +import { JobType, GlueVersion, JobLanguage, WorkerType } from '../constants'; +import { SparkUIProps, SparkUILoggingLocation, validateSparkUiPrefix, cleanSparkUiPrefixForGrant } from './spark-ui-utils'; + +/** + * Properties for creating a Scala Spark Streaming job + */ +export interface ScalaSparkStreamingJobProps extends JobProperties { + + /** + * Enables the Spark UI debugging and monitoring with the specified props. + * + * @default - Spark UI debugging and monitoring is disabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + readonly sparkUI?: SparkUIProps; + + /** + * Class name (required for Scala scripts) + * Package and class name for the entry point of Glue job execution for + * Scala scripts + **/ + readonly className: string; + + /** + * Extra Jars S3 URL (optional) + * S3 URL where additional jar dependencies are located + * @default - no extra jar files + */ + readonly extraJars?: string[]; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queuing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. 
+ * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * A Scala Spark Streaming Glue Job + */ +export class ScalaSparkStreamingJob extends Job { + + // Implement abstract Job attributes + public readonly jobArn: string; + public readonly jobName: string; + public readonly role: iam.IRole; + public readonly grantPrincipal: iam.IPrincipal; + + /** + * The Spark UI logs location if Spark UI monitoring and debugging is enabled. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ + public readonly sparkUILoggingLocation?: SparkUILoggingLocation; + + /** + * ScalaSparkStreamingJob constructor + * + * @param scope + * @param id + * @param props + */ + constructor(scope: Construct, id: string, props: ScalaSparkStreamingJobProps) { + super(scope, id, { + physicalName: props.jobName, + }); + + // Set up role and permissions for principal + this.role = props.role, { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], + }; + this.grantPrincipal = this.role; + + // Enable SparkUI by default as a best practice + const sparkUIArgs = props.sparkUI?.bucket ? 
this.setupSparkUI(this.role, props.sparkUI) : undefined; + this.sparkUILoggingLocation = sparkUIArgs?.location; + + // Enable CloudWatch metrics and continuous logging by default as a best practice + const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); + const profilingMetricsArgs = { '--enable-metrics': '' }; + const observabilityMetricsArgs = { '--enable-observability-metrics': 'true' }; + + // Gather executable arguments + const executableArgs = this.executableArguments(props); + + // Mandatory className argument + if (props.className === undefined) { + throw new Error('className must be set for Scala ETL Jobs'); + } + + // Combine command line arguments into a single map + const defaultArguments = { + ...executableArgs, + ...continuousLoggingArgs, + ...profilingMetricsArgs, + ...observabilityMetricsArgs, + ...sparkUIArgs?.args, + ...this.checkNoReservedArgs(props.defaultArguments), + }; + + if ((!props.workerType && props.numberOfWorkers !== undefined) || (props.workerType && props.numberOfWorkers === undefined)) { + throw new Error('Both workerType and numberOfWorkers must be set'); + } + + const jobResource = new CfnJob(this, 'Resource', { + name: props.jobName, + description: props.description, + role: this.role.roleArn, + command: { + name: JobType.STREAMING, + scriptLocation: this.codeS3ObjectUrl(props.script), + }, + glueVersion: props.glueVersion ? props.glueVersion : GlueVersion.V4_0, + workerType: props.workerType ? props.workerType : WorkerType.G_1X, + numberOfWorkers: props.numberOfWorkers ? props.numberOfWorkers : 10, + maxRetries: props.jobRunQueuingEnabled ? 0 : props.maxRetries, + jobRunQueuingEnabled: props.jobRunQueuingEnabled ? props.jobRunQueuingEnabled : false, + executionProperty: props.maxConcurrentRuns ? { maxConcurrentRuns: props.maxConcurrentRuns } : undefined, + timeout: props.timeout?.toMinutes(), + connections: props.connections ? 
{ connections: props.connections.map((connection) => connection.connectionName) } : undefined, + securityConfiguration: props.securityConfiguration?.securityConfigurationName, + tags: props.tags, + defaultArguments, + }); + + const resourceName = this.getResourceNameAttribute(jobResource.ref); + this.jobArn = this.buildJobArn(this, resourceName); + this.jobName = resourceName; + } + + /** + * Set the executable arguments with best practices enabled by default + * + * @param props + * @returns An array of arguments for Glue to use on execution + */ + private executableArguments(props: ScalaSparkStreamingJobProps) { + const args: { [key: string]: string } = {}; + args['--job-language'] = JobLanguage.SCALA; + args['--class'] = props.className!; + + return args; + } + + private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { + + validateSparkUiPrefix(sparkUiProps.prefix); + const bucket = sparkUiProps.bucket ?? new Bucket(this, 'SparkUIBucket', { enforceSSL: true, encryption: BucketEncryption.S3_MANAGED }); + bucket.grantReadWrite(role, cleanSparkUiPrefixForGrant(sparkUiProps.prefix)); + const args = { + '--enable-spark-ui': 'true', + '--spark-event-logs-path': bucket.s3UrlForObject(sparkUiProps.prefix).replace(/\/?$/, '/'), // path will always end with a slash + }; + + return { + location: { + prefix: sparkUiProps.prefix, + bucket, + }, + args, + }; + } +} diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/jobs/spark-ui-utils.ts b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/spark-ui-utils.ts new file mode 100644 index 0000000000000..20d3551a309fe --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/jobs/spark-ui-utils.ts @@ -0,0 +1,84 @@ +import { IBucket } from 'aws-cdk-lib/aws-s3'; +import { Token } from 'aws-cdk-lib'; +import { EOL } from 'os'; + +/** + * Properties for enabling Spark UI monitoring feature for Spark-based Glue jobs. 
+ * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ +export interface SparkUIProps { + + /** + * The bucket where the Glue job stores the logs. + * + * @default a new bucket will be created. + */ + readonly bucket?: IBucket; + + /** + * The path inside the bucket (objects prefix) where the Glue job stores the logs. + * Use format `'/foo/bar'` + * + * @default - the logs will be written at the root of the bucket + */ + readonly prefix?: string; + + /** + * Specifies whether job run queuing is enabled for the job runs for this job. + * A value of true means job run queuing is enabled for the job runs. + * If false or not populated, the job runs will not be considered for queueing. + * If this field does not match the value set in the job run, then the value from + * the job run field will be used. This property must be set to false for flex jobs. + * If this property is enabled, maxRetries must be set to zero. + * + * @default - no job run queuing + */ + readonly jobRunQueuingEnabled?: boolean; +} + +/** + * The Spark UI logging location. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html + * @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html + */ +export interface SparkUILoggingLocation { + /** + * The bucket where the Glue job stores the logs. + */ + readonly bucket: IBucket; + + /** + * The path inside the bucket (objects prefix) where the Glue job stores the logs. 
+ * + * @default '/' - the logs will be written at the root of the bucket + */ + readonly prefix?: string; +} + +export function validateSparkUiPrefix(prefix?: string): void { + if (!prefix || Token.isUnresolved(prefix)) { + // skip validation if prefix is not specified or is a token + return; + } + + const errors: string[] = []; + + if (!prefix.startsWith('/')) { + errors.push('Prefix must begin with \'/\''); + } + + if (prefix.endsWith('/')) { + errors.push('Prefix must not end with \'/\''); + } + + if (errors.length > 0) { + throw new Error(`Invalid prefix format (value: ${prefix})${EOL}${errors.join(EOL)}`); + } +} + +export function cleanSparkUiPrefixForGrant(prefix?: string): string | undefined { + return prefix !== undefined ? prefix.slice(1) + '/*' : undefined; +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/triggers/trigger-options.ts b/packages/@aws-cdk/aws-glue-alpha/lib/triggers/trigger-options.ts new file mode 100644 index 0000000000000..2157249295724 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/triggers/trigger-options.ts @@ -0,0 +1,238 @@ +/** + * Triggers + * + * In AWS Glue, developers can use workflows to create and visualize complex extract, + * transform, and load (ETL) activities involving multiple crawlers, jobs, and triggers. + * + */ + +import * as cdk from 'aws-cdk-lib/core'; +import { JobState, CrawlerState, ConditionLogicalOperator, PredicateLogical } from '../constants'; +import { IJob } from '../jobs/job'; // Use IJob interface instead of concrete class +import { CfnCrawler } from 'aws-cdk-lib/aws-glue'; +import { ISecurityConfiguration } from '../security-configuration'; +import * as events from 'aws-cdk-lib/aws-events'; + +/** + * Represents a trigger action. + */ +export interface Action { + /** + * The job to be executed. + * + * @default - no job is executed + */ + readonly job?: IJob; + + /** + * The job arguments used when this trigger fires. 
+ * + * @default - no arguments are passed to the job + */ + readonly arguments?: { [key: string]: string }; + + /** + * The job run timeout. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. + * + * @default - the default timeout value set in the job definition + */ + readonly timeout?: cdk.Duration; + + /** + * The `SecurityConfiguration` to be used with this action. + * + * @default - no security configuration is used + */ + readonly securityConfiguration?: ISecurityConfiguration; + + /** + * The name of the crawler to be used with this action. + * + * @default - no crawler is used + */ + readonly crawler?: CfnCrawler; +} + +/** + * Represents a trigger schedule. + */ +export class TriggerSchedule { + /** + * Creates a new TriggerSchedule instance with a cron expression. + * + * @param options The cron options for the schedule. + * @returns A new TriggerSchedule instance. + */ + public static cron(options: events.CronOptions): TriggerSchedule { + return new TriggerSchedule(events.Schedule.cron(options).expressionString); + } + + /** + * Creates a new TriggerSchedule instance with a custom expression. + * + * @param expression The custom expression for the schedule. + * @returns A new TriggerSchedule instance. + */ + public static expression(expression: string): TriggerSchedule { + return new TriggerSchedule(expression); + } + + /** + * @param expressionString The expression string for the schedule. + */ + private constructor(public readonly expressionString: string) {} +} + +/** + * Represents a trigger predicate. + */ +export interface Predicate { + /** + * The logical operator to be applied to the conditions. + * + * @default - ConditionLogical.AND if multiple conditions are provided, no logical operator if only one condition + */ + readonly logical?: PredicateLogical; + + /** + * A list of the conditions that determine when the trigger will fire. 
+ * + * @default - no conditions are provided + */ + readonly conditions?: Condition[]; +} + +/** + * Represents a trigger condition. + */ +export interface Condition { + /** + * The logical operator for the condition. + * + * @default ConditionLogicalOperator.EQUALS + */ + readonly logicalOperator?: ConditionLogicalOperator; + + /** + * The job to which this condition applies. + * + * @default - no job is specified + */ + readonly job?: IJob; + + /** + * The condition job state. + * + * @default - no job state is specified + */ + readonly state?: JobState; + + /** + * The name of the crawler to which this condition applies. + * + * @default - no crawler is specified + */ + readonly crawlerName?: string; + + /** + * The condition crawler state. + * + * @default - no crawler state is specified + */ + readonly crawlState?: CrawlerState; +} + +/** + * Represents event trigger batch condition. + */ +export interface EventBatchingCondition { + /** + * Number of events that must be received from Amazon EventBridge before EventBridge event trigger fires. + */ + readonly batchSize: number; + + /** + * Window of time in seconds after which EventBridge event trigger fires. + * + * @default - 900 seconds + */ + readonly batchWindow?: cdk.Duration; +} + +/** + * Properties for configuring a Glue Trigger + */ +export interface TriggerOptions { + /** + * A name for the trigger. + * + * @default - no name is provided + */ + readonly name?: string; + + /** + * A description for the trigger. + * + * @default - no description + */ + readonly description?: string; + + /** + * The actions initiated by this trigger. + */ + readonly actions: Action[]; +} + +/** + * Properties for configuring an on-demand Glue Trigger. + */ +export interface OnDemandTriggerOptions extends TriggerOptions {} + +/** + * Properties for configuring a daily-scheduled Glue Trigger. 
+ */ +export interface DailyScheduleTriggerOptions extends TriggerOptions { + /** + * Whether to start the trigger on creation or not. + * + * @default - false + */ + readonly startOnCreation?: boolean; +} + +/** + * Properties for configuring a weekly-scheduled Glue Trigger. + */ +export interface WeeklyScheduleTriggerOptions extends DailyScheduleTriggerOptions {} + +/** + * Properties for configuring a custom-scheduled Glue Trigger. + */ +export interface CustomScheduledTriggerOptions extends WeeklyScheduleTriggerOptions { + /** + * The custom schedule for the trigger. + */ + readonly schedule: TriggerSchedule; +} + +/** + * Properties for configuring an Event Bridge based Glue Trigger. + */ +export interface NotifyEventTriggerOptions extends TriggerOptions { + /** + * Batch condition for the trigger. + * + * @default - no batch condition + */ + readonly eventBatchingCondition?: EventBatchingCondition; +} + +/** + * Properties for configuring a Condition (Predicate) based Glue Trigger. + */ +export interface ConditionalTriggerOptions extends DailyScheduleTriggerOptions{ + /** + * The predicate for the trigger. + */ + readonly predicate: Predicate; +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/lib/triggers/workflow.ts b/packages/@aws-cdk/aws-glue-alpha/lib/triggers/workflow.ts new file mode 100644 index 0000000000000..537fd9b51d1bc --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/lib/triggers/workflow.ts @@ -0,0 +1,433 @@ +/** + * This module defines a construct for creating and managing AWS Glue Workflows and Triggers. + * + * AWS Glue Workflows are orchestration services that allow you to create, manage, and monitor complex extract, transform, and load (ETL) activities involving multiple crawlers, jobs, and triggers. Workflows are designed to allow you to manage interdependent jobs and crawlers as a single unit, making it easier to orchestrate and monitor complex ETL pipelines. 
+ * + * Triggers are used to initiate an AWS Glue Workflow. You can configure different types of triggers, such as on-demand, scheduled, event-based, or conditional triggers, to start your Workflow based on specific conditions or events. + * + * @see https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html + * @see https://docs.aws.amazon.com/glue/latest/dg/about-triggers.html + * + * ## Usage Example + * + * ```typescript + * import * as cdk from 'aws-cdk-lib'; + * import * as glue from '@aws-cdk/aws-glue-alpha'; + * + * const app = new cdk.App(); + * const stack = new cdk.Stack(app, 'TestStack'); + * + * // Create a Glue Job + * const job = new glue.Job(stack, 'TestJob', { + * // Job properties + * }); + * + * // Create a Glue Workflow + * const workflow = new glue.Workflow(stack, 'TestWorkflow', { + * // Workflow properties + * }); + * + * // Add an on-demand trigger to the Workflow + * workflow.addOnDemandTrigger('OnDemandTrigger', { + * actions: [{ job: job }], + * }); + * ``` + */ + +import * as cdk from 'aws-cdk-lib/core'; +import * as constructs from 'constructs'; +import { CfnWorkflow, CfnTrigger } from 'aws-cdk-lib/aws-glue'; +import { + ConditionLogicalOperator, + PredicateLogical, +} from '../constants'; +import { + Action, + TriggerSchedule, + OnDemandTriggerOptions, + WeeklyScheduleTriggerOptions, + DailyScheduleTriggerOptions, + CustomScheduledTriggerOptions, + NotifyEventTriggerOptions, + ConditionalTriggerOptions, +} from './trigger-options'; + +/** + * The base interface for Glue Workflow + * + * @see {@link Workflow} + * @see https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html + */ +export interface IWorkflow extends cdk.IResource { + /** + * The name of the workflow + * @attribute + */ + readonly workflowName: string; + + /** + * The ARN of the workflow + * @attribute + */ + readonly workflowArn: string; + + /** + * Add an on-demand trigger to the workflow + */ + addOnDemandTrigger(id: string, options: OnDemandTriggerOptions): 
CfnTrigger; + + /** + * Add a daily-scheduled trigger to the workflow + */ + addDailyScheduledTrigger(id: string, options: DailyScheduleTriggerOptions): CfnTrigger; + + /** + * Add a weekly-scheduled trigger to the workflow + */ + addWeeklyScheduledTrigger(id: string, options: WeeklyScheduleTriggerOptions): CfnTrigger; + + /** + * Add a custom-scheduled trigger to the workflow + */ + addCustomScheduledTrigger(id: string, options: CustomScheduledTriggerOptions): CfnTrigger; +} + +/** + * Properties for importing a Workflow using its attributes + */ +export interface WorkflowAttributes { + /** + * The name of the workflow to import + */ + readonly workflowName: string; + /** + * The ARN of the workflow to import + * + * @default - derived from the workflow name + */ + readonly workflowArn?: string; +} + +/** + * Properties for defining a Workflow + */ +export interface WorkflowProps { + /** + * Name of the workflow + * + * @default - a name will be generated + */ + readonly workflowName?: string; + + /** + * A description of the workflow + * + * @default - no description + */ + readonly description?: string; + + /** + * A map of properties to use when this workflow is executed + * + * @default - no default run properties + */ + readonly defaultRunProperties?: { [key: string]: string }; + + /** + * The maximum number of concurrent runs allowed for the workflow + * + * @default - no limit + */ + readonly maxConcurrentRuns?: number; +} + +/** + * Base abstract class for Workflow + * + * @see https://docs.aws.amazon.com/glue/latest/dg/about-triggers.html + */ +export abstract class WorkflowBase extends cdk.Resource implements IWorkflow { + /** + * Extract workflowName from arn + */ + protected static extractNameFromArn(scope: constructs.Construct, workflowArn: string): string { + return cdk.Stack.of(scope).splitArn( + workflowArn, + cdk.ArnFormat.SLASH_RESOURCE_NAME).resourceName!; + } + + public abstract readonly workflowName: string; + public abstract readonly 
workflowArn: string; + + /** + * Add an on-demand trigger to the workflow. + * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided. + * @returns The created CfnTrigger resource. + */ + public addOnDemandTrigger(id: string, options: OnDemandTriggerOptions): CfnTrigger { + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'ON_DEMAND', + actions: options.actions?.map(this.renderAction), + description: options.description || undefined, + }); + + return trigger; + } + + /** + * Add a daily-scheduled trigger to the workflow. + * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided. + * @returns The created CfnTrigger resource. + */ + public addDailyScheduledTrigger(id: string, options: DailyScheduleTriggerOptions): CfnTrigger { + const dailySchedule = TriggerSchedule.cron({ + minute: '0', + hour: '0', + }); + + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'SCHEDULED', + actions: options.actions?.map(this.renderAction), + schedule: dailySchedule.expressionString, + startOnCreation: options.startOnCreation ?? false, + }); + + return trigger; + } + + /** + * Add a weekly-scheduled trigger to the workflow. + * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided. + * @returns The created CfnTrigger resource. 
+ */ + public addWeeklyScheduledTrigger(id: string, options: WeeklyScheduleTriggerOptions): CfnTrigger { + const weeklySchedule = TriggerSchedule.cron({ + minute: '0', + hour: '0', + weekDay: 'SUN', + }); + + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'SCHEDULED', + actions: options.actions?.map(this.renderAction), + schedule: weeklySchedule.expressionString, + startOnCreation: options.startOnCreation ?? false, + }); + + return trigger; + } + + /** + * Add a custom-scheduled trigger to the workflow. + * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided. + * @returns The created CfnTrigger resource. + */ + public addCustomScheduledTrigger(id: string, options: CustomScheduledTriggerOptions): CfnTrigger { + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'SCHEDULED', + actions: options.actions?.map(this.renderAction), + schedule: options.schedule.expressionString, + startOnCreation: options.startOnCreation ?? false, + }); + + return trigger; + } + + /** + * Add an Event Bridge based trigger to the workflow. + * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided. + * @returns The created CfnTrigger resource. + */ + public addNotifyEventTrigger(id: string, options: NotifyEventTriggerOptions): CfnTrigger { + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'EVENT', + actions: options.actions?.map(this.renderAction), + eventBatchingCondition: this.renderEventBatchingCondition(options), + description: options.description ?? undefined, + }); + + return trigger; + } + + /** + * Add a Condition (Predicate) based trigger to the workflow. 
+ * + * @param id The id of the trigger. + * @param options Additional options for the trigger. + * @throws If both job and crawler are provided, or if neither job nor crawler is provided for any condition. + * @throws If a job is provided without a job state, or if a crawler is provided without a crawler state for any condition. + * @returns The created CfnTrigger resource. + */ + public addconditionalTrigger(id: string, options: ConditionalTriggerOptions): CfnTrigger { + const trigger = new CfnTrigger(this, id, { + ...options, + workflowName: this.workflowName, + type: 'CONDITIONAL', + actions: options.actions?.map(this.renderAction), + predicate: this.renderPredicate(options), + eventBatchingCondition: this.renderEventBatchingCondition(options), + description: options.description ?? undefined, + }); + + return trigger; + } + + private renderAction(action: Action): CfnTrigger.ActionProperty { + // Validate that either job or crawler is provided, but not both + if (!action.job && !action.crawler) { + throw new Error('You must provide either a job or a crawler for the action.'); + } else if (action.job && action.crawler) { + throw new Error('You cannot provide both a job and a crawler for the action.'); + } + + return { + jobName: action.job?.jobName, + arguments: action.arguments, + timeout: action.timeout?.toMinutes(), + securityConfiguration: action.securityConfiguration?.securityConfigurationName, + crawlerName: action.crawler?.name, + }; + } + + private renderPredicate(props: ConditionalTriggerOptions): CfnTrigger.PredicateProperty { + const conditions = props.predicate.conditions?.map(condition => { + // Validate that either job or crawler is provided, but not both + if (!condition.job && !condition.crawlerName) { + throw new Error('You must provide either a job or a crawler for the condition.'); + } else if (condition.job && condition.crawlerName) { + throw new Error('You cannot provide both a job and a crawler for the condition.'); + } + + // Validate that 
if job is provided, job state is also provided + if (condition.job && !condition.state) { + throw new Error('If you provide a job for the condition, you must also provide a job state.'); + } + + // Validate that if crawler is provided, crawler state is also provided + if (condition.crawlerName && !condition.crawlState) { + throw new Error('If you provide a crawler for the condition, you must also provide a crawler state.'); + } + + return { + logicalOperator: condition.logicalOperator ?? ConditionLogicalOperator.EQUALS, + jobName: condition.job?.jobName ?? undefined, + state: condition.state ?? undefined, + crawlerName: condition.crawlerName ?? undefined, + crawlState: condition.crawlState ?? undefined, + }; + }); + + return { + logical: props.predicate.conditions?.length === 1 ? undefined : props.predicate.logical ?? PredicateLogical.AND, + conditions: conditions, + }; + } + + private renderEventBatchingCondition(props: NotifyEventTriggerOptions): CfnTrigger.EventBatchingConditionProperty { + + const defaultBatchSize = 1; + const defaultBatchWindow = cdk.Duration.seconds(900).toSeconds(); + + if (!props.eventBatchingCondition) { + return { + batchSize: defaultBatchSize, + batchWindow: defaultBatchWindow, + }; + } + + return { + batchSize: props.eventBatchingCondition.batchSize || defaultBatchSize, + batchWindow: props.eventBatchingCondition.batchWindow?.toSeconds() || defaultBatchWindow, + }; + } + + protected buildWorkflowArn(scope: constructs.Construct, workflowName: string): string { + return cdk.Stack.of(scope).formatArn({ + service: 'glue', + resource: 'workflow', + resourceName: workflowName, + }); + } +} + +/** + * A class used for defining a Glue Workflow + * + * @resource AWS::Glue::Workflow + */ +export class Workflow extends WorkflowBase { + /** + * Import a workflow from its name + */ + public static fromWorkflowName(scope: constructs.Construct, id: string, workflowName: string): IWorkflow { + return this.fromWorkflowAttributes(scope, id, { + 
workflowName, + }); + } + + /** + * Import a workflow from its ARN + */ + public static fromWorkflowArn(scope: constructs.Construct, id: string, workflowArn: string): IWorkflow { + return this.fromWorkflowAttributes(scope, id, { + workflowName: this.extractNameFromArn(scope, workflowArn), + workflowArn, + }); + } + + /** + * Import an existing workflow + */ + public static fromWorkflowAttributes(scope: constructs.Construct, id: string, attrs: WorkflowAttributes): IWorkflow { + class Import extends WorkflowBase { + public readonly workflowName = attrs.workflowName; + public readonly workflowArn = attrs.workflowArn ?? this.buildWorkflowArn(scope, this.workflowName); + } + + return new Import(scope, id); + } + + public readonly workflowName: string; + public readonly workflowArn: string; + + constructor(scope: constructs.Construct, id: string, props?: WorkflowProps) { + super(scope, id, { + physicalName: props?.workflowName, + }); + + const resource = new CfnWorkflow(this, 'Resource', { + name: this.physicalName, + description: props?.description, + defaultRunProperties: props?.defaultRunProperties, + maxConcurrentRuns: props?.maxConcurrentRuns, + }); + + this.workflowName = this.getResourceNameAttribute(resource.ref); + this.workflowArn = this.buildWorkflowArn(this, this.workflowName); + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/code.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/code.test.ts index 9b213cd891134..1f62dca7977bf 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/code.test.ts +++ b/packages/@aws-cdk/aws-glue-alpha/test/code.test.ts @@ -1,9 +1,10 @@ import * as path from 'path'; -import { Template } from 'aws-cdk-lib/assertions'; +import { Template, Match } from 'aws-cdk-lib/assertions'; import * as s3 from 'aws-cdk-lib/aws-s3'; import * as cdk from 'aws-cdk-lib'; import * as cxapi from 'aws-cdk-lib/cx-api'; import * as glue from '../lib'; +import { Role, ServicePrincipal } from 'aws-cdk-lib/aws-iam'; describe('Code', () => { let 
stack: cdk.Stack; @@ -21,11 +22,11 @@ describe('Code', () => { test('with valid bucket name and key and bound by job sets the right path and grants the job permissions to read from it', () => { bucket = s3.Bucket.fromBucketName(stack, 'Bucket', 'bucketname'); script = glue.Code.fromBucket(bucket, key); - new glue.Job(stack, 'Job1', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, + + new glue.PythonShellJob(stack, 'Job1', { + script, + role: new Role(stack, 'Role', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), }); @@ -77,7 +78,7 @@ describe('Code', () => { }, Roles: [ { - Ref: 'Job1ServiceRole7AF34CCA', + Ref: Match.stringLikeRegexp('Role'), }, ], }); @@ -93,11 +94,10 @@ describe('Code', () => { }); test("with valid and existing file path and bound to job sets job's script location and permissions stack metadata", () => { - new glue.Job(stack, 'Job1', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, + new glue.PythonShellJob(stack, 'Job1', { + script, + role: new Role(stack, 'Role', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), }); @@ -193,7 +193,7 @@ describe('Code', () => { }, Roles: [ { - Ref: 'Job1ServiceRole7AF34CCA', + Ref: Match.stringLikeRegexp('Role'), }, ], }); @@ -205,18 +205,16 @@ describe('Code', () => { }); test('used in more than 1 job in the same stack should be reused', () => { - new glue.Job(stack, 'Job1', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, + new glue.PythonShellJob(stack, 'Job1', { + script, + role: new Role(stack, 'Role1', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), }); - new glue.Job(stack, 'Job2', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: 
glue.PythonVersion.THREE, - script, + new glue.PythonShellJob(stack, 'Job2', { + script, + role: new Role(stack, 'Role2', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), }); const ScriptLocation = { @@ -266,7 +264,7 @@ describe('Code', () => { }, Role: { 'Fn::GetAtt': [ - 'Job1ServiceRole7AF34CCA', + Match.stringLikeRegexp('Role'), 'Arn', ], }, @@ -277,7 +275,7 @@ describe('Code', () => { }, Role: { 'Fn::GetAtt': [ - 'Job2ServiceRole5D2B98FE', + Match.stringLikeRegexp('Role'), 'Arn', ], }, @@ -285,20 +283,18 @@ describe('Code', () => { }); test('throws if trying to rebind in another stack', () => { - new glue.Job(stack, 'Job1', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, + new glue.PythonShellJob(stack, 'Job1', { + script, + role: new Role(stack, 'Role1', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), }); const differentStack = new cdk.Stack(); - expect(() => new glue.Job(differentStack, 'Job2', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script: script, + expect(() => new glue.PythonShellJob(differentStack, 'Job1', { + script, + role: new Role(stack, 'Role2', { + assumedBy: new ServicePrincipal('glue.amazonaws.com'), }), })).toThrow(/associated with another stack/); }); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/constants.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/constants.test.ts new file mode 100644 index 0000000000000..2b1a68680f82d --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/constants.test.ts @@ -0,0 +1,95 @@ +import * as glue from '../lib'; + +describe('WorkerType', () => { + test('.STANDARD should set the name correctly', () => expect(glue.WorkerType.STANDARD).toEqual('Standard')); + + test('.G_1X should set the name correctly', () => expect(glue.WorkerType.G_1X).toEqual('G.1X')); + + test('.G_2X should set the name 
correctly', () => expect(glue.WorkerType.G_2X).toEqual('G.2X')); + + test('.G_4X should set the name correctly', () => expect(glue.WorkerType.G_4X).toEqual('G.4X')); + + test('.G_8X should set the name correctly', () => expect(glue.WorkerType.G_8X).toEqual('G.8X')); + + test('.G_025X should set the name correctly', () => expect(glue.WorkerType.G_025X).toEqual('G.025X')); + + test('.Z_2X should set the name correctly', () => expect(glue.WorkerType.Z_2X).toEqual('Z.2X')); + +}); + +describe('JobState', () => { + test('SUCCEEDED should set Job State correctly', () => expect(glue.JobState.SUCCEEDED).toEqual('SUCCEEDED')); + + test('FAILED should set Job State correctly', () => expect(glue.JobState.FAILED).toEqual('FAILED')); + + test('RUNNING should set Job State correctly', () => expect(glue.JobState.RUNNING).toEqual('RUNNING')); + + test('STARTING should set Job State correctly', () => expect(glue.JobState.STARTING).toEqual('STARTING')); + + test('STOPPED should set Job State correctly', () => expect(glue.JobState.STOPPED).toEqual('STOPPED')); + + test('STOPPING should set Job State correctly', () => expect(glue.JobState.STOPPING).toEqual('STOPPING')); + + test('TIMEOUT should set Job State correctly', () => expect(glue.JobState.TIMEOUT).toEqual('TIMEOUT')); + +}); + +describe('Metric Type', () => { + test('GAUGE should set Metric Type correctly', () => expect(glue.MetricType.GAUGE).toEqual('gauge')); + + test('COUNT should set Metric Type correctly', () => expect(glue.MetricType.COUNT).toEqual('count')); + +}); + +describe('Execution Class', () => { + test('FLEX should set Execution Class correctly', () => expect(glue.ExecutionClass.FLEX).toEqual('FLEX')); + + test('STANDARD should set Execution Class correctly', () => expect(glue.ExecutionClass.STANDARD).toEqual('STANDARD')); + +}); + +describe('Glue Version', () => { + test('V0_9 should set Glue Version correctly', () => expect(glue.GlueVersion.V0_9).toEqual('0.9')); + + test('V1_0 should set Glue Version 
correctly', () => expect(glue.GlueVersion.V1_0).toEqual('1.0')); + + test('V2_0 should set Glue Version correctly', () => expect(glue.GlueVersion.V2_0).toEqual('2.0')); + + test('V3_0 should set Glue Version correctly', () => expect(glue.GlueVersion.V3_0).toEqual('3.0')); + + test('V4_0 should set Glue Version correctly', () => expect(glue.GlueVersion.V4_0).toEqual('4.0')); + +}); + +describe('Job Language', () => { + test('PYTHON should set Job Language correctly', () => expect(glue.JobLanguage.PYTHON).toEqual('python')); + + test('SCALA should set Job Language correctly', () => expect(glue.JobLanguage.SCALA).toEqual('scala')); + +}); + +describe('Python Version', () => { + test('TWO should set Python Version correctly', () => expect(glue.PythonVersion.TWO).toEqual('2')); + + test('THREE should set Python Version correctly', () => expect(glue.PythonVersion.THREE).toEqual('3')); + + test('THREE_NINE should set Python Version correctly', () => expect(glue.PythonVersion.THREE_NINE).toEqual('3.9')); + +}); + +describe('Runtime', () => { + test('RAY_TWO_FOUR should set Runtime correctly', () => expect(glue.Runtime.RAY_TWO_FOUR).toEqual('Ray2.4')); + +}); + +describe('JobType', () => { + test('ETL should set Job Type correctly', () => expect(glue.JobType.ETL).toEqual('glueetl')); + + test('PYTHON_SHELL should set Job Type correctly', () => expect(glue.JobType.PYTHON_SHELL).toEqual('pythonshell')); + + test('RAY should set Job Type correctly', () => expect(glue.JobType.RAY).toEqual('glueray')); + + test('STREAMING should set Job Type correctly', () => expect(glue.JobType.STREAMING).toEqual('gluestreaming')); + +}); + + diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py new file mode 100644 index 0000000000000..e75154b7c390f 
--- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py @@ -0,0 +1 @@ +print("hello world") \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.assets.json new file mode 100644 index 0000000000000..653b948c07787 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { + "source": { + "path": "asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "4799b81562fc3fe83d1f986c3c439f80d36cbfc9421a3e8558060ffaf8616aa0": { + "source": { + "path": "aws-glue-job-pyspark-etl.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "4799b81562fc3fe83d1f986c3c439f80d36cbfc9421a3e8558060ffaf8616aa0.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.template.json 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.template.json new file mode 100644 index 0000000000000..dc70fabbb2dd8 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/aws-glue-job-pyspark-etl.template.json @@ -0,0 +1,206 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicPySparkETLJob833DD8C4": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3.9", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + 
"--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverridePySparkETLJob85E17065": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3.9", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override PySpark ETL Job", + "GlueVersion": "3.0", + "Name": "Optional Override PySpark ETL Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets.json new file mode 100644 index 0000000000000..aca979ed2ea11 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in 
this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/integ.json new file mode 100644 index 0000000000000..bc938a726ac44 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-pyspark-etl-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-pyspark-etl" + ], + "assertionStack": "aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/manifest.json new file mode 100644 index 0000000000000..d4ea673146612 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + 
"aws-glue-job-pyspark-etl.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-pyspark-etl.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-pyspark-etl": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-pyspark-etl.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/4799b81562fc3fe83d1f986c3c439f80d36cbfc9421a3e8558060ffaf8616aa0.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-pyspark-etl.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-pyspark-etl.assets" + ], + "metadata": { + "/aws-glue-job-pyspark-etl/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-pyspark-etl/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-job-pyspark-etl/BasicPySparkETLJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicPySparkETLJob833DD8C4" + } + ], + "/aws-glue-job-pyspark-etl/OverridePySparkETLJob/Resource": [ 
+ { + "type": "aws:cdk:logicalId", + "data": "OverridePySparkETLJob85E17065" + } + ], + "/aws-glue-job-pyspark-etl/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pyspark-etl/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pyspark-etl" + }, + "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + 
"dependencies": [ + "awsgluejobpysparketlintegtestDefaultTestDeployAssertED1ACE14.assets" + ], + "metadata": { + "/aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/tree.json new file mode 100644 index 0000000000000..974404f968b15 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.js.snapshot/tree.json @@ -0,0 +1,375 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-pyspark-etl": { + "id": "aws-glue-job-pyspark-etl", + "path": "aws-glue-job-pyspark-etl", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-pyspark-etl/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-pyspark-etl/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-etl/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ 
+ "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-pyspark-etl/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-etl/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicPySparkETLJob": { + "id": "BasicPySparkETLJob", + "path": "aws-glue-job-pyspark-etl/BasicPySparkETLJob", + "children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": "Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "aws-glue-job-pyspark-etl/BasicPySparkETLJob/Code2907ea7be4a583708cfffc21b3df1dfa", + "children": { + "Stage": { + "id": "Stage", + "path": "aws-glue-job-pyspark-etl/BasicPySparkETLJob/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", + "constructInfo": 
{ + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-pyspark-etl/BasicPySparkETLJob/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-etl/BasicPySparkETLJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkEtlJob", + "version": "0.0.0" + } + }, + "OverridePySparkETLJob": { + "id": "OverridePySparkETLJob", + "path": "aws-glue-job-pyspark-etl/OverridePySparkETLJob", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-etl/OverridePySparkETLJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + 
"/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override PySpark ETL Job", + "glueVersion": "3.0", + "name": "Optional Override PySpark ETL Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkEtlJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-pyspark-etl/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pyspark-etl/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-pyspark-etl-integ-test": { + "id": "aws-glue-job-pyspark-etl-integ-test", + "path": "aws-glue-job-pyspark-etl-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-pyspark-etl-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-glue-job-pyspark-etl-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": 
"aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pyspark-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.ts new file mode 100644 index 0000000000000..060d81054a0f0 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-etl.ts @@ -0,0 +1,61 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name <job name>` + * This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name <job name> --run-id <runId>` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id <runId>` + * - Check output: `aws logs get-log-events --region 
us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name "<runId>"` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-pyspark-etl'); + +const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.PySparkEtlJob(stack, 'BasicPySparkETLJob', { + script: script, + role: iam_role, +}); + +new glue.PySparkEtlJob(stack, 'OverridePySparkETLJob', { + script: script, + role: iam_role, + description: 'Optional Override PySpark ETL Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + jobName: 'Optional Override PySpark ETL Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, + jobRunQueuingEnabled: true, +}); + +new integ.IntegTest(app, 'aws-glue-job-pyspark-etl-integ-test', { + testCases: [stack], +}); + +app.synth(); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py new file mode 100644 index 0000000000000..e75154b7c390f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py @@ -0,0 +1 @@ +print("hello world") \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.assets.json 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.assets.json new file mode 100644 index 0000000000000..15de0107fd134 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { + "source": { + "path": "asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "193dcee820d44a5de2c48d3e455195e1b19d1d4b1dea979dbacb4d90ecee8aec": { + "source": { + "path": "aws-glue-job-pysparkflex-etl.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "193dcee820d44a5de2c48d3e455195e1b19d1d4b1dea979dbacb4d90ecee8aec.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.template.json new file mode 100644 index 0000000000000..7b4cd1bbfdb80 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/aws-glue-job-pysparkflex-etl.template.json @@ -0,0 
+1,208 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicPySparkFlexEtlJobC50DC250": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + 
"OverridePySparkFlexEtlJob8EE4CFA1": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override PySpark Flex Etl Job", + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "Name": "Optional Override PySpark Flex Etl Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets.json new file mode 100644 index 0000000000000..d77fab393274a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + 
"Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/integ.json new file mode 100644 index 0000000000000..b837700f2ba0b --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-pysparkflex-etl" + ], + "assertionStack": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/manifest.json new file mode 100644 index 0000000000000..197580f722ebb --- /dev/null +++ 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-pysparkflex-etl.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-pysparkflex-etl.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-pysparkflex-etl": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-pysparkflex-etl.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/193dcee820d44a5de2c48d3e455195e1b19d1d4b1dea979dbacb4d90ecee8aec.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-pysparkflex-etl.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-pysparkflex-etl.assets" + ], + "metadata": { + "/aws-glue-job-pysparkflex-etl/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-pysparkflex-etl/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + 
"/aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicPySparkFlexEtlJobC50DC250" + } + ], + "/aws-glue-job-pysparkflex-etl/OverridePySparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverridePySparkFlexEtlJob8EE4CFA1" + } + ], + "/aws-glue-job-pysparkflex-etl/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pysparkflex-etl/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pysparkflex-etl" + }, + "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets" + ], + 
"lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobpysparkflexetlintegtestDefaultTestDeployAssert3F3EC951.assets" + ], + "metadata": { + "/aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/tree.json new file mode 100644 index 0000000000000..0ae8ba0cdb4a2 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.js.snapshot/tree.json @@ -0,0 +1,377 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-pysparkflex-etl": { + "id": "aws-glue-job-pysparkflex-etl", + "path": "aws-glue-job-pysparkflex-etl", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-pysparkflex-etl/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-pysparkflex-etl/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pysparkflex-etl/IAMServiceRole/Resource", + "attributes": { + 
"aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-pysparkflex-etl/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pysparkflex-etl/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicPySparkFlexEtlJob": { + "id": "BasicPySparkFlexEtlJob", + "path": "aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob", + 
"children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": "Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob/Code2907ea7be4a583708cfffc21b3df1dfa", + "children": { + "Stage": { + "id": "Stage", + "path": "aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pysparkflex-etl/BasicPySparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "executionClass": "FLEX", + "glueVersion": "3.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkFlexEtlJob", + "version": "0.0.0" + } + }, + "OverridePySparkFlexEtlJob": { + "id": "OverridePySparkFlexEtlJob", + "path": "aws-glue-job-pysparkflex-etl/OverridePySparkFlexEtlJob", + 
"children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pysparkflex-etl/OverridePySparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override PySpark Flex Etl Job", + "executionClass": "FLEX", + "glueVersion": "3.0", + "name": "Optional Override PySpark Flex Etl Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkFlexEtlJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-pysparkflex-etl/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pysparkflex-etl/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-pysparkflex-etl-integ-test": { + "id": "aws-glue-job-pysparkflex-etl-integ-test", + "path": "aws-glue-job-pysparkflex-etl-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": 
"aws-glue-job-pysparkflex-etl-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pysparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.ts new file mode 100644 index 0000000000000..d53bb703c0123 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-flex-etl.ts @@ -0,0 +1,66 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test 
+ * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name ` + * This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name --run-id ` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id ` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name ">` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-pysparkflex-etl'); + +const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.PySparkFlexEtlJob(stack, 'BasicPySparkFlexEtlJob', { + script: script, + role: iam_role, +}); + +/*new glue.PySparkFlexEtlJob(stack, 'BasicPySparkFlexEtlJobv3', { + script: script, + role: iam_role, + glueVersion: glue.GlueVersion.V3_0, +}); */ + +new glue.PySparkFlexEtlJob(stack, 'OverridePySparkFlexEtlJob', { + script: script, + role: iam_role, + description: 'Optional Override PySpark Flex Etl Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + jobName: 'Optional Override PySpark Flex Etl Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, +}); + +new integ.IntegTest(app, 'aws-glue-job-pysparkflex-etl-integ-test', { + testCases: [stack], +}); + +app.synth(); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py new file mode 100644 index 0000000000000..e75154b7c390f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py @@ -0,0 +1 @@ +print("hello world") \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.assets.json new file mode 100644 index 0000000000000..241c8c0ce1b5e --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { + "source": { + "path": "asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "f4cee6cf3c3f4fb0c83791808642b0391d7a1bd7c1aaa0fe0a8da2168bc0dd85": { + "source": { + "path": "aws-glue-job-pyspark-streaming.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "f4cee6cf3c3f4fb0c83791808642b0391d7a1bd7c1aaa0fe0a8da2168bc0dd85.json", + "assumeRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.template.json new file mode 100644 index 0000000000000..b73eab962841f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/aws-glue-job-pyspark-streaming.template.json @@ -0,0 +1,206 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicPySparkStreamingJobAFD3B477": { + "Type": "AWS::Glue::Job", + "Properties": { + 
"Command": { + "Name": "gluestreaming", + "PythonVersion": "3", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverridePySparkStreamingJob58DE176A": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "gluestreaming", + "PythonVersion": "3", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override PySpark Streaming Job", + "GlueVersion": "3.0", + "Name": "Optional Override PySpark Streaming Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets.json new file mode 100644 index 0000000000000..476e000da5f03 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.template.json new 
file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/integ.json new file mode 100644 index 0000000000000..e6bee2f2422a3 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-pyspark-streaming-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-pyspark-streaming" + ], + "assertionStack": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert", + "assertionStackName": 
"awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/manifest.json new file mode 100644 index 0000000000000..70cb8893036f3 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-pyspark-streaming.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-pyspark-streaming.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-pyspark-streaming": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-pyspark-streaming.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/f4cee6cf3c3f4fb0c83791808642b0391d7a1bd7c1aaa0fe0a8da2168bc0dd85.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-pyspark-streaming.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + 
"dependencies": [ + "aws-glue-job-pyspark-streaming.assets" + ], + "metadata": { + "/aws-glue-job-pyspark-streaming/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-pyspark-streaming/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicPySparkStreamingJobAFD3B477" + } + ], + "/aws-glue-job-pyspark-streaming/OverridePySparkStreamingJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverridePySparkStreamingJob58DE176A" + } + ], + "/aws-glue-job-pyspark-streaming/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pyspark-streaming/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pyspark-streaming" + }, + "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobpysparkstreamingintegtestDefaultTestDeployAssert242E520E.assets" + ], + "metadata": { + "/aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/tree.json new file mode 100644 index 0000000000000..05cbf25732dde --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.js.snapshot/tree.json @@ -0,0 +1,375 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-pyspark-streaming": { + "id": "aws-glue-job-pyspark-streaming", + "path": 
"aws-glue-job-pyspark-streaming", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-pyspark-streaming/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-pyspark-streaming/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-streaming/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-pyspark-streaming/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-streaming/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + 
"policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicPySparkStreamingJob": { + "id": "BasicPySparkStreamingJob", + "path": "aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob", + "children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": "Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob/Code2907ea7be4a583708cfffc21b3df1dfa", + "children": { + "Stage": { + "id": "Stage", + "path": "aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-streaming/BasicPySparkStreamingJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "gluestreaming", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + 
"--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkStreamingJob", + "version": "0.0.0" + } + }, + "OverridePySparkStreamingJob": { + "id": "OverridePySparkStreamingJob", + "path": "aws-glue-job-pyspark-streaming/OverridePySparkStreamingJob", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-pyspark-streaming/OverridePySparkStreamingJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "gluestreaming", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override PySpark Streaming Job", + "glueVersion": "3.0", + "name": "Optional Override PySpark Streaming Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkStreamingJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-pyspark-streaming/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + 
"version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pyspark-streaming/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-pyspark-streaming-integ-test": { + "id": "aws-glue-job-pyspark-streaming-integ-test", + "path": "aws-glue-job-pyspark-streaming-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-pyspark-streaming-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git 
a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.ts new file mode 100644 index 0000000000000..21546192e3d09 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-pyspark-streaming.ts @@ -0,0 +1,61 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name ` + * This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name --run-id ` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id ` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name ">` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-pyspark-streaming'); + +const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.PySparkStreamingJob(stack, 'BasicPySparkStreamingJob', { + script: script, + role: iam_role, +}); + +new glue.PySparkStreamingJob(stack, 'OverridePySparkStreamingJob', { + script: script, + role: iam_role, + description: 'Optional Override PySpark Streaming Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + 
jobName: 'Optional Override PySpark Streaming Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, + jobRunQueuingEnabled: true, +}); + +new integ.IntegTest(app, 'aws-glue-job-pyspark-streaming-integ-test', { + testCases: [stack], +}); + +app.synth(); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.assets.json index 17b109b19285f..522babd056beb 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.assets.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.assets.json @@ -1,5 +1,5 @@ { - "version": "33.0.0", + "version": "36.0.0", "files": { "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { "source": { @@ -14,7 +14,7 @@ } } }, - "13432a74ca6cfada399f4d2b33385964f66c49aeeb01c5f0cefec52560a4dffa": { + "c75d6d44cca641f11b82111a563ba198269fa0483d583cbffd578d0301e9edaf": { "source": { "path": "aws-glue-job-python-shell.template.json", "packaging": "file" @@ -22,7 +22,7 @@ "destinations": { "current_account-current_region": { "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", - "objectKey": "13432a74ca6cfada399f4d2b33385964f66c49aeeb01c5f0cefec52560a4dffa.json", + "objectKey": "c75d6d44cca641f11b82111a563ba198269fa0483d583cbffd578d0301e9edaf.json", "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" } } diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.template.json index dece180ae8219..d98d7d4485e3b 100644 --- 
a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.template.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/aws-glue-job-python-shell.template.json @@ -1,6 +1,6 @@ { "Resources": { - "ShellJobServiceRoleCF97BC4B": { + "IAMServiceRole61C662C4": { "Type": "AWS::IAM::Role", "Properties": { "AssumeRolePolicyDocument": { @@ -31,7 +31,7 @@ ] } }, - "ShellJobServiceRoleDefaultPolicy7F22D315": { + "IAMServiceRoleDefaultPolicy379D1A0E": { "Type": "AWS::IAM::Policy", "Properties": { "PolicyDocument": { @@ -80,20 +80,20 @@ ], "Version": "2012-10-17" }, - "PolicyName": "ShellJobServiceRoleDefaultPolicy7F22D315", + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", "Roles": [ { - "Ref": "ShellJobServiceRoleCF97BC4B" + "Ref": "IAMServiceRole61C662C4" } ] } }, - "ShellJob42E81F95": { + "BasicShellJob39F2E7D12A": { "Type": "AWS::Glue::Job", "Properties": { "Command": { "Name": "pythonshell", - "PythonVersion": "3", + "PythonVersion": "3.9", "ScriptLocation": { "Fn::Join": [ "", @@ -109,112 +109,59 @@ }, "DefaultArguments": { "--job-language": "python", - "arg1": "value1", - "arg2": "value2" + "library-set": "analytics", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" }, - "GlueVersion": "1.0", + "GlueVersion": "3.0", "MaxCapacity": 0.0625, - "Name": "ShellJob", + "MaxRetries": 0, "Role": { "Fn::GetAtt": [ - "ShellJobServiceRoleCF97BC4B", + "IAMServiceRole61C662C4", "Arn" ] - }, - "Tags": { - "key": "value" } } }, - "ShellJob39ServiceRole2F6F3768": { - "Type": "AWS::IAM::Role", + "BasicShellJobC7D0761E": { + "Type": "AWS::Glue::Job", "Properties": { - "AssumeRolePolicyDocument": { - "Statement": [ - { - "Action": "sts:AssumeRole", - "Effect": "Allow", - "Principal": { - "Service": "glue.amazonaws.com" - } - } - ], - "Version": "2012-10-17" - }, - "ManagedPolicyArns": [ - { + "Command": { + "Name": "pythonshell", + 
"PythonVersion": "3", + "ScriptLocation": { "Fn::Join": [ "", [ - "arn:", + "s3://", { - "Ref": "AWS::Partition" + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" }, - ":iam::aws:policy/service-role/AWSGlueServiceRole" + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" ] ] } - ] - } - }, - "ShellJob39ServiceRoleDefaultPolicy38A33919": { - "Type": "AWS::IAM::Policy", - "Properties": { - "PolicyDocument": { - "Statement": [ - { - "Action": [ - "s3:GetBucket*", - "s3:GetObject*", - "s3:List*" - ], - "Effect": "Allow", - "Resource": [ - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - }, - "/*" - ] - ] - }, - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - } - ] - ] - } - ] - } - ], - "Version": "2012-10-17" }, - "PolicyName": "ShellJob39ServiceRoleDefaultPolicy38A33919", - "Roles": [ - { - "Ref": "ShellJob39ServiceRole2F6F3768" - } - ] + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "1.0", + "MaxCapacity": 0.0625, + "MaxRetries": 0, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + } } }, - "ShellJob390C141361": { + "DetailedShellJob39CB370B41": { "Type": "AWS::Glue::Job", "Properties": { "Command": { @@ -235,15 +182,21 @@ }, "DefaultArguments": { "--job-language": "python", + "library-set": "analytics", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", "arg1": "value1", "arg2": "value2" }, + "Description": "My detailed Python 3.9 Shell Job", "GlueVersion": "3.0", "MaxCapacity": 1, - "Name": "ShellJob39", + "MaxRetries": 0, + "Name": "My Python 3.9 Shell Job", "Role": { 
"Fn::GetAtt": [ - "ShellJob39ServiceRole2F6F3768", + "IAMServiceRole61C662C4", "Arn" ] }, diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/awsgluejobpythonshellintegtestDefaultTestDeployAssert453D25B7.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/awsgluejobpythonshellintegtestDefaultTestDeployAssert453D25B7.assets.json index fcf891c433efb..fc44607a05dee 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/awsgluejobpythonshellintegtestDefaultTestDeployAssert453D25B7.assets.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/awsgluejobpythonshellintegtestDefaultTestDeployAssert453D25B7.assets.json @@ -1,5 +1,5 @@ { - "version": "33.0.0", + "version": "36.0.0", "files": { "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { "source": { diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/cdk.out index 560dae10d018f..1f0068d32659a 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/cdk.out +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/cdk.out @@ -1 +1 @@ -{"version":"33.0.0"} \ No newline at end of file +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/integ.json index 89660486d806d..30e0cedc0c82d 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/integ.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/integ.json @@ -1,5 +1,5 @@ { - "version": "33.0.0", + "version": "36.0.0", "testCases": { "aws-glue-job-python-shell-integ-test/DefaultTest": { "stacks": [ diff --git 
a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/manifest.json index 93cd15ece1b08..026dce44eec16 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/manifest.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/manifest.json @@ -1,5 +1,5 @@ { - "version": "33.0.0", + "version": "36.0.0", "artifacts": { "aws-glue-job-python-shell.assets": { "type": "cdk:asset-manifest", @@ -14,10 +14,11 @@ "environment": "aws://unknown-account/unknown-region", "properties": { "templateFile": "aws-glue-job-python-shell.template.json", + "terminationProtection": false, "validateOnSynth": false, "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", - "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/13432a74ca6cfada399f4d2b33385964f66c49aeeb01c5f0cefec52560a4dffa.json", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/c75d6d44cca641f11b82111a563ba198269fa0483d583cbffd578d0301e9edaf.json", "requiresBootstrapStackVersion": 6, "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", "additionalDependencies": [ @@ -33,40 +34,34 @@ "aws-glue-job-python-shell.assets" ], "metadata": { - "/aws-glue-job-python-shell/ShellJob/ServiceRole/Resource": [ + "/aws-glue-job-python-shell/IAMServiceRole/Resource": [ { "type": "aws:cdk:logicalId", - "data": "ShellJobServiceRoleCF97BC4B" + "data": "IAMServiceRole61C662C4" } ], - "/aws-glue-job-python-shell/ShellJob/ServiceRole/DefaultPolicy/Resource": [ + "/aws-glue-job-python-shell/IAMServiceRole/DefaultPolicy/Resource": [ { "type": "aws:cdk:logicalId", - 
"data": "ShellJobServiceRoleDefaultPolicy7F22D315" + "data": "IAMServiceRoleDefaultPolicy379D1A0E" } ], - "/aws-glue-job-python-shell/ShellJob/Resource": [ + "/aws-glue-job-python-shell/BasicShellJob39/Resource": [ { "type": "aws:cdk:logicalId", - "data": "ShellJob42E81F95" + "data": "BasicShellJob39F2E7D12A" } ], - "/aws-glue-job-python-shell/ShellJob39/ServiceRole/Resource": [ + "/aws-glue-job-python-shell/BasicShellJob/Resource": [ { "type": "aws:cdk:logicalId", - "data": "ShellJob39ServiceRole2F6F3768" + "data": "BasicShellJobC7D0761E" } ], - "/aws-glue-job-python-shell/ShellJob39/ServiceRole/DefaultPolicy/Resource": [ + "/aws-glue-job-python-shell/DetailedShellJob39/Resource": [ { "type": "aws:cdk:logicalId", - "data": "ShellJob39ServiceRoleDefaultPolicy38A33919" - } - ], - "/aws-glue-job-python-shell/ShellJob39/Resource": [ - { - "type": "aws:cdk:logicalId", - "data": "ShellJob390C141361" + "data": "DetailedShellJob39CB370B41" } ], "/aws-glue-job-python-shell/BootstrapVersion": [ @@ -80,6 +75,60 @@ "type": "aws:cdk:logicalId", "data": "CheckBootstrapVersion" } + ], + "ShellJobServiceRoleCF97BC4B": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJobServiceRoleCF97BC4B", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + ] + } + ], + "ShellJobServiceRoleDefaultPolicy7F22D315": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJobServiceRoleDefaultPolicy7F22D315", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + ] + } + ], + "ShellJob42E81F95": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJob42E81F95", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + ] + } + ], + "ShellJob39ServiceRole2F6F3768": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJob39ServiceRole2F6F3768", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + ] + } + ], + "ShellJob39ServiceRoleDefaultPolicy38A33919": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJob39ServiceRoleDefaultPolicy38A33919", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + 
] + } + ], + "ShellJob390C141361": [ + { + "type": "aws:cdk:logicalId", + "data": "ShellJob390C141361", + "trace": [ + "!!DESTRUCTIVE_CHANGES: WILL_DESTROY" + ] + } ] }, "displayName": "aws-glue-job-python-shell" @@ -97,6 +146,7 @@ "environment": "aws://unknown-account/unknown-region", "properties": { "templateFile": "awsgluejobpythonshellintegtestDefaultTestDeployAssert453D25B7.template.json", + "terminationProtection": false, "validateOnSynth": false, "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/tree.json index 05905851160a8..4c23b18fce55d 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/tree.json +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.js.snapshot/tree.json @@ -8,149 +8,149 @@ "id": "aws-glue-job-python-shell", "path": "aws-glue-job-python-shell", "children": { - "ShellJob": { - "id": "ShellJob", - "path": "aws-glue-job-python-shell/ShellJob", + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-python-shell/IAMServiceRole", "children": { - "ServiceRole": { - "id": "ServiceRole", - "path": "aws-glue-job-python-shell/ShellJob/ServiceRole", + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-python-shell/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-python-shell/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { 
+ "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-python-shell/IAMServiceRole/DefaultPolicy", "children": { - "ImportServiceRole": { - "id": "ImportServiceRole", - "path": "aws-glue-job-python-shell/ShellJob/ServiceRole/ImportServiceRole", - "constructInfo": { - "fqn": "aws-cdk-lib.Resource", - "version": "0.0.0" - } - }, "Resource": { "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob/ServiceRole/Resource", + "path": "aws-glue-job-python-shell/IAMServiceRole/DefaultPolicy/Resource", "attributes": { - "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", "aws:cdk:cloudformation:props": { - "assumeRolePolicyDocument": { + "policyDocument": { "Statement": [ { - "Action": "sts:AssumeRole", + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], "Effect": "Allow", - "Principal": { - "Service": "glue.amazonaws.com" - } + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] } ], "Version": "2012-10-17" }, - "managedPolicyArns": [ + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":iam::aws:policy/service-role/AWSGlueServiceRole" - ] - ] + 
"Ref": "IAMServiceRole61C662C4" } ] } }, "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.CfnRole", - "version": "0.0.0" - } - }, - "DefaultPolicy": { - "id": "DefaultPolicy", - "path": "aws-glue-job-python-shell/ShellJob/ServiceRole/DefaultPolicy", - "children": { - "Resource": { - "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob/ServiceRole/DefaultPolicy/Resource", - "attributes": { - "aws:cdk:cloudformation:type": "AWS::IAM::Policy", - "aws:cdk:cloudformation:props": { - "policyDocument": { - "Statement": [ - { - "Action": [ - "s3:GetBucket*", - "s3:GetObject*", - "s3:List*" - ], - "Effect": "Allow", - "Resource": [ - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - }, - "/*" - ] - ] - }, - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - } - ] - ] - } - ] - } - ], - "Version": "2012-10-17" - }, - "policyName": "ShellJobServiceRoleDefaultPolicy7F22D315", - "roles": [ - { - "Ref": "ShellJobServiceRoleCF97BC4B" - } - ] - } - }, - "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", - "version": "0.0.0" - } - } - }, - "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.Policy", + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", "version": "0.0.0" } } }, "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.Role", + "fqn": "aws-cdk-lib.aws_iam.Policy", "version": "0.0.0" } - }, - "Code8835353412338ec0bac0ee05542d1c16": { - "id": "Code8835353412338ec0bac0ee05542d1c16", - "path": "aws-glue-job-python-shell/ShellJob/Code8835353412338ec0bac0ee05542d1c16", + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicShellJob39": { + "id": "BasicShellJob39", + "path": "aws-glue-job-python-shell/BasicShellJob39", + "children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": 
"Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "aws-glue-job-python-shell/BasicShellJob39/Code2907ea7be4a583708cfffc21b3df1dfa", "children": { "Stage": { "id": "Stage", - "path": "aws-glue-job-python-shell/ShellJob/Code8835353412338ec0bac0ee05542d1c16/Stage", + "path": "aws-glue-job-python-shell/BasicShellJob39/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", "constructInfo": { "fqn": "aws-cdk-lib.AssetStaging", "version": "0.0.0" @@ -158,7 +158,7 @@ }, "AssetBucket": { "id": "AssetBucket", - "path": "aws-glue-job-python-shell/ShellJob/Code8835353412338ec0bac0ee05542d1c16/AssetBucket", + "path": "aws-glue-job-python-shell/BasicShellJob39/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", "constructInfo": { "fqn": "aws-cdk-lib.aws_s3.BucketBase", "version": "0.0.0" @@ -172,7 +172,7 @@ }, "Resource": { "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob/Resource", + "path": "aws-glue-job-python-shell/BasicShellJob39/Resource", "attributes": { "aws:cdk:cloudformation:type": "AWS::Glue::Job", "aws:cdk:cloudformation:props": { @@ -190,24 +190,23 @@ ] ] }, - "pythonVersion": "3" + "pythonVersion": "3.9" }, "defaultArguments": { "--job-language": "python", - "arg1": "value1", - "arg2": "value2" + "library-set": "analytics", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" }, - "glueVersion": "1.0", + "glueVersion": "3.0", "maxCapacity": 0.0625, - "name": "ShellJob", + "maxRetries": 0, "role": { "Fn::GetAtt": [ - "ShellJobServiceRoleCF97BC4B", + "IAMServiceRole61C662C4", "Arn" ] - }, - "tags": { - "key": "value" } } }, @@ -218,149 +217,71 @@ } }, "constructInfo": { - "fqn": "@aws-cdk/aws-glue-alpha.Job", + "fqn": "@aws-cdk/aws-glue-alpha.PythonShellJob", "version": "0.0.0" } }, - "ShellJob39": { - "id": "ShellJob39", - "path": "aws-glue-job-python-shell/ShellJob39", + "BasicShellJob": { + "id": "BasicShellJob", + "path": "aws-glue-job-python-shell/BasicShellJob", "children": { - "ServiceRole": { 
- "id": "ServiceRole", - "path": "aws-glue-job-python-shell/ShellJob39/ServiceRole", - "children": { - "ImportServiceRole": { - "id": "ImportServiceRole", - "path": "aws-glue-job-python-shell/ShellJob39/ServiceRole/ImportServiceRole", - "constructInfo": { - "fqn": "aws-cdk-lib.Resource", - "version": "0.0.0" - } - }, - "Resource": { - "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob39/ServiceRole/Resource", - "attributes": { - "aws:cdk:cloudformation:type": "AWS::IAM::Role", - "aws:cdk:cloudformation:props": { - "assumeRolePolicyDocument": { - "Statement": [ + "Resource": { + "id": "Resource", + "path": "aws-glue-job-python-shell/BasicShellJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "pythonshell", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", { - "Action": "sts:AssumeRole", - "Effect": "Allow", - "Principal": { - "Service": "glue.amazonaws.com" - } - } - ], - "Version": "2012-10-17" - }, - "managedPolicyArns": [ - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":iam::aws:policy/service-role/AWSGlueServiceRole" - ] - ] - } + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] ] - } + }, + "pythonVersion": "3" }, - "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.CfnRole", - "version": "0.0.0" - } - }, - "DefaultPolicy": { - "id": "DefaultPolicy", - "path": "aws-glue-job-python-shell/ShellJob39/ServiceRole/DefaultPolicy", - "children": { - "Resource": { - "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob39/ServiceRole/DefaultPolicy/Resource", - "attributes": { - "aws:cdk:cloudformation:type": "AWS::IAM::Policy", - "aws:cdk:cloudformation:props": { - "policyDocument": { - "Statement": [ - { - "Action": [ - "s3:GetBucket*", - "s3:GetObject*", - "s3:List*" - ], - "Effect": "Allow", - "Resource": [ - { - 
"Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - }, - "/*" - ] - ] - }, - { - "Fn::Join": [ - "", - [ - "arn:", - { - "Ref": "AWS::Partition" - }, - ":s3:::", - { - "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" - } - ] - ] - } - ] - } - ], - "Version": "2012-10-17" - }, - "policyName": "ShellJob39ServiceRoleDefaultPolicy38A33919", - "roles": [ - { - "Ref": "ShellJob39ServiceRole2F6F3768" - } - ] - } - }, - "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", - "version": "0.0.0" - } - } + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" }, - "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.Policy", - "version": "0.0.0" + "glueVersion": "1.0", + "maxCapacity": 0.0625, + "maxRetries": 0, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] } } }, "constructInfo": { - "fqn": "aws-cdk-lib.aws_iam.Role", + "fqn": "aws-cdk-lib.aws_glue.CfnJob", "version": "0.0.0" } - }, + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PythonShellJob", + "version": "0.0.0" + } + }, + "DetailedShellJob39": { + "id": "DetailedShellJob39", + "path": "aws-glue-job-python-shell/DetailedShellJob39", + "children": { "Resource": { "id": "Resource", - "path": "aws-glue-job-python-shell/ShellJob39/Resource", + "path": "aws-glue-job-python-shell/DetailedShellJob39/Resource", "attributes": { "aws:cdk:cloudformation:type": "AWS::Glue::Job", "aws:cdk:cloudformation:props": { @@ -382,15 +303,21 @@ }, "defaultArguments": { "--job-language": "python", + "library-set": "analytics", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", "arg1": "value1", "arg2": "value2" }, + "description": "My detailed Python 3.9 Shell Job", "glueVersion": "3.0", "maxCapacity": 
1, - "name": "ShellJob39", + "maxRetries": 0, + "name": "My Python 3.9 Shell Job", "role": { "Fn::GetAtt": [ - "ShellJob39ServiceRole2F6F3768", + "IAMServiceRole61C662C4", "Arn" ] }, @@ -406,7 +333,7 @@ } }, "constructInfo": { - "fqn": "@aws-cdk/aws-glue-alpha.Job", + "fqn": "@aws-cdk/aws-glue-alpha.PythonShellJob", "version": "0.0.0" } }, @@ -445,7 +372,7 @@ "path": "aws-glue-job-python-shell-integ-test/DefaultTest/Default", "constructInfo": { "fqn": "constructs.Construct", - "version": "10.2.69" + "version": "10.3.0" } }, "DeployAssert": { @@ -491,7 +418,7 @@ "path": "Tree", "constructInfo": { "fqn": "constructs.Construct", - "version": "10.2.69" + "version": "10.3.0" } } }, diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.ts index a08a19713b9a7..b384b41b3032b 100644 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.ts +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-python-shell.ts @@ -2,6 +2,7 @@ import * as integ from '@aws-cdk/integ-tests-alpha'; import * as path from 'path'; import * as cdk from 'aws-cdk-lib'; import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; /** * To verify the ability to run jobs created in this test @@ -24,30 +25,29 @@ const stack = new cdk.Stack(app, 'aws-glue-job-python-shell'); const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); -new glue.Job(stack, 'ShellJob', { - jobName: 'ShellJob', - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - tags: { - key: 'value', - }, - maxCapacity: 0.0625, +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); 
+ +new glue.PythonShellJob(stack, 'BasicShellJob39', { + script: script, + role: iam_role, +}); + +new glue.PythonShellJob(stack, 'BasicShellJob', { + script: script, + role: iam_role, + pythonVersion: glue.PythonVersion.THREE, + glueVersion: glue.GlueVersion.V1_0, }); -new glue.Job(stack, 'ShellJob39', { - jobName: 'ShellJob39', - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - }), +new glue.PythonShellJob(stack, 'DetailedShellJob39', { + script: script, + role: iam_role, + description: 'My detailed Python 3.9 Shell Job', + maxCapacity: glue.MaxCapacity.DPU_1, + jobName: 'My Python 3.9 Shell Job', defaultArguments: { arg1: 'value1', arg2: 'value2', @@ -55,11 +55,11 @@ new glue.Job(stack, 'ShellJob39', { tags: { key: 'value', }, - maxCapacity: 1.0, + jobRunQueuingEnabled: true, }); new integ.IntegTest(app, 'aws-glue-job-python-shell-integ-test', { testCases: [stack], }); -app.synth(); +app.synth(); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar new file mode 100644 index 0000000000000..41a6aa95d5aff Binary files /dev/null and b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar differ diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.assets.json new file mode 100644 index 0000000000000..bccb1a07c98a9 --- /dev/null +++ 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602": { + "source": { + "path": "asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "95d6306a689415ff849d8061f263d71b4ee7eab3bb724e06f1356c346a111258": { + "source": { + "path": "aws-glue-job-scalaspark-etl.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "95d6306a689415ff849d8061f263d71b4ee7eab3bb724e06f1356c346a111258.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.template.json new file mode 100644 index 0000000000000..5fb005ed30ff3 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/aws-glue-job-scalaspark-etl.template.json @@ -0,0 +1,206 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", 
+ "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicScalaSparkETLJob5F894E39": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverrideScalaSparkETLJobC019089C": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": 
"cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override ScalaSpark ETL Job", + "GlueVersion": "3.0", + "Name": "Optional Override ScalaSpark ETL Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets.json new file mode 100644 index 0000000000000..7cb05d5c06149 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": 
"Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/integ.json new file mode 100644 index 0000000000000..486814268b7d5 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-scalaspark-etl-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-scalaspark-etl" + ], + "assertionStack": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/manifest.json new file mode 100644 index 0000000000000..ae9bae3832736 --- /dev/null +++ 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-scalaspark-etl.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-scalaspark-etl.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-scalaspark-etl": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-scalaspark-etl.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/95d6306a689415ff849d8061f263d71b4ee7eab3bb724e06f1356c346a111258.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-scalaspark-etl.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-scalaspark-etl.assets" + ], + "metadata": { + "/aws-glue-job-scalaspark-etl/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-scalaspark-etl/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + 
"/aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicScalaSparkETLJob5F894E39" + } + ], + "/aws-glue-job-scalaspark-etl/OverrideScalaSparkETLJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverrideScalaSparkETLJobC019089C" + } + ], + "/aws-glue-job-scalaspark-etl/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-scalaspark-etl/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalaspark-etl" + }, + "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets" + ], + "lookupRole": { + 
"arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobscalasparketlintegtestDefaultTestDeployAssertCA9A8121.assets" + ], + "metadata": { + "/aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/tree.json new file mode 100644 index 0000000000000..a790ddecb7f3e --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.js.snapshot/tree.json @@ -0,0 +1,375 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-scalaspark-etl": { + "id": "aws-glue-job-scalaspark-etl", + "path": "aws-glue-job-scalaspark-etl", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-scalaspark-etl/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-scalaspark-etl/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-etl/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + 
"aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-scalaspark-etl/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-etl/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicScalaSparkETLJob": { + "id": "BasicScalaSparkETLJob", + "path": "aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob", + "children": { + "Codeb58a68516710fd95a65c427a7e567405": { + 
"id": "Codeb58a68516710fd95a65c427a7e567405", + "path": "aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob/Codeb58a68516710fd95a65c427a7e567405", + "children": { + "Stage": { + "id": "Stage", + "path": "aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob/Codeb58a68516710fd95a65c427a7e567405/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob/Codeb58a68516710fd95a65c427a7e567405/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-etl/BasicScalaSparkETLJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkEtlJob", + "version": "0.0.0" + } + }, + "OverrideScalaSparkETLJob": { + "id": "OverrideScalaSparkETLJob", + "path": "aws-glue-job-scalaspark-etl/OverrideScalaSparkETLJob", + "children": { + "Resource": { + "id": "Resource", + "path": 
"aws-glue-job-scalaspark-etl/OverrideScalaSparkETLJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override ScalaSpark ETL Job", + "glueVersion": "3.0", + "name": "Optional Override ScalaSpark ETL Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkEtlJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalaspark-etl/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalaspark-etl/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-scalaspark-etl-integ-test": { + "id": "aws-glue-job-scalaspark-etl-integ-test", + "path": "aws-glue-job-scalaspark-etl-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest", + "children": { + "Default": { + 
"id": "Default", + "path": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalaspark-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.ts new file mode 100644 index 0000000000000..ac43359fcc520 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-etl.ts @@ -0,0 +1,64 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name ` + * This 
will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name <job name> --run-id <runId>` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id <runId>` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name "<runId>"` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-scalaspark-etl'); + +const jar_file = glue.Code.fromAsset(path.join(__dirname, 'job-jar', 'helloworld.jar')); +const job_class ='com.example.HelloWorld'; + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.ScalaSparkEtlJob(stack, 'BasicScalaSparkETLJob', { + script: jar_file, + role: iam_role, + className: job_class, +}); + +new glue.ScalaSparkEtlJob(stack, 'OverrideScalaSparkETLJob', { + script: jar_file, + className: job_class, + role: iam_role, + description: 'Optional Override ScalaSpark ETL Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + jobName: 'Optional Override ScalaSpark ETL Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, + jobRunQueuingEnabled: true, +}); + +new integ.IntegTest(app, 'aws-glue-job-scalaspark-etl-integ-test', { + testCases: [stack], +}); + +app.synth(); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar new file mode 100644 index 0000000000000..41a6aa95d5aff Binary files /dev/null and b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar differ diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json new file mode 100644 index 0000000000000..034678624b0c0 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602": { + "source": { + "path": "asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "8eb4431dd31801d6750894521b469099ec12fdf088e934030d0e8f4775aef416": { + "source": { + "path": "aws-glue-job-scalasparkflex-etl.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "8eb4431dd31801d6750894521b469099ec12fdf088e934030d0e8f4775aef416.json", + "assumeRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json new file mode 100644 index 0000000000000..44a994406b023 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json @@ -0,0 +1,208 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicScalaSparkFlexEtlJobF8FD9EFB": { + "Type": "AWS::Glue::Job", + 
"Properties": { + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverrideScalaSparkFlexEtlJob843D93B4": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override ScalaSpark Flex Etl Job", + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "Name": "Optional Override ScalaSpark Flex Etl Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json new file mode 100644 index 0000000000000..22bd76fefdc70 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/integ.json new file mode 100644 index 0000000000000..694662c13ef3a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-scalasparkflex-etl" + ], + "assertionStack": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/manifest.json new file mode 100644 index 0000000000000..8b991a073dd5a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-scalasparkflex-etl.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-scalasparkflex-etl.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-scalasparkflex-etl": { + "type": 
"aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-scalasparkflex-etl.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/8eb4431dd31801d6750894521b469099ec12fdf088e934030d0e8f4775aef416.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-scalasparkflex-etl.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-scalasparkflex-etl.assets" + ], + "metadata": { + "/aws-glue-job-scalasparkflex-etl/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicScalaSparkFlexEtlJobF8FD9EFB" + } + ], + "/aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverrideScalaSparkFlexEtlJob843D93B4" + } + ], + "/aws-glue-job-scalasparkflex-etl/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + 
"/aws-glue-job-scalasparkflex-etl/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalasparkflex-etl" + }, + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets" + ], + "metadata": { + 
"/aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/tree.json new file mode 100644 index 0000000000000..ed0b43e367bcd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.js.snapshot/tree.json @@ -0,0 +1,377 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-scalasparkflex-etl": { + "id": "aws-glue-job-scalasparkflex-etl", + "path": "aws-glue-job-scalasparkflex-etl", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": 
"AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicScalaSparkFlexEtlJob": { + "id": "BasicScalaSparkFlexEtlJob", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob", + "children": { + "Codeb58a68516710fd95a65c427a7e567405": { + "id": "Codeb58a68516710fd95a65c427a7e567405", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405", + "children": { + "Stage": { + "id": "Stage", + "path": 
"aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "executionClass": "FLEX", + "glueVersion": "3.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkFlexEtlJob", + "version": "0.0.0" + } + }, + "OverrideScalaSparkFlexEtlJob": { + "id": "OverrideScalaSparkFlexEtlJob", + "path": "aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + 
"aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override ScalaSpark Flex Etl Job", + "executionClass": "FLEX", + "glueVersion": "3.0", + "name": "Optional Override ScalaSpark Flex Etl Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkFlexEtlJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-scalasparkflex-etl-integ-test": { + "id": "aws-glue-job-scalasparkflex-etl-integ-test", + "path": "aws-glue-job-scalasparkflex-etl-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": 
"aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.ts new file mode 100644 index 0000000000000..6ecb66ded6352 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-flex-etl.ts @@ -0,0 +1,63 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name ` + * 
This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name --run-id ` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id ` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/jobs/output" --log-stream-name ">` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-scalasparkflex-etl'); + +const jar_file = glue.Code.fromAsset(path.join(__dirname, 'job-jar', 'helloworld.jar')); +const job_class = 'com.example.HelloWorld'; + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.ScalaSparkFlexEtlJob(stack, 'BasicScalaSparkFlexEtlJob', { + script: jar_file, + role: iam_role, + className: job_class, +}); + +new glue.ScalaSparkFlexEtlJob(stack, 'OverrideScalaSparkFlexEtlJob', { + script: jar_file, + className: job_class, + role: iam_role, + description: 'Optional Override ScalaSpark Flex Etl Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + jobName: 'Optional Override ScalaSpark Flex Etl Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, +}); + +new integ.IntegTest(app, 'aws-glue-job-scalasparkflex-etl-integ-test', { + testCases: [stack], +}); + +app.synth(); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar new file mode 100644 index 0000000000000..41a6aa95d5aff Binary files /dev/null and b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar differ diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.assets.json new file mode 100644 index 0000000000000..70bd2cbe00c89 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602": { + "source": { + "path": "asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "34ed620f765a71adfb1015fa87746014460ecb440ed6bbba8cf4ddcec0f5104e": { + "source": { + "path": "aws-glue-job-scalaspark-streaming.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "34ed620f765a71adfb1015fa87746014460ecb440ed6bbba8cf4ddcec0f5104e.json", + "assumeRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.template.json new file mode 100644 index 0000000000000..71f0886daa41a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/aws-glue-job-scalaspark-streaming.template.json @@ -0,0 +1,206 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicScalaSparkStreamingJob03E183FE": { + "Type": "AWS::Glue::Job", + 
"Properties": { + "Command": { + "Name": "gluestreaming", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverrideScalaSparkStreamingJob598931ED": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "gluestreaming", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override ScalaSpark Streaming Job", + "GlueVersion": "3.0", + "Name": "Optional Override ScalaSpark Streaming Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets.json new file mode 100644 index 0000000000000..867fa0e23043d --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.template.json 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/integ.json new file mode 100644 index 0000000000000..179bc5aa9c605 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-scalaspark-streaming" + ], + "assertionStack": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/manifest.json new file mode 100644 index 0000000000000..c59b801fdf45b --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-scalaspark-streaming.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-scalaspark-streaming.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-scalaspark-streaming": 
{ + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-scalaspark-streaming.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/34ed620f765a71adfb1015fa87746014460ecb440ed6bbba8cf4ddcec0f5104e.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-scalaspark-streaming.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-scalaspark-streaming.assets" + ], + "metadata": { + "/aws-glue-job-scalaspark-streaming/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-scalaspark-streaming/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicScalaSparkStreamingJob03E183FE" + } + ], + "/aws-glue-job-scalaspark-streaming/OverrideScalaSparkStreamingJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverrideScalaSparkStreamingJob598931ED" + } + ], + "/aws-glue-job-scalaspark-streaming/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": 
"BootstrapVersion" + } + ], + "/aws-glue-job-scalaspark-streaming/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalaspark-streaming" + }, + "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobscalasparkstreamingintegtestDefaultTestDeployAssertCD3F6A81.assets" + ], + "metadata": { + 
"/aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/tree.json new file mode 100644 index 0000000000000..6e4736b728178 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.js.snapshot/tree.json @@ -0,0 +1,375 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-scalaspark-streaming": { + "id": "aws-glue-job-scalaspark-streaming", + "path": "aws-glue-job-scalaspark-streaming", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-scalaspark-streaming/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-scalaspark-streaming/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-streaming/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + 
"arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-scalaspark-streaming/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-streaming/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicScalaSparkStreamingJob": { + "id": "BasicScalaSparkStreamingJob", + "path": "aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob", + "children": { + "Codeb58a68516710fd95a65c427a7e567405": { + "id": "Codeb58a68516710fd95a65c427a7e567405", + "path": "aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob/Codeb58a68516710fd95a65c427a7e567405", + "children": { + "Stage": { + "id": "Stage", + "path": 
"aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob/Codeb58a68516710fd95a65c427a7e567405/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob/Codeb58a68516710fd95a65c427a7e567405/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-streaming/BasicScalaSparkStreamingJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "gluestreaming", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkStreamingJob", + "version": "0.0.0" + } + }, + "OverrideScalaSparkStreamingJob": { + "id": "OverrideScalaSparkStreamingJob", + "path": "aws-glue-job-scalaspark-streaming/OverrideScalaSparkStreamingJob", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalaspark-streaming/OverrideScalaSparkStreamingJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + 
"aws:cdk:cloudformation:props": { + "command": { + "name": "gluestreaming", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override ScalaSpark Streaming Job", + "glueVersion": "3.0", + "name": "Optional Override ScalaSpark Streaming Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkStreamingJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalaspark-streaming/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalaspark-streaming/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-scalaspark-streaming-integ-test": { + "id": "aws-glue-job-scalaspark-streaming-integ-test", + "path": "aws-glue-job-scalaspark-streaming-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": 
"aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalaspark-streaming-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.ts new file mode 100644 index 0000000000000..9dfb5b450dc61 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalaspark-streaming.ts @@ -0,0 +1,64 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 
--job-name ` + * This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name --run-id ` + * + * For example, to test the ETLJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ETLJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ETLJob --run-id ` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name ">` which should show "hello world" + */ + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'aws-glue-job-scalaspark-streaming'); + +const jar_file = glue.Code.fromAsset(path.join(__dirname, 'job-jar', 'helloworld.jar')); +const job_class ='com.example.HelloWorld'; + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.ScalaSparkStreamingJob(stack, 'BasicScalaSparkStreamingJob', { + script: jar_file, + role: iam_role, + className: job_class, +}); + +new glue.ScalaSparkStreamingJob(stack, 'OverrideScalaSparkStreamingJob', { + script: jar_file, + className: job_class, + role: iam_role, + description: 'Optional Override ScalaSpark Streaming Job', + glueVersion: glue.GlueVersion.V3_0, + numberOfWorkers: 20, + workerType: glue.WorkerType.G_1X, + timeout: cdk.Duration.minutes(15), + jobName: 'Optional Override ScalaSpark Streaming Job', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', + }, + jobRunQueuingEnabled: true, +}); + +new integ.IntegTest(app, 'aws-glue-job-scalaspark-streaming-integ-test', { + testCases: [stack], +}); + +app.synth(); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar new file mode 100644 index 0000000000000..41a6aa95d5aff Binary files /dev/null and b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar differ diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json new file mode 100644 index 0000000000000..246e028be6d6f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602": { + "source": { + "path": "asset.e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "ff1b08d04e7d65e42ead8e33a88a380c6678218b733d0b350cd0bea32ec2944f": { + "source": { + "path": "aws-glue-job-scalasparkflex-etl.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "ff1b08d04e7d65e42ead8e33a88a380c6678218b733d0b350cd0bea32ec2944f.json", + "assumeRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json new file mode 100644 index 0000000000000..f046068d73e7c --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/aws-glue-job-scalasparkflex-etl.template.json @@ -0,0 +1,206 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicScalaSparkFlexEtlJobF8FD9EFB": { + "Type": "AWS::Glue::Job", + "Properties": 
{ + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "NumberOfWorkers": 10, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "G.1X" + } + }, + "OverrideScalaSparkFlexEtlJob843D93B4": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "Description": "Optional Override ScalaSpark Flex Etl Job", + "ExecutionClass": "FLEX", + "GlueVersion": "3.0", + "Name": "Optional Override ScalaSpark Flex Etl Job", + "NumberOfWorkers": 20, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "Timeout": 15, + "WorkerType": "G.1X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json new file mode 100644 index 0000000000000..22bd76fefdc70 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/integ.json new file mode 100644 index 0000000000000..694662c13ef3a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-job-scalasparkflex-etl" + ], + "assertionStack": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/manifest.json new file mode 100644 index 0000000000000..76778efc60610 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-job-scalasparkflex-etl.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-job-scalasparkflex-etl.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-job-scalasparkflex-etl": { + "type": 
"aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-job-scalasparkflex-etl.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/ff1b08d04e7d65e42ead8e33a88a380c6678218b733d0b350cd0bea32ec2944f.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-job-scalasparkflex-etl.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-job-scalasparkflex-etl.assets" + ], + "metadata": { + "/aws-glue-job-scalasparkflex-etl/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicScalaSparkFlexEtlJobF8FD9EFB" + } + ], + "/aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OverrideScalaSparkFlexEtlJob843D93B4" + } + ], + "/aws-glue-job-scalasparkflex-etl/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + 
"/aws-glue-job-scalasparkflex-etl/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalasparkflex-etl" + }, + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluejobscalasparkflexetlintegtestDefaultTestDeployAssert8009E6FC.assets" + ], + "metadata": { + 
"/aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/tree.json new file mode 100644 index 0000000000000..c8dc5ada88490 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.job-scalasparkflex-etl.js.snapshot/tree.json @@ -0,0 +1,375 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-job-scalasparkflex-etl": { + "id": "aws-glue-job-scalasparkflex-etl", + "path": "aws-glue-job-scalasparkflex-etl", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": 
"AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicScalaSparkFlexEtlJob": { + "id": "BasicScalaSparkFlexEtlJob", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob", + "children": { + "Codeb58a68516710fd95a65c427a7e567405": { + "id": "Codeb58a68516710fd95a65c427a7e567405", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405", + "children": { + "Stage": { + "id": "Stage", + "path": 
"aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Codeb58a68516710fd95a65c427a7e567405/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/BasicScalaSparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "executionClass": "FLEX", + "glueVersion": "3.0", + "numberOfWorkers": 10, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkFlexEtlJob", + "version": "0.0.0" + } + }, + "OverrideScalaSparkFlexEtlJob": { + "id": "OverrideScalaSparkFlexEtlJob", + "path": "aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-job-scalasparkflex-etl/OverrideScalaSparkFlexEtlJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": 
"glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/e305655b966b957f91fcec580e3f8703573eb6b69528c5d52190d72579c91602.jar" + ] + ] + } + }, + "defaultArguments": { + "--job-language": "scala", + "--class": "com.example.HelloWorld", + "--enable-metrics": "", + "--enable-observability-metrics": "true", + "arg1": "value1", + "arg2": "value2" + }, + "description": "Optional Override ScalaSpark Flex Etl Job", + "executionClass": "FLEX", + "glueVersion": "3.0", + "name": "Optional Override ScalaSpark Flex Etl Job", + "numberOfWorkers": 20, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "timeout": 15, + "workerType": "G.1X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.ScalaSparkFlexEtlJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-job-scalasparkflex-etl-integ-test": { + "id": "aws-glue-job-scalasparkflex-etl-integ-test", + "path": "aws-glue-job-scalasparkflex-etl-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + 
"DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-job-scalasparkflex-etl-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.job.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.job.ts deleted file mode 100644 index 91bf9bab212fc..0000000000000 --- a/packages/@aws-cdk/aws-glue-alpha/test/integ.job.ts +++ /dev/null @@ -1,147 +0,0 @@ -import * as path from 'path'; -import * as cdk from 'aws-cdk-lib'; -import * as glue from '../lib'; - -/** - * To verify the ability to run jobs created in this test - * - * Run the job using - * `aws glue start-job-run --region us-east-1 --job-name ` - * This will return a runId - * - * Get the status of the job run using - * `aws glue get-job-run --region us-east-1 --job-name --run-id ` - * - * For example, to test the ShellJob - * - Run: `aws glue start-job-run --region us-east-1 --job-name ShellJob` - * - Get Status: `aws glue get-job-run --region 
us-east-1 --job-name ShellJob --run-id ` - * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name ">` which should show "hello world" - */ -const app = new cdk.App(); - -const stack = new cdk.Stack(app, 'aws-glue-job'); - -const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); -const scriptResolveOptions = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'resolve_options.py')); -const moduleUtils = glue.Code.fromAsset(path.join(__dirname, 'module', 'utils.zip')); - -[glue.GlueVersion.V2_0, glue.GlueVersion.V3_0, glue.GlueVersion.V4_0].forEach((glueVersion) => { - const etlJob = new glue.Job(stack, 'EtlJob' + glueVersion.name, { - jobName: 'EtlJob' + glueVersion.name, - executable: glue.JobExecutable.pythonEtl({ - pythonVersion: glue.PythonVersion.THREE, - glueVersion, - script, - }), - workerType: glue.WorkerType.G_1X, - workerCount: 10, - maxConcurrentRuns: 2, - maxRetries: 2, - timeout: cdk.Duration.minutes(5), - notifyDelayAfter: cdk.Duration.minutes(1), - defaultArguments: { - 'arg1': 'value1', - 'arg2': 'value2', - '--conf': 'valueConf', - }, - sparkUI: { - enabled: true, - }, - continuousLogging: { - enabled: true, - quiet: true, - logStreamPrefix: 'EtlJob', - }, - executionClass: glue.ExecutionClass.STANDARD, - tags: { - key: 'value', - }, - }); - etlJob.metricSuccess(); - new glue.Job(stack, 'StreamingJob' + glueVersion.name, { - jobName: 'StreamingJob' + glueVersion.name, - executable: glue.JobExecutable.pythonStreaming({ - pythonVersion: glue.PythonVersion.THREE, - glueVersion, - script, - }), - workerType: [glue.GlueVersion.V2_0].includes(glueVersion) ? 
glue.WorkerType.G_1X : glue.WorkerType.G_025X, - workerCount: 10, - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - sparkUI: { - enabled: true, - }, - tags: { - key: 'value', - }, - }); -}); - -new glue.Job(stack, 'ShellJob', { - jobName: 'ShellJob', - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - tags: { - key: 'value', - }, -}); - -new glue.Job(stack, 'ShellJob39', { - jobName: 'ShellJob39', - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - }), - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - tags: { - key: 'value', - }, -}); - -new glue.Job(stack, 'RayJob', { - jobName: 'RayJob', - executable: glue.JobExecutable.pythonRay({ - glueVersion: glue.GlueVersion.V4_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - s3PythonModules: [moduleUtils], - script: scriptResolveOptions, - }), - workerType: glue.WorkerType.Z_2X, - workerCount: 2, - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - tags: { - key: 'value', - }, -}); - -new glue.Job(stack, 'EtlJobWithFLEX', { - jobName: 'EtlJobWithFLEX', - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - workerType: glue.WorkerType.G_1X, - workerCount: 10, - executionClass: glue.ExecutionClass.FLEX, -}); - -app.synth(); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py new file mode 100644 index 0000000000000..e75154b7c390f --- /dev/null +++ 
b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py @@ -0,0 +1 @@ +print("hello world") \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.assets.json new file mode 100644 index 0000000000000..3b876e16c7915 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { + "source": { + "path": "asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "88c38c39c4e4154ff32d6a619436c3605447e88e9f7b2917c0a4bdbec101913e": { + "source": { + "path": "aws-glue-ray-job.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "88c38c39c4e4154ff32d6a619436c3605447e88e9f7b2917c0a4bdbec101913e.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.template.json new file 
mode 100644 index 0000000000000..1449533215f50 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/aws-glue-ray-job.template.json @@ -0,0 +1,202 @@ +{ + "Resources": { + "IAMServiceRole61C662C4": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "ManagedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "IAMServiceRoleDefaultPolicy379D1A0E": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "Roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "BasicRayJobF8D69550": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueray", + "Runtime": "Ray2.4", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 3, + "Role": { + 
"Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "WorkerType": "Z.2X" + } + }, + "RayJob5Workers11381A2E": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueray", + "Runtime": "Ray2.4", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "arg1": "value1", + "arg2": "value2", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "Name": "RayJobWith5Workers", + "NumberOfWorkers": 5, + "Role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "Tags": { + "key": "value" + }, + "WorkerType": "Z.2X" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets.json new file mode 100644 index 0000000000000..277f637073ffd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/integ.json new file mode 100644 index 0000000000000..38d8633d3f555 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-glue-ray-job-integ-test/DefaultTest": { + "stacks": [ + "aws-glue-ray-job" + ], + "assertionStack": "aws-glue-ray-job-integ-test/DefaultTest/DeployAssert", + "assertionStackName": "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/manifest.json new file mode 100644 index 0000000000000..87e43681bf422 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/manifest.json @@ -0,0 +1,131 @@ +{ + "version": "36.0.0", + "artifacts": { + "aws-glue-ray-job.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "aws-glue-ray-job.assets.json", + "requiresBootstrapStackVersion": 6, + 
"bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "aws-glue-ray-job": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "aws-glue-ray-job.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/88c38c39c4e4154ff32d6a619436c3605447e88e9f7b2917c0a4bdbec101913e.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "aws-glue-ray-job.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "aws-glue-ray-job.assets" + ], + "metadata": { + "/aws-glue-ray-job/IAMServiceRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRole61C662C4" + } + ], + "/aws-glue-ray-job/IAMServiceRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "IAMServiceRoleDefaultPolicy379D1A0E" + } + ], + "/aws-glue-ray-job/BasicRayJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "BasicRayJobF8D69550" + } + ], + "/aws-glue-ray-job/RayJob5Workers/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "RayJob5Workers11381A2E" + } + ], + "/aws-glue-ray-job/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-glue-ray-job/CheckBootstrapVersion": [ + { + "type": 
"aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-ray-job" + }, + "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "awsgluerayjobintegtestDefaultTestDeployAssert7A3FC747.assets" + ], + "metadata": { + "/aws-glue-ray-job-integ-test/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + 
"/aws-glue-ray-job-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-glue-ray-job-integ-test/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/tree.json new file mode 100644 index 0000000000000..9d05cf81c41f3 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.js.snapshot/tree.json @@ -0,0 +1,371 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "aws-glue-ray-job": { + "id": "aws-glue-ray-job", + "path": "aws-glue-ray-job", + "children": { + "IAMServiceRole": { + "id": "IAMServiceRole", + "path": "aws-glue-ray-job/IAMServiceRole", + "children": { + "ImportIAMServiceRole": { + "id": "ImportIAMServiceRole", + "path": "aws-glue-ray-job/IAMServiceRole/ImportIAMServiceRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "aws-glue-ray-job/IAMServiceRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + }, + "managedPolicyArns": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":iam::aws:policy/service-role/AWSGlueServiceRole" + ] + ] + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "aws-glue-ray-job/IAMServiceRole/DefaultPolicy", + "children": { + "Resource": { + "id": 
"Resource", + "path": "aws-glue-ray-job/IAMServiceRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + "aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "IAMServiceRoleDefaultPolicy379D1A0E", + "roles": [ + { + "Ref": "IAMServiceRole61C662C4" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "BasicRayJob": { + "id": "BasicRayJob", + "path": "aws-glue-ray-job/BasicRayJob", + "children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": "Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "aws-glue-ray-job/BasicRayJob/Code2907ea7be4a583708cfffc21b3df1dfa", + "children": { + "Stage": { + "id": "Stage", + "path": "aws-glue-ray-job/BasicRayJob/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "aws-glue-ray-job/BasicRayJob/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + 
"path": "aws-glue-ray-job/BasicRayJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueray", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "runtime": "Ray2.4" + }, + "defaultArguments": { + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 3, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "workerType": "Z.2X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.RayJob", + "version": "0.0.0" + } + }, + "RayJob5Workers": { + "id": "RayJob5Workers", + "path": "aws-glue-ray-job/RayJob5Workers", + "children": { + "Resource": { + "id": "Resource", + "path": "aws-glue-ray-job/RayJob5Workers/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueray", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "runtime": "Ray2.4" + }, + "defaultArguments": { + "arg1": "value1", + "arg2": "value2", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "name": "RayJobWith5Workers", + "numberOfWorkers": 5, + "role": { + "Fn::GetAtt": [ + "IAMServiceRole61C662C4", + "Arn" + ] + }, + "tags": { + "key": "value" + }, + "workerType": "Z.2X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + 
"version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.RayJob", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-ray-job/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-ray-job/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-glue-ray-job-integ-test": { + "id": "aws-glue-ray-job-integ-test", + "path": "aws-glue-ray-job-integ-test", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-glue-ray-job-integ-test/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-glue-ray-job-integ-test/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-glue-ray-job-integ-test/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-glue-ray-job-integ-test/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-glue-ray-job-integ-test/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + "id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + 
"version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.ts new file mode 100644 index 0000000000000..35aa41f74c125 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.ray-job.ts @@ -0,0 +1,56 @@ +import * as path from 'path'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as integ from '@aws-cdk/integ-tests-alpha'; + +/** + * To verify the ability to run jobs created in this test + * + * Run the job using + * `aws glue start-job-run --region us-east-1 --job-name ` + * This will return a runId + * + * Get the status of the job run using + * `aws glue get-job-run --region us-east-1 --job-name --run-id ` + * + * For example, to test the ShellJob + * - Run: `aws glue start-job-run --region us-east-1 --job-name ShellJob` + * - Get Status: `aws glue get-job-run --region us-east-1 --job-name ShellJob --run-id ` + * - Check output: `aws logs get-log-events --region us-east-1 --log-group-name "/aws-glue/python-jobs/output" --log-stream-name ">` which should show "hello world" + */ +const app = new cdk.App(); + +const stack = new cdk.Stack(app, 'aws-glue-ray-job'); + +const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); + +const iam_role = new iam.Role(stack, 'IAMServiceRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], +}); + +new glue.RayJob(stack, 'BasicRayJob', { + script: script, + role: iam_role, +}); + +new glue.RayJob(stack, 'RayJob5Workers', { + script: script, + role: iam_role, + numberOfWorkers: 5, + jobName: 'RayJobWith5Workers', + defaultArguments: { + arg1: 'value1', + arg2: 'value2', + }, + tags: { + key: 'value', 
+ }, +}); + +new integ.IntegTest(app, 'aws-glue-ray-job-integ-test', { + testCases: [stack], +}); + +app.synth(); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.assets.json new file mode 100644 index 0000000000000..020d92b9ce3ed --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.assets.json @@ -0,0 +1,32 @@ +{ + "version": "36.0.0", + "files": { + "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855": { + "source": { + "path": "asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + }, + "db15b89b0de33d7503c531cae5fa3f18506eb8982953470211e04f53dfe9a2da": { + "source": { + "path": "GlueWorkflowTriggerStack.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "db15b89b0de33d7503c531cae5fa3f18506eb8982953470211e04f53dfe9a2da.json", + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.template.json new file mode 100644 
index 0000000000000..986a1cca24cd7 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/GlueWorkflowTriggerStack.template.json @@ -0,0 +1,244 @@ +{ + "Resources": { + "Workflow193EF7C1": { + "Type": "AWS::Glue::Workflow", + "Properties": { + "Description": "MyWorkflow" + } + }, + "WorkflowOnDemandTriggerEE8E75A1": { + "Type": "AWS::Glue::Trigger", + "Properties": { + "Actions": [ + { + "JobName": { + "Ref": "InboundJobEDA3CBF4" + } + } + ], + "Type": "ON_DEMAND", + "WorkflowName": { + "Ref": "Workflow193EF7C1" + } + } + }, + "WorkflowConditionalTrigger133C0CA8": { + "Type": "AWS::Glue::Trigger", + "Properties": { + "Actions": [ + { + "JobName": { + "Ref": "OutboundJobB5826414" + } + } + ], + "EventBatchingCondition": { + "BatchSize": 1, + "BatchWindow": 900 + }, + "Predicate": { + "Conditions": [ + { + "JobName": { + "Ref": "InboundJobEDA3CBF4" + }, + "LogicalOperator": "EQUALS", + "State": "SUCCEEDED" + } + ] + }, + "Type": "CONDITIONAL", + "WorkflowName": { + "Ref": "Workflow193EF7C1" + } + } + }, + "JobRole014917C6": { + "Type": "AWS::IAM::Role", + "Properties": { + "AssumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + } + } + }, + "JobRoleDefaultPolicy5DE0D8F9": { + "Type": "AWS::IAM::Policy", + "Properties": { + "PolicyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "PolicyName": 
"JobRoleDefaultPolicy5DE0D8F9", + "Roles": [ + { + "Ref": "JobRole014917C6" + } + ] + } + }, + "OutboundJobB5826414": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3.9", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 2, + "Role": { + "Fn::GetAtt": [ + "JobRole014917C6", + "Arn" + ] + }, + "WorkerType": "G.2X" + } + }, + "InboundJobEDA3CBF4": { + "Type": "AWS::Glue::Job", + "Properties": { + "Command": { + "Name": "glueetl", + "PythonVersion": "3.9", + "ScriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + } + }, + "DefaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "GlueVersion": "4.0", + "NumberOfWorkers": 2, + "Role": { + "Fn::GetAtt": [ + "JobRole014917C6", + "Arn" + ] + }, + "WorkerType": "G.2X" + } + } + }, + "Outputs": { + "WorkflowName": { + "Value": { + "Ref": "Workflow193EF7C1" + } + } + }, + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. 
[cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." + } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py new file mode 100644 index 0000000000000..e75154b7c390f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/asset.432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py @@ -0,0 +1 @@ +print("hello world") \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets.json new file mode 100644 index 0000000000000..d2484f5013f09 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets.json @@ -0,0 +1,19 @@ +{ + "version": "36.0.0", + "files": { + "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": { + "source": { + "path": "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.template.json", + "packaging": "file" + }, + "destinations": { + "current_account-current_region": { + "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}", + "objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "assumeRoleArn": 
"arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}" + } + } + } + }, + "dockerImages": {} +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.template.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.template.json new file mode 100644 index 0000000000000..ad9d0fb73d1dd --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.template.json @@ -0,0 +1,36 @@ +{ + "Parameters": { + "BootstrapVersion": { + "Type": "AWS::SSM::Parameter::Value", + "Default": "/cdk-bootstrap/hnb659fds/version", + "Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]" + } + }, + "Rules": { + "CheckBootstrapVersion": { + "Assertions": [ + { + "Assert": { + "Fn::Not": [ + { + "Fn::Contains": [ + [ + "1", + "2", + "3", + "4", + "5" + ], + { + "Ref": "BootstrapVersion" + } + ] + } + ] + }, + "AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI." 
+ } + ] + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/cdk.out b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/cdk.out new file mode 100644 index 0000000000000..1f0068d32659a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/cdk.out @@ -0,0 +1 @@ +{"version":"36.0.0"} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/integ.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/integ.json new file mode 100644 index 0000000000000..dc0019765cecf --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/integ.json @@ -0,0 +1,12 @@ +{ + "version": "36.0.0", + "testCases": { + "aws-cdk-glue-workflow-trigger-integ/DefaultTest": { + "stacks": [ + "GlueWorkflowTriggerStack" + ], + "assertionStack": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert", + "assertionStackName": "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/manifest.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/manifest.json new file mode 100644 index 0000000000000..9b01d7d71edae --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/manifest.json @@ -0,0 +1,155 @@ +{ + "version": "36.0.0", + "artifacts": { + "GlueWorkflowTriggerStack.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "GlueWorkflowTriggerStack.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "GlueWorkflowTriggerStack": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "GlueWorkflowTriggerStack.template.json", + "terminationProtection": 
false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/db15b89b0de33d7503c531cae5fa3f18506eb8982953470211e04f53dfe9a2da.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "GlueWorkflowTriggerStack.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ + "GlueWorkflowTriggerStack.assets" + ], + "metadata": { + "/GlueWorkflowTriggerStack/Workflow/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "Workflow193EF7C1" + } + ], + "/GlueWorkflowTriggerStack/Workflow/OnDemandTrigger": [ + { + "type": "aws:cdk:logicalId", + "data": "WorkflowOnDemandTriggerEE8E75A1" + } + ], + "/GlueWorkflowTriggerStack/Workflow/ConditionalTrigger": [ + { + "type": "aws:cdk:logicalId", + "data": "WorkflowConditionalTrigger133C0CA8" + } + ], + "/GlueWorkflowTriggerStack/JobRole/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "JobRole014917C6" + } + ], + "/GlueWorkflowTriggerStack/JobRole/DefaultPolicy/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "JobRoleDefaultPolicy5DE0D8F9" + } + ], + "/GlueWorkflowTriggerStack/OutboundJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "OutboundJobB5826414" + } + ], + "/GlueWorkflowTriggerStack/InboundJob/Resource": [ + { + "type": "aws:cdk:logicalId", + "data": "InboundJobEDA3CBF4" + } + ], + "/GlueWorkflowTriggerStack/WorkflowName": [ 
+ { + "type": "aws:cdk:logicalId", + "data": "WorkflowName" + } + ], + "/GlueWorkflowTriggerStack/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/GlueWorkflowTriggerStack/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "GlueWorkflowTriggerStack" + }, + "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets": { + "type": "cdk:asset-manifest", + "properties": { + "file": "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173": { + "type": "aws:cloudformation:stack", + "environment": "aws://unknown-account/unknown-region", + "properties": { + "templateFile": "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.template.json", + "terminationProtection": false, + "validateOnSynth": false, + "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}", + "cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}", + "stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json", + "requiresBootstrapStackVersion": 6, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version", + "additionalDependencies": [ + "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets" + ], + "lookupRole": { + "arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}", + "requiresBootstrapStackVersion": 8, + "bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version" + } + }, + "dependencies": [ 
+ "awscdkglueworkflowtriggerintegDefaultTestDeployAssert43E79173.assets" + ], + "metadata": { + "/aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert/BootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "BootstrapVersion" + } + ], + "/aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert/CheckBootstrapVersion": [ + { + "type": "aws:cdk:logicalId", + "data": "CheckBootstrapVersion" + } + ] + }, + "displayName": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert" + }, + "Tree": { + "type": "cdk:tree", + "properties": { + "file": "tree.json" + } + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/tree.json b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/tree.json new file mode 100644 index 0000000000000..988a73eab37e0 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.js.snapshot/tree.json @@ -0,0 +1,448 @@ +{ + "version": "tree-0.1", + "tree": { + "id": "App", + "path": "", + "children": { + "GlueWorkflowTriggerStack": { + "id": "GlueWorkflowTriggerStack", + "path": "GlueWorkflowTriggerStack", + "children": { + "Workflow": { + "id": "Workflow", + "path": "GlueWorkflowTriggerStack/Workflow", + "children": { + "Resource": { + "id": "Resource", + "path": "GlueWorkflowTriggerStack/Workflow/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Workflow", + "aws:cdk:cloudformation:props": { + "description": "MyWorkflow" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnWorkflow", + "version": "0.0.0" + } + }, + "OnDemandTrigger": { + "id": "OnDemandTrigger", + "path": "GlueWorkflowTriggerStack/Workflow/OnDemandTrigger", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Trigger", + "aws:cdk:cloudformation:props": { + "actions": [ + { + "jobName": { + "Ref": "InboundJobEDA3CBF4" + } + } + ], + "type": "ON_DEMAND", + "workflowName": { + "Ref": "Workflow193EF7C1" + } + } + }, + 
"constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnTrigger", + "version": "0.0.0" + } + }, + "ConditionalTrigger": { + "id": "ConditionalTrigger", + "path": "GlueWorkflowTriggerStack/Workflow/ConditionalTrigger", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Trigger", + "aws:cdk:cloudformation:props": { + "actions": [ + { + "jobName": { + "Ref": "OutboundJobB5826414" + } + } + ], + "eventBatchingCondition": { + "batchSize": 1, + "batchWindow": 900 + }, + "predicate": { + "conditions": [ + { + "logicalOperator": "EQUALS", + "jobName": { + "Ref": "InboundJobEDA3CBF4" + }, + "state": "SUCCEEDED" + } + ] + }, + "type": "CONDITIONAL", + "workflowName": { + "Ref": "Workflow193EF7C1" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnTrigger", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.Workflow", + "version": "0.0.0" + } + }, + "JobRole": { + "id": "JobRole", + "path": "GlueWorkflowTriggerStack/JobRole", + "children": { + "ImportJobRole": { + "id": "ImportJobRole", + "path": "GlueWorkflowTriggerStack/JobRole/ImportJobRole", + "constructInfo": { + "fqn": "aws-cdk-lib.Resource", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "GlueWorkflowTriggerStack/JobRole/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Role", + "aws:cdk:cloudformation:props": { + "assumeRolePolicyDocument": { + "Statement": [ + { + "Action": "sts:AssumeRole", + "Effect": "Allow", + "Principal": { + "Service": "glue.amazonaws.com" + } + } + ], + "Version": "2012-10-17" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnRole", + "version": "0.0.0" + } + }, + "DefaultPolicy": { + "id": "DefaultPolicy", + "path": "GlueWorkflowTriggerStack/JobRole/DefaultPolicy", + "children": { + "Resource": { + "id": "Resource", + "path": "GlueWorkflowTriggerStack/JobRole/DefaultPolicy/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::IAM::Policy", + 
"aws:cdk:cloudformation:props": { + "policyDocument": { + "Statement": [ + { + "Action": [ + "s3:GetBucket*", + "s3:GetObject*", + "s3:List*" + ], + "Effect": "Allow", + "Resource": [ + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/*" + ] + ] + }, + { + "Fn::Join": [ + "", + [ + "arn:", + { + "Ref": "AWS::Partition" + }, + ":s3:::", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + } + ] + ] + } + ] + } + ], + "Version": "2012-10-17" + }, + "policyName": "JobRoleDefaultPolicy5DE0D8F9", + "roles": [ + { + "Ref": "JobRole014917C6" + } + ] + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.CfnPolicy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Policy", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_iam.Role", + "version": "0.0.0" + } + }, + "OutboundJob": { + "id": "OutboundJob", + "path": "GlueWorkflowTriggerStack/OutboundJob", + "children": { + "Code2907ea7be4a583708cfffc21b3df1dfa": { + "id": "Code2907ea7be4a583708cfffc21b3df1dfa", + "path": "GlueWorkflowTriggerStack/OutboundJob/Code2907ea7be4a583708cfffc21b3df1dfa", + "children": { + "Stage": { + "id": "Stage", + "path": "GlueWorkflowTriggerStack/OutboundJob/Code2907ea7be4a583708cfffc21b3df1dfa/Stage", + "constructInfo": { + "fqn": "aws-cdk-lib.AssetStaging", + "version": "0.0.0" + } + }, + "AssetBucket": { + "id": "AssetBucket", + "path": "GlueWorkflowTriggerStack/OutboundJob/Code2907ea7be4a583708cfffc21b3df1dfa/AssetBucket", + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3.BucketBase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_s3_assets.Asset", + "version": "0.0.0" + } + }, + "Resource": { + "id": "Resource", + "path": "GlueWorkflowTriggerStack/OutboundJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + 
"aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 2, + "role": { + "Fn::GetAtt": [ + "JobRole014917C6", + "Arn" + ] + }, + "workerType": "G.2X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkEtlJob", + "version": "0.0.0" + } + }, + "InboundJob": { + "id": "InboundJob", + "path": "GlueWorkflowTriggerStack/InboundJob", + "children": { + "Resource": { + "id": "Resource", + "path": "GlueWorkflowTriggerStack/InboundJob/Resource", + "attributes": { + "aws:cdk:cloudformation:type": "AWS::Glue::Job", + "aws:cdk:cloudformation:props": { + "command": { + "name": "glueetl", + "scriptLocation": { + "Fn::Join": [ + "", + [ + "s3://", + { + "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}" + }, + "/432033e3218068a915d2532fa9be7858a12b228a2ae6e5c10faccd9097b1e855.py" + ] + ] + }, + "pythonVersion": "3" + }, + "defaultArguments": { + "--job-language": "python", + "--enable-continuous-cloudwatch-log": "true", + "--enable-metrics": "", + "--enable-observability-metrics": "true" + }, + "glueVersion": "4.0", + "numberOfWorkers": 2, + "role": { + "Fn::GetAtt": [ + "JobRole014917C6", + "Arn" + ] + }, + "workerType": "G.2X" + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.aws_glue.CfnJob", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/aws-glue-alpha.PySparkEtlJob", + "version": "0.0.0" + } + }, + "WorkflowName": { + "id": "WorkflowName", + 
"path": "GlueWorkflowTriggerStack/WorkflowName", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnOutput", + "version": "0.0.0" + } + }, + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "GlueWorkflowTriggerStack/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "GlueWorkflowTriggerStack/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + }, + "aws-cdk-glue-workflow-trigger-integ": { + "id": "aws-cdk-glue-workflow-trigger-integ", + "path": "aws-cdk-glue-workflow-trigger-integ", + "children": { + "DefaultTest": { + "id": "DefaultTest", + "path": "aws-cdk-glue-workflow-trigger-integ/DefaultTest", + "children": { + "Default": { + "id": "Default", + "path": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/Default", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + }, + "DeployAssert": { + "id": "DeployAssert", + "path": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert", + "children": { + "BootstrapVersion": { + "id": "BootstrapVersion", + "path": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert/BootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnParameter", + "version": "0.0.0" + } + }, + "CheckBootstrapVersion": { + "id": "CheckBootstrapVersion", + "path": "aws-cdk-glue-workflow-trigger-integ/DefaultTest/DeployAssert/CheckBootstrapVersion", + "constructInfo": { + "fqn": "aws-cdk-lib.CfnRule", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.Stack", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTestCase", + "version": "0.0.0" + } + } + }, + "constructInfo": { + "fqn": "@aws-cdk/integ-tests-alpha.IntegTest", + "version": "0.0.0" + } + }, + "Tree": { + 
"id": "Tree", + "path": "Tree", + "constructInfo": { + "fqn": "constructs.Construct", + "version": "10.3.0" + } + } + }, + "constructInfo": { + "fqn": "aws-cdk-lib.App", + "version": "0.0.0" + } + } +} \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.ts b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.ts new file mode 100644 index 0000000000000..1f7e6cccd6302 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/integ.workflow.ts @@ -0,0 +1,60 @@ +import * as integ from '@aws-cdk/integ-tests-alpha'; +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as path from 'path'; + +const app = new cdk.App(); +const stack = new cdk.Stack(app, 'GlueWorkflowTriggerStack'); + +const workflow = new glue.Workflow(stack, 'Workflow', { + description: 'MyWorkflow', +}); + +const role = new iam.Role(stack, 'JobRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), +}); + +const script = glue.Code.fromAsset(path.join(__dirname, 'job-script', 'hello_world.py')); + +const OutboundJob = new glue.PySparkEtlJob(stack, 'OutboundJob', { + script: script, + role, + glueVersion: glue.GlueVersion.V4_0, + workerType: glue.WorkerType.G_2X, + numberOfWorkers: 2, +}); + +const InboundJob = new glue.PySparkEtlJob(stack, 'InboundJob', { + script: script, + role, + glueVersion: glue.GlueVersion.V4_0, + workerType: glue.WorkerType.G_2X, + numberOfWorkers: 2, +}); + +workflow.addOnDemandTrigger('OnDemandTrigger', { + actions: [{ job: InboundJob }], +}); + +workflow.addconditionalTrigger('ConditionalTrigger', { + actions: [{ job: OutboundJob }], + predicate: { + conditions: [ + { + job: InboundJob, + state: glue.JobState.SUCCEEDED, + }, + ], + }, +}); + +new cdk.CfnOutput(stack, 'WorkflowName', { + value: workflow.workflowName, +}); + +new integ.IntegTest(app, 'aws-cdk-glue-workflow-trigger-integ', { + testCases: [stack], +}); + +app.synth(); \ No newline at end 
of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/job-executable.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/job-executable.test.ts deleted file mode 100644 index d00faa55091ba..0000000000000 --- a/packages/@aws-cdk/aws-glue-alpha/test/job-executable.test.ts +++ /dev/null @@ -1,282 +0,0 @@ -import * as s3 from 'aws-cdk-lib/aws-s3'; -import * as cdk from 'aws-cdk-lib'; -import * as glue from '../lib'; - -describe('GlueVersion', () => { - test('.V0_9 should set the name correctly', () => expect(glue.GlueVersion.V0_9.name).toEqual('0.9')); - - test('.V1_0 should set the name correctly', () => expect(glue.GlueVersion.V1_0.name).toEqual('1.0')); - - test('.V2_0 should set the name correctly', () => expect(glue.GlueVersion.V2_0.name).toEqual('2.0')); - - test('.V3_0 should set the name correctly', () => expect(glue.GlueVersion.V3_0.name).toEqual('3.0')); - - test('.V4_0 should set the name correctly', () => expect(glue.GlueVersion.V4_0.name).toEqual('4.0')); - - test('of(customVersion) should set the name correctly', () => expect(glue.GlueVersion.of('CustomVersion').name).toEqual('CustomVersion')); -}); - -describe('PythonVersion', () => { - test('.TWO should set the name correctly', () => expect(glue.PythonVersion.TWO).toEqual('2')); - - test('.THREE should set the name correctly', () => expect(glue.PythonVersion.THREE).toEqual('3')); - - test('.THREE_NINE should set the name correctly', () => expect(glue.PythonVersion.THREE_NINE).toEqual('3.9')); -}); - -describe('JobType', () => { - test('.ETL should set the name correctly', () => expect(glue.JobType.ETL.name).toEqual('glueetl')); - - test('.STREAMING should set the name correctly', () => expect(glue.JobType.STREAMING.name).toEqual('gluestreaming')); - - test('.PYTHON_SHELL should set the name correctly', () => expect(glue.JobType.PYTHON_SHELL.name).toEqual('pythonshell')); - - test('.RAY should set the name correctly', () => expect(glue.JobType.RAY.name).toEqual('glueray')); - - test('of(customName) 
should set the name correctly', () => expect(glue.JobType.of('CustomName').name).toEqual('CustomName')); -}); - -describe('JobExecutable', () => { - let stack: cdk.Stack; - let bucket: s3.IBucket; - let script: glue.Code; - - beforeEach(() => { - stack = new cdk.Stack(); - bucket = s3.Bucket.fromBucketName(stack, 'Bucket', 'bucketname'); - script = glue.Code.fromBucket(bucket, 'script.py'); - }); - - describe('.of()', () => { - test('with valid config should succeed', () => { - expect(glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V1_0, - type: glue.JobType.PYTHON_SHELL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE, - script, - })).toBeDefined(); - }); - - test('with JobType.PYTHON_SHELL and a language other than JobLanguage.PYTHON should throw', () => { - expect(() => glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V3_0, - type: glue.JobType.PYTHON_SHELL, - language: glue.JobLanguage.SCALA, - script, - })).toThrow(/Python shell requires the language to be set to Python/); - }); - - test('with JobType.of("pythonshell") and a language other than JobLanguage.PYTHON should throw', () => { - expect(() => glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V3_0, - type: glue.JobType.of('pythonshell'), - language: glue.JobLanguage.SCALA, - script, - })).toThrow(/Python shell requires the language to be set to Python/); - }); - - test('with JobType.of("glueray") and a language other than JobLanguage.PYTHON should throw', () => { - expect(() => glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V4_0, - type: glue.JobType.of('glueray'), - language: glue.JobLanguage.SCALA, - script, - })).toThrow(/Ray requires the language to be set to Python/); - }); - - test('with JobType.RAY and a language other than JobLanguage.PYTHON should throw', () => { - expect(() => glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V4_0, - type: glue.JobType.RAY, - language: glue.JobLanguage.SCALA, - script, - })).toThrow(/Ray 
requires the language to be set to Python/); - }); - - test('with a non JobLanguage.PYTHON and extraPythonFiles set should throw', () => { - expect(() => glue.JobExecutable.of({ - glueVersion: glue.GlueVersion.V3_0, - type: glue.JobType.ETL, - language: glue.JobLanguage.SCALA, - className: 'com.Test', - extraPythonFiles: [script], - script, - })).toThrow(/extraPythonFiles is not supported for languages other than JobLanguage.PYTHON/); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V4_0].forEach((glueVersion) => { - test(`with JobType.PYTHON_SHELL and GlueVersion ${glueVersion} should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.PYTHON_SHELL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion, - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support Python Shell`); - }); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V4_0].forEach((glueVersion) => { - test(`with JobType.PYTHON_SHELL and GlueVersion.of("${glueVersion.name}") should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.PYTHON_SHELL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion: glue.GlueVersion.of(glueVersion.name), - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support Python Shell`); - }); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V1_0, glue.GlueVersion.V2_0, glue.GlueVersion.V3_0].forEach((glueVersion) => { - test(`with JobType.RAY and GlueVersion ${glueVersion} should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.RAY, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion, - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support Ray`); - }); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V1_0, glue.GlueVersion.V2_0, glue.GlueVersion.V3_0].forEach((glueVersion) => { - test(`with 
JobType.of("glueray") and GlueVersion ${glueVersion} should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.of('glueray'), - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion, - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support Ray`); - }); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V1_0].forEach((glueVersion) => { - test(`with extraJarsFirst set and GlueVersion ${glueVersion.name} should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.ETL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - extraJarsFirst: true, - script, - glueVersion, - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support extraJarsFirst`); - }); - }); - - [glue.GlueVersion.V0_9, glue.GlueVersion.V1_0].forEach((glueVersion) => { - test(`with extraJarsFirst set and GlueVersion.of("${glueVersion.name}") should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.ETL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - extraJarsFirst: true, - script, - glueVersion: glue.GlueVersion.of(glueVersion.name), - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support extraJarsFirst`); - }); - }); - - [glue.GlueVersion.V2_0, glue.GlueVersion.V3_0, glue.GlueVersion.V4_0].forEach((glueVersion) => { - test(`with PythonVersion.TWO and GlueVersion ${glueVersion} should throw`, () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.ETL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion, - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support PythonVersion 2`); - }); - }); - - [glue.GlueVersion.V2_0, glue.GlueVersion.V3_0, glue.GlueVersion.V4_0].forEach((glueVersion) => { - test(`with PythonVersion.TWO and GlueVersion.of("${glueVersion.name}") should throw`, () => { - expect(() => 
glue.JobExecutable.of({ - type: glue.JobType.ETL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.TWO, - script, - glueVersion: glue.GlueVersion.of(glueVersion.name), - })).toThrow(`Specified GlueVersion ${glueVersion.name} does not support PythonVersion 2`); - }); - }); - - test('with PythonVersion set to PythonVersion.THREE_NINE and JobType etl should throw', () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.ETL, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - glueVersion: glue.GlueVersion.V1_0, - })).toThrow('Specified PythonVersion PythonVersion.THREE_NINE is only supported for JobType Python Shell'); - }); - - test('with PythonVersion PythonVersion.THREE_NINE and JobType pythonshell should succeed', () => { - expect(glue.JobExecutable.of({ - type: glue.JobType.PYTHON_SHELL, - glueVersion: glue.GlueVersion.V1_0, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - })).toBeDefined(); - }); - - test('with PythonVersion PythonVersion.THREE_NINE and JobType.of("pythonshell") should succeed', () => { - expect(glue.JobExecutable.of({ - type: glue.JobType.of('pythonshell'), - glueVersion: glue.GlueVersion.V1_0, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - })).toBeDefined(); - }); - - test('with PythonVersion PythonVersion.THREE_NINE and JobType ray should succeed', () => { - expect(glue.JobExecutable.of({ - type: glue.JobType.RAY, - glueVersion: glue.GlueVersion.V4_0, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - script, - })).toBeDefined(); - }); - - test('with PythonVersion PythonVersion.THREE_NINE and JobTypeof("glueray") should succeed', () => { - expect(glue.JobExecutable.of({ - type: glue.JobType.of('glueray'), - glueVersion: glue.GlueVersion.V4_0, - language: glue.JobLanguage.PYTHON, - 
pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - script, - })).toBeDefined(); - }); - - test('with JobTypeof("glueray") and extraPythonFiles set should throw', () => { - expect(() => glue.JobExecutable.of({ - type: glue.JobType.of('glueray'), - glueVersion: glue.GlueVersion.V4_0, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - extraPythonFiles: [script], - script, - })).toThrow(/extraPythonFiles is not supported for Ray jobs/); - }); - - test('with JobType ray and s3PythonModules should succeed', () => { - expect(glue.JobExecutable.of({ - type: glue.JobType.of('glueray'), - glueVersion: glue.GlueVersion.V4_0, - language: glue.JobLanguage.PYTHON, - pythonVersion: glue.PythonVersion.THREE_NINE, - s3PythonModules: [script], - runtime: glue.Runtime.RAY_TWO_FOUR, - script, - })).toBeDefined(); - }); - }); -}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/job-jar/helloworld.jar b/packages/@aws-cdk/aws-glue-alpha/test/job-jar/helloworld.jar new file mode 100644 index 0000000000000..41a6aa95d5aff Binary files /dev/null and b/packages/@aws-cdk/aws-glue-alpha/test/job-jar/helloworld.jar differ diff --git a/packages/@aws-cdk/aws-glue-alpha/test/job.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/job.test.ts deleted file mode 100644 index 0e6db582c1d71..0000000000000 --- a/packages/@aws-cdk/aws-glue-alpha/test/job.test.ts +++ /dev/null @@ -1,1180 +0,0 @@ -import { EOL } from 'os'; -import { Template } from 'aws-cdk-lib/assertions'; -import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch'; -import * as events from 'aws-cdk-lib/aws-events'; -import * as iam from 'aws-cdk-lib/aws-iam'; -import * as logs from 'aws-cdk-lib/aws-logs'; -import * as s3 from 'aws-cdk-lib/aws-s3'; -import * as cdk from 'aws-cdk-lib'; -import * as glue from '../lib'; - -describe('WorkerType', () => { - test('.STANDARD should set the name correctly', () => 
expect(glue.WorkerType.STANDARD.name).toEqual('Standard')); - - test('.G_1X should set the name correctly', () => expect(glue.WorkerType.G_1X.name).toEqual('G.1X')); - - test('.G_2X should set the name correctly', () => expect(glue.WorkerType.G_2X.name).toEqual('G.2X')); - - test('.G_4X should set the name correctly', () => expect(glue.WorkerType.G_4X.name).toEqual('G.4X')); - - test('.G_8X should set the name correctly', () => expect(glue.WorkerType.G_8X.name).toEqual('G.8X')); - - test('.G_025X should set the name correctly', () => expect(glue.WorkerType.G_025X.name).toEqual('G.025X')); - - test('.Z_2X should set the name correctly', () => expect(glue.WorkerType.Z_2X.name).toEqual('Z.2X')); - - test('of(customType) should set name correctly', () => expect(glue.WorkerType.of('CustomType').name).toEqual('CustomType')); -}); - -describe('Job', () => { - const jobName = 'test-job'; - let stack: cdk.Stack; - - beforeEach(() => { - stack = new cdk.Stack(); - }); - - describe('.fromJobAttributes()', () => { - test('with required attrs only', () => { - const job = glue.Job.fromJobAttributes(stack, 'ImportedJob', { jobName }); - - expect(job.jobName).toEqual(jobName); - expect(job.jobArn).toEqual(stack.formatArn({ - service: 'glue', - resource: 'job', - resourceName: jobName, - })); - expect(job.grantPrincipal).toEqual(new iam.UnknownPrincipal({ resource: job })); - }); - - test('with all attrs', () => { - const role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); - const job = glue.Job.fromJobAttributes(stack, 'ImportedJob', { jobName, role }); - - expect(job.jobName).toEqual(jobName); - expect(job.jobArn).toEqual(stack.formatArn({ - service: 'glue', - resource: 'job', - resourceName: jobName, - })); - expect(job.grantPrincipal).toEqual(role); - }); - }); - - describe('new', () => { - const className = 'com.amazon.test.ClassName'; - const codeBucketName = 'bucketname'; - const codeBucketAccessStatement = { - Action: [ - 's3:GetObject*', 
- 's3:GetBucket*', - 's3:List*', - ], - Effect: 'Allow', - Resource: [ - { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - `:s3:::${codeBucketName}`, - ], - ], - }, - { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - `:s3:::${codeBucketName}/script`, - ], - ], - }, - ], - }; - let codeBucket: s3.IBucket; - let script: glue.Code; - let extraJars: glue.Code[]; - let extraFiles: glue.Code[]; - let extraPythonFiles: glue.Code[]; - let job: glue.Job; - let defaultProps: glue.JobProps; - - beforeEach(() => { - codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', codeBucketName); - script = glue.Code.fromBucket(codeBucket, 'script'); - extraJars = [glue.Code.fromBucket(codeBucket, 'file1.jar'), glue.Code.fromBucket(codeBucket, 'file2.jar')]; - extraPythonFiles = [glue.Code.fromBucket(codeBucket, 'file1.py'), glue.Code.fromBucket(codeBucket, 'file2.py')]; - extraFiles = [glue.Code.fromBucket(codeBucket, 'file1.txt'), glue.Code.fromBucket(codeBucket, 'file2.txt')]; - defaultProps = { - executable: glue.JobExecutable.scalaEtl({ - glueVersion: glue.GlueVersion.V2_0, - className, - script, - }), - }; - }); - - describe('with necessary props only', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', defaultProps); - }); - - test('should create a role and use it with the job', () => { - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Role', { - AssumeRolePolicyDocument: { - Statement: [ - { - Action: 'sts:AssumeRole', - Effect: 'Allow', - Principal: { - Service: 'glue.amazonaws.com', - }, - }, - ], - Version: '2012-10-17', - }, - ManagedPolicyArns: [ - { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - ':iam::aws:policy/service-role/AWSGlueServiceRole', - ], - ], - }, - ], - }); - - // Role policy should grant reading from the assets bucket - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Policy', { - PolicyDocument: { - Statement: [ - codeBucketAccessStatement, - ], - 
}, - Roles: [ - { - Ref: 'JobServiceRole4F432993', - }, - ], - }); - - // check the job using the role - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - Command: { - Name: 'glueetl', - ScriptLocation: 's3://bucketname/script', - }, - Role: { - 'Fn::GetAtt': [ - 'JobServiceRole4F432993', - 'Arn', - ], - }, - }); - }); - - test('should return correct jobName and jobArn from CloudFormation', () => { - expect(stack.resolve(job.jobName)).toEqual({ Ref: 'JobB9D00F9F' }); - expect(stack.resolve(job.jobArn)).toEqual({ - 'Fn::Join': ['', ['arn:', { Ref: 'AWS::Partition' }, ':glue:', { Ref: 'AWS::Region' }, ':', { Ref: 'AWS::AccountId' }, ':job/', { Ref: 'JobB9D00F9F' }]], - }); - }); - - test('with a custom role should use it and set it in CloudFormation', () => { - const role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); - job = new glue.Job(stack, 'JobWithRole', { - ...defaultProps, - role, - }); - - expect(job.grantPrincipal).toEqual(role); - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - Role: role.roleArn, - }); - }); - - test('with a custom jobName should set it in CloudFormation', () => { - job = new glue.Job(stack, 'JobWithName', { - ...defaultProps, - jobName, - }); - - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - Name: jobName, - }); - }); - }); - - describe('enabling continuous logging with defaults', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - ...defaultProps, - continuousLogging: { enabled: true }, - }); - }); - - test('should set minimal default arguments', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - DefaultArguments: { - '--enable-continuous-cloudwatch-log': 'true', - '--enable-continuous-log-filter': 'true', - }, - }); - }); - }); - - describe('enabling continuous logging with all props set', () => { - let logGroup; - - beforeEach(() => { - logGroup = logs.LogGroup.fromLogGroupName(stack, 
'LogGroup', 'LogGroupName'); - job = new glue.Job(stack, 'Job', { - ...defaultProps, - continuousLogging: { - enabled: true, - quiet: false, - logStreamPrefix: 'LogStreamPrefix', - conversionPattern: '%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n', - logGroup, - }, - }); - }); - - test('should set all arguments', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - DefaultArguments: { - '--enable-continuous-cloudwatch-log': 'true', - '--enable-continuous-log-filter': 'false', - '--continuous-log-logGroup': 'LogGroupName', - '--continuous-log-logStreamPrefix': 'LogStreamPrefix', - '--continuous-log-conversionPattern': '%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n', - }, - }); - }); - - test('should grant cloudwatch log write permissions', () => { - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Policy', { - PolicyDocument: { - Statement: [ - { - Action: [ - 'logs:CreateLogStream', - 'logs:PutLogEvents', - ], - Effect: 'Allow', - Resource: { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - ':logs:', - { - Ref: 'AWS::Region', - }, - ':', - { - Ref: 'AWS::AccountId', - }, - ':log-group:LogGroupName:*', - ], - ], - }, - }, - codeBucketAccessStatement, - ], - }, - Roles: [ - { - Ref: 'JobServiceRole4F432993', - }, - ], - }); - }); - }); - - describe('enabling execution class', () => { - describe('enabling execution class with FLEX', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - }); - }); - - test('should set FLEX', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - ExecutionClass: 'FLEX', - }); - }); - }); - - describe('enabling execution class with FLEX and WorkerType G_1X', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - 
glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - workerType: glue.WorkerType.G_1X, - workerCount: 10, - }); - }); - - test('should set FLEX', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - ExecutionClass: 'FLEX', - WorkerType: 'G.1X', - }); - }); - }); - - describe('enabling execution class with FLEX and WorkerType G_2X', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - workerType: glue.WorkerType.G_2X, - workerCount: 10, - }); - }); - - test('should set FLEX', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - ExecutionClass: 'FLEX', - WorkerType: 'G.2X', - }); - }); - }); - - describe('enabling execution class with STANDARD', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.STANDARD, - }); - }); - - test('should set STANDARD', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - ExecutionClass: 'STANDARD', - }); - }); - }); - - describe('errors for execution class with FLEX', () => { - test('job type except JobType.ETL should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for JobType.ETL jobs'); - }); - - test('with glue version 0.9 should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ 
- glueVersion: glue.GlueVersion.V0_9, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for GlueVersion 3.0 or later'); - }); - - test('with glue version 1.0 should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for GlueVersion 3.0 or later'); - }); - - test('with glue version 2.0 should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for GlueVersion 3.0 or later'); - }); - - test('with G_025X as worker type that is neither G_1X nor G_2X should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - workerType: glue.WorkerType.G_025X, - workerCount: 2, - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for WorkerType G_1X or G_2X'); - }); - - test('with G_4X as worker type that is neither G_1X nor G_2X should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - workerType: glue.WorkerType.G_4X, - workerCount: 10, - executionClass: glue.ExecutionClass.FLEX, - })).toThrow('FLEX ExecutionClass is only available for WorkerType G_1X or G_2X'); - }); - }); - }); - - describe('enabling spark ui', () => { - describe('with no bucket or 
path provided', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - ...defaultProps, - sparkUI: { enabled: true }, - }); - }); - - test('should create spark ui bucket', () => { - Template.fromStack(stack).resourceCountIs('AWS::S3::Bucket', 1); - }); - - test('should grant the role read/write permissions to the spark ui bucket', () => { - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Policy', { - PolicyDocument: { - Statement: [ - { - Action: [ - 's3:GetObject*', - 's3:GetBucket*', - 's3:List*', - 's3:DeleteObject*', - 's3:PutObject', - 's3:PutObjectLegalHold', - 's3:PutObjectRetention', - 's3:PutObjectTagging', - 's3:PutObjectVersionTagging', - 's3:Abort*', - ], - Effect: 'Allow', - Resource: [ - { - 'Fn::GetAtt': [ - 'JobSparkUIBucket8E6A0139', - 'Arn', - ], - }, - { - 'Fn::Join': [ - '', - [ - { - 'Fn::GetAtt': [ - 'JobSparkUIBucket8E6A0139', - 'Arn', - ], - }, - '/*', - ], - ], - }, - ], - }, - codeBucketAccessStatement, - ], - Version: '2012-10-17', - }, - PolicyName: 'JobServiceRoleDefaultPolicy03F68F9D', - Roles: [ - { - Ref: 'JobServiceRole4F432993', - }, - ], - }); - }); - - test('should set spark arguments on the job', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - DefaultArguments: { - '--enable-spark-ui': 'true', - '--spark-event-logs-path': { - 'Fn::Join': [ - '', - [ - 's3://', - { - Ref: 'JobSparkUIBucket8E6A0139', - }, - '/', - ], - ], - }, - }, - }); - }); - }); - - describe('with bucket provided', () => { - const sparkUIBucketName = 'sparkbucketname'; - let sparkUIBucket: s3.IBucket; - - beforeEach(() => { - sparkUIBucket = s3.Bucket.fromBucketName(stack, 'SparkBucketId', sparkUIBucketName); - job = new glue.Job(stack, 'Job', { - ...defaultProps, - sparkUI: { - enabled: true, - bucket: sparkUIBucket, - }, - }); - }); - - test('should grant the role read/write permissions to the provided spark ui bucket', () => { - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Policy', { - 
PolicyDocument: { - Statement: [ - { - Action: [ - 's3:GetObject*', - 's3:GetBucket*', - 's3:List*', - 's3:DeleteObject*', - 's3:PutObject', - 's3:PutObjectLegalHold', - 's3:PutObjectRetention', - 's3:PutObjectTagging', - 's3:PutObjectVersionTagging', - 's3:Abort*', - ], - Effect: 'Allow', - Resource: [ - { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - ':s3:::sparkbucketname', - ], - ], - }, - { - 'Fn::Join': [ - '', - [ - 'arn:', - { - Ref: 'AWS::Partition', - }, - ':s3:::sparkbucketname/*', - ], - ], - }, - ], - }, - codeBucketAccessStatement, - ], - }, - Roles: [ - { - Ref: 'JobServiceRole4F432993', - }, - ], - }); - }); - - test('should set spark arguments on the job', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - DefaultArguments: { - '--enable-spark-ui': 'true', - '--spark-event-logs-path': `s3://${sparkUIBucketName}/`, - }, - }); - }); - }); - describe('with bucket and path provided', () => { - const sparkUIBucketName = 'sparkbucketname'; - const prefix = 'foob/bart/'; - const badPrefix = '/foob/bart'; - let sparkUIBucket: s3.IBucket; - - const expectedErrors = [ - `Invalid prefix format (value: ${badPrefix})`, - 'Prefix must not begin with \'/\'', - 'Prefix must end with \'/\'', - ].join(EOL); - it('fails if path is mis-formatted', () => { - expect(() => new glue.Job(stack, 'BadPrefixJob', { - ...defaultProps, - sparkUI: { - enabled: true, - bucket: sparkUIBucket, - prefix: badPrefix, - }, - })).toThrow(expectedErrors); - }); - - beforeEach(() => { - sparkUIBucket = s3.Bucket.fromBucketName(stack, 'BucketId', sparkUIBucketName); - job = new glue.Job(stack, 'Job', { - ...defaultProps, - sparkUI: { - enabled: true, - bucket: sparkUIBucket, - prefix: prefix, - }, - }); - }); - - it('should grant the role read/write permissions spark ui bucket prefixed folder', () => { - Template.fromStack(stack).hasResourceProperties('AWS::IAM::Policy', { - PolicyDocument: { - Statement: [ - { - Action: [ - 
's3:GetObject*', - 's3:GetBucket*', - 's3:List*', - 's3:DeleteObject*', - 's3:PutObject', - 's3:PutObjectLegalHold', - 's3:PutObjectRetention', - 's3:PutObjectTagging', - 's3:PutObjectVersionTagging', - 's3:Abort*', - ], - Effect: 'Allow', - Resource: [ - { - 'Fn::Join': [ - '', - [ - 'arn:', - { Ref: 'AWS::Partition' }, - ':s3:::sparkbucketname', - ], - ], - }, - { - 'Fn::Join': [ - '', - [ - 'arn:', - { Ref: 'AWS::Partition' }, - `:s3:::sparkbucketname/${prefix}*`, - ], - ], - }, - ], - }, - codeBucketAccessStatement, - ], - Version: '2012-10-17', - }, - PolicyName: 'JobServiceRoleDefaultPolicy03F68F9D', - Roles: [{ Ref: 'JobServiceRole4F432993' }], - }); - }); - - it('should set spark arguments on the job', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - DefaultArguments: { - '--enable-spark-ui': 'true', - '--spark-event-logs-path': `s3://${sparkUIBucketName}/${prefix}`, - }, - }); - }); - }); - }); - - describe('with extended props', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - ...defaultProps, - jobName, - description: 'test job', - workerType: glue.WorkerType.G_2X, - workerCount: 10, - maxConcurrentRuns: 2, - maxRetries: 2, - timeout: cdk.Duration.minutes(5), - notifyDelayAfter: cdk.Duration.minutes(1), - defaultArguments: { - arg1: 'value1', - arg2: 'value2', - }, - connections: [glue.Connection.fromConnectionName(stack, 'ImportedConnection', 'ConnectionName')], - securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'ImportedSecurityConfiguration', 'SecurityConfigurationName'), - enableProfilingMetrics: true, - tags: { - key: 'value', - }, - }); - }); - - test('should synthesize correctly', () => { - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - Command: { - Name: 'glueetl', - ScriptLocation: 's3://bucketname/script', - }, - Role: { - 'Fn::GetAtt': [ - 'JobServiceRole4F432993', - 'Arn', - ], - }, - DefaultArguments: { - '--job-language': 'scala', 
- '--class': 'com.amazon.test.ClassName', - '--enable-metrics': '', - 'arg1': 'value1', - 'arg2': 'value2', - }, - Description: 'test job', - ExecutionProperty: { - MaxConcurrentRuns: 2, - }, - GlueVersion: '2.0', - MaxRetries: 2, - Name: 'test-job', - NotificationProperty: { - NotifyDelayAfter: 1, - }, - NumberOfWorkers: 10, - Tags: { - key: 'value', - }, - Timeout: 5, - WorkerType: 'G.2X', - Connections: { - Connections: [ - 'ConnectionName', - ], - }, - SecurityConfiguration: 'SecurityConfigurationName', - }); - }); - }); - - test('with reserved args should throw', () => { - ['--debug', '--mode', '--JOB_NAME'].forEach((arg, index) => { - const defaultArguments: {[key: string]: string} = {}; - defaultArguments[arg] = 'random value'; - - expect(() => new glue.Job(stack, `Job${index}`, { - executable: glue.JobExecutable.scalaEtl({ - glueVersion: glue.GlueVersion.V2_0, - className, - script, - }), - defaultArguments, - })).toThrow(/argument is reserved by Glue/); - }); - }); - - describe('shell job', () => { - test('with unsupported glue version should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V0_9, - pythonVersion: glue.PythonVersion.TWO, - script, - }), - })).toThrow('Specified GlueVersion 0.9 does not support Python Shell'); - }); - - test('with unsupported Spark UI prop should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - sparkUI: { enabled: true }, - })).toThrow('Spark UI is not available for JobType.PYTHON_SHELL'); - }); - }); - - describe('ray job', () => { - test('with unsupported glue version should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonRay({ - glueVersion: glue.GlueVersion.V3_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: 
glue.Runtime.RAY_TWO_FOUR, - script, - }), - workerType: glue.WorkerType.Z_2X, - workerCount: 2, - })).toThrow('Specified GlueVersion 3.0 does not support Ray'); - }); - - test('with unsupported Spark UI prop should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonRay({ - glueVersion: glue.GlueVersion.V4_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - runtime: glue.Runtime.RAY_TWO_FOUR, - script, - }), - workerType: glue.WorkerType.Z_2X, - workerCount: 2, - sparkUI: { enabled: true }, - })).toThrow('Spark UI is not available for JobType.RAY'); - }); - - test('without runtime should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonRay({ - glueVersion: glue.GlueVersion.V4_0, - pythonVersion: glue.PythonVersion.THREE_NINE, - script, - }), - workerType: glue.WorkerType.Z_2X, - workerCount: 2, - })).toThrow('Runtime is required for Ray jobs'); - }); - }); - - test('etl job with all props should synthesize correctly', () => { - new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - extraJarsFirst: true, - script, - extraPythonFiles, - extraJars, - extraFiles, - }), - }); - - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - GlueVersion: '2.0', - Command: { - Name: 'glueetl', - ScriptLocation: 's3://bucketname/script', - PythonVersion: '3', - }, - Role: { - 'Fn::GetAtt': [ - 'JobServiceRole4F432993', - 'Arn', - ], - }, - DefaultArguments: { - '--job-language': 'python', - '--extra-jars': 's3://bucketname/file1.jar,s3://bucketname/file2.jar', - '--extra-py-files': 's3://bucketname/file1.py,s3://bucketname/file2.py', - '--extra-files': 's3://bucketname/file1.txt,s3://bucketname/file2.txt', - '--user-jars-first': 'true', - }, - }); - }); - - test('streaming job with all props should synthesize correctly', () => { - new glue.Job(stack, 'Job', { - 
executable: glue.JobExecutable.scalaStreaming({ - glueVersion: glue.GlueVersion.V2_0, - extraJarsFirst: true, - className, - script, - extraJars, - extraFiles, - }), - }); - - Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { - GlueVersion: '2.0', - Command: { - Name: 'gluestreaming', - ScriptLocation: 's3://bucketname/script', - }, - Role: { - 'Fn::GetAtt': [ - 'JobServiceRole4F432993', - 'Arn', - ], - }, - DefaultArguments: { - '--job-language': 'scala', - '--class': 'com.amazon.test.ClassName', - '--extra-jars': 's3://bucketname/file1.jar,s3://bucketname/file2.jar', - '--extra-files': 's3://bucketname/file1.txt,s3://bucketname/file2.txt', - '--user-jars-first': 'true', - }, - }); - }); - - describe('event rules and rule-based metrics', () => { - beforeEach(() => { - job = new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.scalaEtl({ - glueVersion: glue.GlueVersion.V2_0, - className, - script, - }), - }); - }); - - test('.onEvent() should create the expected event rule', () => { - job.onEvent('eventId', {}); - - Template.fromStack(stack).hasResourceProperties('AWS::Events::Rule', { - EventPattern: { - 'source': [ - 'aws.glue', - ], - 'detail-type': [ - 'Glue Job State Change', - 'Glue Job Run Status', - ], - 'detail': { - jobName: [ - { - Ref: 'JobB9D00F9F', - }, - ], - }, - }, - State: 'ENABLED', - }); - }); - - [ - { name: 'onSuccess()', invoke: (testJob: glue.Job) => testJob.onSuccess('SuccessRule'), state: 'SUCCEEDED' }, - { name: 'onFailure()', invoke: (testJob: glue.Job) => testJob.onFailure('FailureRule'), state: 'FAILED' }, - { name: 'onTimeout()', invoke: (testJob: glue.Job) => testJob.onTimeout('TimeoutRule'), state: 'TIMEOUT' }, - ].forEach((testCase) => { - test(`${testCase.name} should create a rule with correct properties`, () => { - testCase.invoke(job); - - Template.fromStack(stack).hasResourceProperties('AWS::Events::Rule', { - Description: { - 'Fn::Join': [ - '', - [ - 'Rule triggered when Glue job ', - { - Ref: 
'JobB9D00F9F', - }, - ` is in ${testCase.state} state`, - ], - ], - }, - EventPattern: { - 'source': [ - 'aws.glue', - ], - 'detail-type': [ - 'Glue Job State Change', - 'Glue Job Run Status', - ], - 'detail': { - state: [ - testCase.state, - ], - jobName: [ - { - Ref: 'JobB9D00F9F', - }, - ], - }, - }, - State: 'ENABLED', - }); - }); - }); - - [ - { name: '.metricSuccess()', invoke: (testJob: glue.Job) => testJob.metricSuccess(), state: 'SUCCEEDED', ruleId: 'SuccessMetricRule' }, - { name: '.metricFailure()', invoke: (testJob: glue.Job) => testJob.metricFailure(), state: 'FAILED', ruleId: 'FailureMetricRule' }, - { name: '.metricTimeout()', invoke: (testJob: glue.Job) => testJob.metricTimeout(), state: 'TIMEOUT', ruleId: 'TimeoutMetricRule' }, - ].forEach((testCase) => { - test(`${testCase.name} should create the expected singleton event rule and corresponding metric`, () => { - const metric = testCase.invoke(job); - testCase.invoke(job); - - expect(metric).toEqual(new cloudwatch.Metric({ - dimensionsMap: { - RuleName: (job.node.findChild(testCase.ruleId) as events.Rule).ruleName, - }, - metricName: 'TriggeredRules', - namespace: 'AWS/Events', - statistic: 'Sum', - })); - - Template.fromStack(stack).resourceCountIs('AWS::Events::Rule', 1); - Template.fromStack(stack).hasResourceProperties('AWS::Events::Rule', { - Description: { - 'Fn::Join': [ - '', - [ - 'Rule triggered when Glue job ', - { - Ref: 'JobB9D00F9F', - }, - ` is in ${testCase.state} state`, - ], - ], - }, - EventPattern: { - 'source': [ - 'aws.glue', - ], - 'detail-type': [ - 'Glue Job State Change', - 'Glue Job Run Status', - ], - 'detail': { - state: [ - testCase.state, - ], - jobName: [ - { - Ref: 'JobB9D00F9F', - }, - ], - }, - }, - State: 'ENABLED', - }); - }); - }); - }); - - describe('.metric()', () => { - - test('with MetricType.COUNT should create a count sum metric', () => { - const metricName = 'glue.driver.aggregate.bytesRead'; - const props = { statistic: cloudwatch.Statistic.SUM }; - - 
expect(job.metric(metricName, glue.MetricType.COUNT, props)).toEqual(new cloudwatch.Metric({ - metricName, - statistic: 'Sum', - namespace: 'Glue', - dimensionsMap: { - JobName: job.jobName, - JobRunId: 'ALL', - Type: 'count', - }, - })); - }); - - test('with MetricType.GAUGE should create a gauge average metric', () => { - const metricName = 'glue.driver.BlockManager.disk.diskSpaceUsed_MB'; - const props = { statistic: cloudwatch.Statistic.AVERAGE }; - - expect(job.metric(metricName, glue.MetricType.GAUGE, props)).toEqual(new cloudwatch.Metric({ - metricName, - statistic: 'Average', - namespace: 'Glue', - dimensionsMap: { - JobName: job.jobName, - JobRunId: 'ALL', - Type: 'gauge', - }, - })); - }); - }); - - describe('validation for maxCapacity and workerType', () => { - test('maxCapacity with workerType and workerCount should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V1_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - maxCapacity: 10, - workerType: glue.WorkerType.G_1X, - workerCount: 10, - })).toThrow('maxCapacity cannot be used when setting workerType and workerCount'); - }); - - test('maxCapacity with GlueVersion 2.0 or later should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - maxCapacity: 10, - })).toThrow('maxCapacity cannot be used when GlueVersion 2.0 or later'); - }); - - test('maxCapacity with Python Shell jobs validation', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonShell({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - maxCapacity: 10, - })).toThrow(/maxCapacity value must be either 0.0625 or 1 for JobType.PYTHON_SHELL jobs/); - }); - - test('workerType without workerCount should throw', () 
=> { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - workerType: glue.WorkerType.G_1X, - })).toThrow('Both workerType and workerCount must be set'); - }); - - test('workerCount without workerType should throw', () => { - expect(() => new glue.Job(stack, 'Job', { - executable: glue.JobExecutable.pythonEtl({ - glueVersion: glue.GlueVersion.V2_0, - pythonVersion: glue.PythonVersion.THREE, - script, - }), - workerCount: 10, - })).toThrow('Both workerType and workerCount must be set'); - }); - }); - }); -}); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/pyspark-etl-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-etl-jobs.test.ts new file mode 100644 index 0000000000000..dad55249bc291 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-etl-jobs.test.ts @@ -0,0 +1,675 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + let sparkUIBucket: s3.Bucket; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + }); + + describe('Create new PySpark ETL Job with default parameters', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + 
resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('Default Python version should be 3.9', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.ETL, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE_NINE, + }, + }); + }); + }); + + describe('Create new PySpark ETL Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + 
'--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + + }); + + describe('Create new PySpark ETL Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJob', + role, + script, + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }, + }); + }); + + }); + + describe('Create PySpark ETL Job with G2 worker type with 2 workers', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + workerType: glue.WorkerType.G_2X, + numberOfWorkers: 2, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { +
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.2X', + }); + }); + }); + + describe('Create PySpark ETL Job with G4 worker type with 4 workers', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + workerType: glue.WorkerType.G_4X, + numberOfWorkers: 4, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 4', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 4, + }); + }); + + test('Overridden WorkerType should be G.4X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.4X', + }); + }); + }); + + describe('Create PySpark ETL Job with G8 worker type and 8 workers', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + workerType: glue.WorkerType.G_8X, + numberOfWorkers: 8, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: 
job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 8', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 8, + }); + }); + + test('Overridden WorkerType should be G.8X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.8X', + }); + }); + }); + + describe('Override SparkUI properties for PySpark ETL Job', () => { + + beforeEach(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + sparkUI: { + bucket: sparkUIBucket, + prefix: '/prefix', + }, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Has Continuous Logging and SparkUI Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + '--enable-spark-ui': 'true', +
'--spark-event-logs-path': Match.objectLike({ + 'Fn::Join': [ + '', + [ + 's3://', + { Ref: Match.anyValue() }, + '/prefix/', + ], + ], + }), + }), + }); + }); + }); + + describe('Invalid overrides should cause errors', () => { + + test('Invalid SparkUI prefix should throw an error', () => { + expect(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + sparkUI: { + bucket: sparkUIBucket, + prefix: 'prefix', + }, + numberOfWorkers: 8, + workerType: glue.WorkerType.G_8X, + continuousLogging: { enabled: false }, + }); + }).toThrow('Invalid prefix format (value: prefix)'); + }); + + }); + + describe('Create PySpark ETL Job with extraPythonFiles and extraFiles', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + role, + script, + jobName: 'PySparkETLJob', + extraPythonFiles: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraPythonFilesBucket', 'extra-python-files-bucket'), + 'prefix/file.py'), + ], + extraFiles: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraFilesBucket', 'extra-files-bucket'), + 'prefix/file.txt'), + ], + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + '--extra-py-files': 
's3://extra-python-files-bucket/prefix/file.py', + '--extra-files': 's3://extra-files-bucket/prefix/file.txt', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + }); + + describe('Create PySpark ETL Job with optional properties', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PySparkETLJobCustomName', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + 
'--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create PySpark ETL Job with overridden job run queueing', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: 
cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + jobRunQueuingEnabled: true, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PySparkETLJobCustomName', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden 
max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/pyspark-flex-etl-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-flex-etl-jobs.test.ts new file mode 100644 index 0000000000000..1ff64b215cca0 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-flex-etl-jobs.test.ts @@ -0,0 +1,257 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + }); 
+ + describe('Create new PySpark ETL Flex Job with default parameters', () => { + + beforeEach(() => { + job = new glue.PySparkFlexEtlJob(stack, 'ImportedJob', { role, script }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('ExecutionClass should be Flex', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionClass: 'FLEX', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Job run queuing must be disabled for flex jobs', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + + }); + + describe('Create new PySpark ETL Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'PySparkETLJob', { + jobName: 'PySparkETLJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); 
+ + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new PySpark ETL Flex Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.PySparkFlexEtlJob(stack, 'PySparkFlexETLJob', { + jobName: 'PySparkFlexETLJob', + role, + script, + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }, + }); + }); + + }); + + describe('Create pySpark ETL Job with optional properties', () => { + + beforeEach(() => { + job = new glue.PySparkEtlJob(stack, 'pySparkEtlJob', { + jobName: 'pySparkEtlJob', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + 
expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'pySparkEtlJob', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + 
SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/pyspark-streaming-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-streaming-jobs.test.ts new file mode 100644 index 0000000000000..ce22721af335d --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/pyspark-streaming-jobs.test.ts @@ -0,0 +1,706 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + let sparkUIBucket: s3.Bucket; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + }); + + describe('Create new PySpark Streaming Job with default parameters', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'ImportedJob', { role, script }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Default numberOfWorkers should be 10', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default Python version should be 3', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.STREAMING, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE, + }, + }); + }); + + test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + }); + + describe('Create new PySpark Streaming Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + jobName: 'PySparkStreamingJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--continuous-log-logGroup': 
Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new PySpark Streaming Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + jobName: 'PySparkStreamingJob', + role, + script, + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }, + }); + }); + + }); + + describe('Create PySpark Streaming Job with G2 worker type with 2 workers', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 'PySparkStreamingJob', + workerType: glue.WorkerType.G_2X, + numberOfWorkers: 2, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + 
}); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.2X', + }); + }); + + test('Default Python version should be 3', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.STREAMING, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE, + }, + }); + }); + }); + + describe('Create PySpark Streaming Job with G4 worker type with 4 workers', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 'PySparkStreamingJob', + workerType: glue.WorkerType.G_4X, + numberOfWorkers: 4, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 4', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 4, + }); + }); + + test('Overridden WorkerType should be G.4X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.4X', + }); + }); + }); + + describe('Create PySpark Streaming Job with G8 worker type and 8 workers', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 
'PySparkStreamingJob', + workerType: glue.WorkerType.G_8X, + numberOfWorkers: 8, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Overridden numberOfWorkers should be 8', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 8, + }); + }); + + test('Overridden WorkerType should be G.8X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.8X', + }); + }); + }); + + describe('Override SparkUI properties for PySpark Streaming Job', () => { + + beforeEach(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 'PySparkStreamingJob', + sparkUI: { + bucket: sparkUIBucket, + prefix: '/prefix', + }, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Has Continuous Logging and SparkUIEnabled', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + '--enable-spark-ui': 'true', + '--spark-event-logs-path': Match.objectLike({ + 'Fn::Join': [ + '', + [ + 's3://', + { Ref: Match.anyValue() }, + '/prefix/', + ], + ], + }), + }), + }); + }); + }); + + describe('Invalid overrides should cause errors', () => { + + test('Invalid SparkUI prefix should throw an error', () => { + expect(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 'PySparkStreamingJob', + sparkUI: { + bucket: sparkUIBucket, + prefix: 'prefix', + }, + numberOfWorkers: 8, + workerType: glue.WorkerType.G_8X, + continuousLogging: { enabled: false }, + }); + }).toThrow('Invalid prefix format (value: prefix)'); + }); + + }); + + describe('Create PySpark Streaming Job with extraPythonFiles and extraFiles', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + role, + script, + jobName: 'PySparkStreamingJob', + extraPythonFiles: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraPythonFilesBucket', 'extra-python-files-bucket'), + 'prefix/file.py'), + ], + extraFiles: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraFilesBucket', 'extra-files-bucket'), + 'prefix/file.txt'), + ], + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + 
test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + '--enable-continuous-cloudwatch-log': 'true', + '--extra-py-files': 's3://extra-python-files-bucket/prefix/file.py', + '--extra-files': 's3://extra-files-bucket/prefix/file.txt', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + }); + + describe('Create PySpark Streaming Job with optional properties', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + jobName: 'PySparkStreamingJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PySparkStreamingJobCustomName', + Description: 'This 
is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + + test('Default Python version should be 3', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.STREAMING, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE, + }, + }); + }); + }); + + describe('Create PySpark Streaming Job with job run queuing enabled', () => { + + beforeEach(() => { + job = new glue.PySparkStreamingJob(stack, 'PySparkStreamingJob', { + jobName: 'PySparkStreamingJobCustomName', + description: 'This is a description', + role, + script, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + jobRunQueuingEnabled: true, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PySparkStreamingJobCustomName', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => 
{ + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + + test('Default Python version should be 3', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.STREAMING, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE, + }, + }); + }); + }); +}); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/python-shell-job.test.ts 
b/packages/@aws-cdk/aws-glue-alpha/test/python-shell-job.test.ts new file mode 100644 index 0000000000000..5f34409b28ee9 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/python-shell-job.test.ts @@ -0,0 +1,433 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + }); + + describe('Create new Python Shell Job with default parameters', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'ImportedJob', { role, script }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Default Max Retries should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + + test('Default Max Capacity should be 0.0625', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxCapacity: 0.0625, + }); + }); + + test('Default Python version should be 
3.9', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.PYTHON_SHELL, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.THREE_NINE, + }, + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--enable-continuous-cloudwatch-log': 'true', + '--job-language': 'python', + 'library-set': 'analytics', + }), + }); + }); + + }); + + describe('Create new Python Shell Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'PythonShellJob', { + jobName: 'PythonShellJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + '--job-language': 'python', + }), + }); + }); + + }); + + describe('Create new Python Shell Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'PythonShellJob', { + jobName: 'PythonShellJob', + role, + script, + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }, + }); + }); + + }); + + describe('Create Python Shell Job with overridden Python version and max capacity', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'PythonShellJob', { + role, + script, + jobName: 'PythonShellJob', + pythonVersion: glue.PythonVersion.TWO, + maxCapacity: glue.MaxCapacity.DPU_1, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Overridden Python version should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.PYTHON_SHELL, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.TWO, + }, + }); + }); + + test('Overridden Max Capacity should be 1', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxCapacity: 1, + }); + }); + + }); + + describe('Create Python Shell Job with optional properties', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'PythonShellJob', { + jobName: 'PythonShellJobCustomName', + description: 'This is a description', + pythonVersion: glue.PythonVersion.TWO, + maxCapacity: glue.MaxCapacity.DPU_1, + role, + script, + glueVersion: glue.GlueVersion.V2_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + 
XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PythonShellJobCustomName', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 2.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '2.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 
'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + + test('Overridden Python version should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.PYTHON_SHELL, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.TWO, + }, + }); + }); + + test('Overridden Max Capacity should be 1', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxCapacity: 1, + }); + }); + }); + + describe('Create Python Shell Job with job run queuing enabled', () => { + + beforeEach(() => { + job = new glue.PythonShellJob(stack, 'PythonShellJob', { + jobName: 'PythonShellJobCustomName', + description: 'This is a description', + pythonVersion: glue.PythonVersion.TWO, + maxCapacity: glue.MaxCapacity.DPU_1, + role, + script, + glueVersion: glue.GlueVersion.V2_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + jobRunQueuingEnabled: true, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'PythonShellJobCustomName', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 2.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '2.0', + }); + 
}); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'python', + }), + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + + test('Overridden Python version should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Command: { + Name: glue.JobType.PYTHON_SHELL, + ScriptLocation: 's3://bucketname/script', + PythonVersion: glue.PythonVersion.TWO, + }, + }); + }); + + test('Overridden Max Capacity should be 1', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxCapacity: 1, + }); + }); + }); +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/ray-job.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/ray-job.test.ts new file mode 100644 index 0000000000000..8b5c5032eed2a --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/ray-job.test.ts @@ -0,0 +1,398 @@ + +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + }); + + describe('Create new Ray Job with default parameters', () => { + + beforeEach(() => { + job = new glue.RayJob(stack, 'ImportedJob', { role, script }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Default number of workers should be 3', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 3, + }); + }); + + test('Default worker type should be Z.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'Z.2X', + }); + }); + + test('Has Continuous Logging 
Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + + }); + + describe('Create new Ray Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.RayJob(stack, 'RayJob', { + jobName: 'RayJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new Ray Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.RayJob(stack, 'RayJob', { + jobName: 'RayJob', + role, + script, + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + }, + }); + }); + + }); + + describe('Create Ray Job with optional override parameters', () => { + + beforeEach(() => { + 
job = new glue.RayJob(stack, 'ImportedJob', { + role, + script, + jobName: 'RayCustomJobName', + description: 'This is a description', + workerType: glue.WorkerType.Z_2X, + numberOfWorkers: 5, + runtime: glue.Runtime.RAY_TWO_FOUR, + maxRetries: 3, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Cannot override Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Overridden number of workers should be 5', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 5, + }); + }); + + test('Cannot override worker type should be Z.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'Z.2X', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'RayCustomJobName', + Description: 'This is a description', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: 
Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + }), + }); + }); + + test('Overridden max retries should be 3', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 3, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create Ray Job with job run queuing enabled', () => { + + beforeEach(() => { + job = new glue.RayJob(stack, 'ImportedJob', { + role, + script, + jobName: 'RayCustomJobName', + description: 'This is a description', + workerType: glue.WorkerType.Z_2X, + numberOfWorkers: 5, + runtime: glue.Runtime.RAY_TWO_FOUR, + maxRetries: 3, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + jobRunQueuingEnabled: true, 
+ }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Cannot override Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Overridden number of workers should be 5', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 5, + }); + }); + + test('Cannot override worker type should be Z.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'Z.2X', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'RayCustomJobName', + Description: 'This is a description', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + }), + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 
100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Invalid overrides should cause errors', () => { + + test('Create Ray Job overriding only workerType to cause an Error', () => { + expect(() => { + job = new glue.RayJob(stack, 'RayJob', { + role, + script, + workerType: glue.WorkerType.G_025X, + }); + }).toThrow(new Error('Ray jobs only support Z.2X worker type')); + }); + }); +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-etl-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-etl-jobs.test.ts new file mode 100644 index 0000000000000..36de605887f0f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-etl-jobs.test.ts @@ -0,0 +1,493 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + let className: string; + let sparkUIBucket: s3.Bucket; + + beforeEach(() => { + stack = new 
cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + className = 'com.example.HelloWorld'; + }); + + describe('Create new Scala Spark ETL Job with default parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 'ImportedJob', { role, script, className }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + }); + + describe('Create new Scala ETL Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 
'ScalaSparkEtlJob', { + jobName: 'ScalaSparkEtlJob', + role, + script, + className: 'com.example.HelloWorld', + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new Scala ETL Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 'ScalaSparkEtlJob', { + jobName: 'ScalaSparkEtlJob', + role, + script, + className: 'com.example.HelloWorld', + continuousLogging: { + enabled: false, + }, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }, + }); + }); + + }); + + describe('Create ScalaSpark ETL Job with optional properties', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 'ScalaSparkEtlJob', { + jobName: 'ScalaSparkEtlJob', + description: 'This is a description', + role, + script, + className, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: 
[glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'ScalaSparkEtlJob', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create ScalaSpark ETL Job with job run queuing enabled', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 'ScalaSparkEtlJob', { + jobName: 'ScalaSparkEtlJob', + description: 'This is a description', + role, + script, + className, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + jobRunQueuingEnabled: true, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'ScalaSparkEtlJob', + Description: 'This is a description', + }); + }); + + 
test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should contain connectionName', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 
'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create ScalaSpark ETL Job with extraJars', () => { + + beforeEach(() => { + job = new glue.ScalaSparkEtlJob(stack, 'ScalaSparkEtlJob', { + role, + script, + jobName: 'ScalaSparkEtlJob', + className, + extraJars: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraJarsBucket', 'extra-jars-bucket'), + 'prefix/file.jar'), + ], + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + '--extra-jars': 's3://extra-jars-bucket/prefix/file.jar', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + }); + + describe('Override SparkUI properties for ScalaSpark ETL Job', () => { + + beforeEach(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.ScalaSparkEtlJob(stack, 'ScalaSparkEtlJob', { + role, + script, + jobName: 'ScalaSparkEtlJob', + className, + sparkUI: { + bucket: sparkUIBucket, + prefix: '/prefix', + }, + }); + }); + + test('Test default attributes', () => { + 
expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Has Continuous Logging and SparkUIEnabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + '--enable-spark-ui': 'true', + '--spark-event-logs-path': Match.objectLike({ + 'Fn::Join': [ + '', + [ + 's3://', + { Ref: Match.anyValue() }, + '/prefix/', + ], + ], + }), + }), + }); + }); + }); +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-flex-etl-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-flex-etl-jobs.test.ts new file mode 100644 index 0000000000000..b7c3a5cfab395 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-flex-etl-jobs.test.ts @@ -0,0 +1,376 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + let className: string; + let sparkUIBucket: s3.Bucket; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + className = 
'com.example.HelloWorld'; + }); + + describe('Create new Scala Spark ETL Flex Job with default parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkFlexEtlJob(stack, 'ImportedJob', { role, script, className }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('ExecutionClass should be Flex', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionClass: 'FLEX', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Job run queuing must be disabled for flex jobs', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + }); + + describe('Create new ScalaSpark ETL Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkFlexEtlJob(stack, 'ScalaSparkFlexETLJob', { + jobName: 'ScalaSparkFlexETLJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + 
logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + className, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new ScalaSpark ETL Flex Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.ScalaSparkFlexEtlJob(stack, 'ScalaSparkFlexETLJob', { + jobName: 'ScalaSparkFlexETLJob', + role, + script, + continuousLogging: { + enabled: false, + }, + className, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }, + }); + }); + }); + + describe('Create ScalaSpark Flex ETL Job with optional properties', () => { + + beforeEach(() => { + job = new glue.ScalaSparkFlexEtlJob(stack, 'ScalaSparkFlexEtlJob', { + jobName: 'ScalaSparkFlexEtlJob', + description: 'This is a description', + role, + script, + className, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 
'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'ScalaSparkFlexEtlJob', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should contain connectionName', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + 
test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create ScalaSpark Flex ETL Job with extraJars and extraFiles', () => { + + beforeEach(() => { + job = new glue.ScalaSparkFlexEtlJob(stack, 'ScalaSparkFlexEtlJob', { + role, + script, + jobName: 'ScalaSparkFlexEtlJob', + className, + extraJars: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraJarsBucket', 'extra-jars-bucket'), + 'prefix/file.jar'), + ], + extraFiles: [ + glue.Code.fromBucket( + s3.Bucket.fromBucketName(stack, 'extraFilesBucket', 'extra-files-bucket'), + 'prefix/file.txt'), + ], + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V3_0, + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + '--extra-jars': 's3://extra-jars-bucket/prefix/file.jar', + '--extra-files': 's3://extra-files-bucket/prefix/file.txt', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be 
G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + }); + + describe('Override SparkUI properties for ScalaSpark Flex ETL Job', () => { + + beforeEach(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.ScalaSparkFlexEtlJob(stack, 'ScalaSparkFlexEtlJob', { + role, + script, + jobName: 'ScalaSparkFlexEtlJob', + className, + sparkUI: { + bucket: sparkUIBucket, + prefix: '/prefix', + }, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V3_0, + }); + }); + + test('Has Continuous Logging and SparkUIEnabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + '--enable-spark-ui': 'true', + '--spark-event-logs-path': Match.objectLike({ + 'Fn::Join': [ + '', + [ + 's3://', + { Ref: Match.anyValue() }, + '/prefix/', + ], + ], + }), + }), + }); + }); + }); +}); \ No newline at end of file diff --git a/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-streaming-jobs.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-streaming-jobs.test.ts new file mode 100644 index 0000000000000..da8dca9895c3f --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/scalaspark-streaming-jobs.test.ts @@ -0,0 +1,437 @@ +import * as cdk from 'aws-cdk-lib'; +import * as glue from '../lib'; +import * as iam from 'aws-cdk-lib/aws-iam'; +import * as s3 from 'aws-cdk-lib/aws-s3'; +import { Template, Match } from 
'aws-cdk-lib/assertions'; +import { LogGroup } from 'aws-cdk-lib/aws-logs'; + +describe('Job', () => { + let stack: cdk.Stack; + let role: iam.IRole; + let script: glue.Code; + let codeBucket: s3.IBucket; + let job: glue.IJob; + let className: string; + let sparkUIBucket: s3.Bucket; + + beforeEach(() => { + stack = new cdk.Stack(); + role = iam.Role.fromRoleArn(stack, 'Role', 'arn:aws:iam::123456789012:role/TestRole'); + codeBucket = s3.Bucket.fromBucketName(stack, 'CodeBucket', 'bucketname'); + script = glue.Code.fromBucket(codeBucket, 'script'); + className = 'com.example.HelloWorld'; + }); + + describe('Create new Scala Spark Streaming Job with default parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkStreamingJob(stack, 'ImportedJob', { role, script, className }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '4.0', + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + test('Default WorkerType should be G.1X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: 'G.1X', + }); + }); + + test('Has Continuous Logging Enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + }), + }); + }); + + test('Default numberOfWorkers should be 10', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 10, + }); + }); + + 
test('Default job run queuing should be disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: false, + }); + }); + }); + + describe('Create new ScalaSpark Streaming Job with log override parameters', () => { + + beforeEach(() => { + job = new glue.ScalaSparkStreamingJob(stack, 'ScalaSparkStreamingJob', { + jobName: 'ScalaSparkStreamingJob', + role, + script, + continuousLogging: { + enabled: true, + quiet: true, + logGroup: new LogGroup(stack, 'logGroup', { + logGroupName: '/aws-glue/jobs/${job.jobName}', + }), + logStreamPrefix: 'logStreamPrefix', + conversionPattern: 'convert', + }, + className, + }); + }); + + test('Has Continuous Logging enabled with optional args', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--continuous-log-logGroup': Match.objectLike({ + Ref: Match.anyValue(), + }), + '--enable-continuous-cloudwatch-log': 'true', + '--enable-continuous-log-filter': 'true', + '--continuous-log-logStreamPrefix': 'logStreamPrefix', + '--continuous-log-conversionPattern': 'convert', + }), + }); + }); + + }); + + describe('Create new ScalaSpark Streaming Job with logging explicitly disabled', () => { + + beforeEach(() => { + job = new glue.ScalaSparkStreamingJob(stack, 'ScalaSparkStreamingJob', { + jobName: 'ScalaSparkStreamingJob', + role, + script, + continuousLogging: { + enabled: false, + }, + className, + }); + }); + + test('Has Continuous Logging Disabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: { + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }, + }); + }); + + }); + + describe('Create ScalaSpark Streaming ETL Job with optional properties', () => { + + beforeEach(() => { + job = new glue.ScalaSparkStreamingJob(stack, 
'ScalaSparkStreamingJob', { + jobName: 'ScalaSparkStreamingJob', + description: 'This is a description', + role, + script, + className, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + }); + }); + + test('Test job attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'ScalaSparkStreamingJob', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden max retries should be 2', () => { + 
Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 2, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should contain connectionName', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Create ScalaSpark Streaming ETL Job with job run queuing enabled', () => { + + beforeEach(() => { + job = new glue.ScalaSparkStreamingJob(stack, 'ScalaSparkStreamingJob', { + jobName: 'ScalaSparkStreamingJob', + description: 'This is a description', + role, + script, + className, + glueVersion: glue.GlueVersion.V3_0, + continuousLogging: { enabled: false }, + workerType: glue.WorkerType.G_2X, + maxConcurrentRuns: 100, + timeout: cdk.Duration.hours(2), + connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')], + securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'), + tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + numberOfWorkers: 2, + maxRetries: 2, + jobRunQueuingEnabled: true, + }); + }); + + test('Test job attributes', () => { + 
expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Custom Job Name and Description', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Name: 'ScalaSparkStreamingJob', + Description: 'This is a description', + }); + }); + + test('Overridden Glue Version should be 3.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: '3.0', + }); + }); + + test('Verify Default Arguments', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + }), + }); + }); + + test('Overridden numberOfWorkers should be 2', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + NumberOfWorkers: 2, + }); + }); + + test('Overridden WorkerType should be G.2X', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + WorkerType: glue.WorkerType.G_2X, + }); + }); + + test('Overridden job run queuing should be enabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + JobRunQueuingEnabled: true, + }); + }); + + test('Default max retries with job run queuing enabled should be 0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + MaxRetries: 0, + }); + }); + + test('Overridden max concurrent runs should be 100', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + ExecutionProperty: { + MaxConcurrentRuns: 100, + }, + }); + }); + + test('Overridden timeout should be 2 hours', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Timeout: 120, + }); + }); + + test('Overridden connections should contain connectionName', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Connections: { + 
Connections: ['connectionName'], + }, + }); + }); + + test('Overridden security configuration should be set', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + SecurityConfiguration: 'securityConfigName', + }); + }); + + test('Should have tags', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + Tags: { + FirstTagName: 'FirstTagValue', + SecondTagName: 'SecondTagValue', + XTagName: 'XTagValue', + }, + }); + }); + }); + + describe('Override SparkUI properties for ScalaSpark Streaming ETL Job', () => { + + beforeEach(() => { + sparkUIBucket = new s3.Bucket(stack, 'sparkUIbucket', { bucketName: 'bucket-name' }); + job = new glue.ScalaSparkStreamingJob(stack, 'ScalaSparkStreamingJob', { + role, + script, + jobName: 'ScalaSparkStreamingJob', + className, + sparkUI: { + bucket: sparkUIBucket, + prefix: '/prefix', + }, + }); + }); + + test('Test default attributes', () => { + expect(job.jobArn).toEqual(stack.formatArn({ + service: 'glue', + resource: 'job', + resourceName: job.jobName, + })); + expect(job.grantPrincipal).toEqual(role); + }); + + test('Default Glue Version should be 4.0', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + GlueVersion: glue.GlueVersion.V4_0, + }); + }); + + test('Has Continuous Logging and SparkUIEnabled', () => { + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Job', { + DefaultArguments: Match.objectLike({ + '--enable-metrics': '', + '--enable-observability-metrics': 'true', + '--job-language': 'scala', + '--enable-continuous-cloudwatch-log': 'true', + '--enable-spark-ui': 'true', + '--spark-event-logs-path': Match.objectLike({ + 'Fn::Join': [ + '', + [ + 's3://', + { Ref: Match.anyValue() }, + '/prefix/', + ], + ], + }), + }), + }); + }); + }); +}); diff --git a/packages/@aws-cdk/aws-glue-alpha/test/workflow-triggers.test.ts b/packages/@aws-cdk/aws-glue-alpha/test/workflow-triggers.test.ts new file mode 100644 index 
0000000000000..28102fc925b08 --- /dev/null +++ b/packages/@aws-cdk/aws-glue-alpha/test/workflow-triggers.test.ts @@ -0,0 +1,289 @@ +import * as cdk from 'aws-cdk-lib'; +import { Template, Capture } from 'aws-cdk-lib/assertions'; +import * as glue from '../lib'; +import { TriggerSchedule } from '../lib/triggers/trigger-options'; +import * as iam from 'aws-cdk-lib/aws-iam'; + +describe('Workflow and Triggers', () => { + let stack: cdk.Stack; + let workflow: glue.Workflow; + let job: glue.PySparkEtlJob; + let role: iam.Role; + + beforeEach(() => { + stack = new cdk.Stack(); + workflow = new glue.Workflow(stack, 'Workflow', { + description: 'MyWorkflow', + }); + + role = new iam.Role(stack, 'JobRole', { + assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), + }); + + job = new glue.PySparkEtlJob(stack, 'Job', { + script: glue.Code.fromAsset('test/job-script/hello_world.py'), + role, + glueVersion: glue.GlueVersion.V4_0, + workerType: glue.WorkerType.G_1X, + numberOfWorkers: 10, + }); + }); + + test('creates a workflow with triggers and actions', () => { + workflow.addOnDemandTrigger('OnDemandTrigger', { + actions: [{ job }], + }); + + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Workflow', { + Description: 'MyWorkflow', + }); + + Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1); + + const workflowReference = new Capture(); + const actionReference = new Capture(); + Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', { + Type: 'ON_DEMAND', + WorkflowName: workflowReference, + Actions: [actionReference], + }); + + expect(workflowReference.asObject()).toEqual( + { + Ref: 'Workflow193EF7C1', + }, + ); + + expect(actionReference.asObject()).toEqual( + { + JobName: { + Ref: 'JobB9D00F9F', + }, + }, + ); + + }); + + test('creates a workflow with conditional trigger', () => { + workflow.addconditionalTrigger('ConditionalTrigger', { + actions: [{ job }], + predicate: { + conditions: [ + { + job, + state: 
glue.JobState.SUCCEEDED,
+          },
+        ],
+      },
+    });
+
+    Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1);
+
+    const workflowReference = new Capture();
+    const actionReference = new Capture();
+    const predicateReference = new Capture();
+    Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', {
+      Type: 'CONDITIONAL',
+      WorkflowName: workflowReference,
+      Actions: [actionReference],
+      Predicate: predicateReference,
+    });
+
+    expect(workflowReference.asObject()).toEqual(
+      expect.objectContaining({
+        Ref: 'Workflow193EF7C1',
+      }),
+    );
+
+    expect(actionReference.asObject()).toEqual(
+      expect.objectContaining({
+        JobName: {
+          Ref: 'JobB9D00F9F',
+        },
+      }),
+    );
+
+    expect(predicateReference.asObject()).toEqual(
+      expect.objectContaining({
+        Conditions: [
+          {
+            JobName: {
+              Ref: 'JobB9D00F9F',
+            },
+            LogicalOperator: 'EQUALS',
+            State: 'SUCCEEDED',
+          },
+        ],
+      }),
+    );
+  });
+
+  test('creates a workflow with daily scheduled trigger', () => {
+    workflow.addDailyScheduledTrigger('DailyScheduledTrigger', {
+      actions: [{ job }],
+      startOnCreation: true,
+    });
+
+    Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1);
+
+    const workflowReference = new Capture();
+    const actionReference = new Capture();
+    Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', {
+      Type: 'SCHEDULED',
+      WorkflowName: workflowReference,
+      Schedule: 'cron(0 0 * * ? *)',
+      StartOnCreation: true,
+      Actions: [actionReference],
+    });
+
+    expect(workflowReference.asObject()).toEqual(
+      expect.objectContaining({
+        Ref: 'Workflow193EF7C1',
+      }),
+    );
+
+    expect(actionReference.asObject()).toEqual(
+      expect.objectContaining({
+        JobName: {
+          Ref: 'JobB9D00F9F',
+        },
+      }),
+    );
+  });
+
+  test('creates a workflow with weekly scheduled trigger', () => {
+    workflow.addWeeklyScheduledTrigger('WeeklyScheduledTrigger', {
+      actions: [{ job }],
+      startOnCreation: false,
+    });
+
+    Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1);
+
+    const workflowReference = new Capture();
+    const actionReference = new Capture();
+    Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', {
+      Type: 'SCHEDULED',
+      WorkflowName: workflowReference,
+      Schedule: 'cron(0 0 ? * SUN *)',
+      StartOnCreation: false,
+      Actions: [actionReference],
+    });
+
+    expect(workflowReference.asObject()).toEqual(
+      expect.objectContaining({
+        Ref: 'Workflow193EF7C1',
+      }),
+    );
+
+    expect(actionReference.asObject()).toEqual(
+      expect.objectContaining({
+        JobName: {
+          Ref: 'JobB9D00F9F',
+        },
+      }),
+    );
+  });
+
+  test('creates a workflow with custom scheduled trigger', () => {
+    const customSchedule = TriggerSchedule.cron({
+      minute: '0',
+      hour: '20',
+      weekDay: 'THU',
+    });
+
+    workflow.addCustomScheduledTrigger('CustomScheduledTrigger', {
+      actions: [{ job }],
+      schedule: customSchedule,
+      startOnCreation: true,
+    });
+
+    Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1);
+
+    const workflowReference = new Capture();
+    const actionReference = new Capture();
+    Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', {
+      Type: 'SCHEDULED',
+      WorkflowName: workflowReference,
+      Schedule: 'cron(0 20 ? * THU *)',
+      StartOnCreation: true,
+      Actions: [actionReference],
+    });
+
+    expect(workflowReference.asObject()).toEqual(
+      expect.objectContaining({
+        Ref: 'Workflow193EF7C1',
+      }),
+    );
+
+    expect(actionReference.asObject()).toEqual(
+      expect.objectContaining({
+        JobName: {
+          Ref: 'JobB9D00F9F',
+        },
+      }),
+    );
+  });
+
+  test('creates a workflow with notify event trigger', () => {
+    workflow.addNotifyEventTrigger('NotifyEventTrigger', {
+      actions: [{ job }],
+      eventBatchingCondition: {
+        batchSize: 10,
+        batchWindow: cdk.Duration.minutes(5),
+      },
+    });
+
+    Template.fromStack(stack).resourceCountIs('AWS::Glue::Trigger', 1);
+
+    const workflowReference = new Capture();
+    const actionReference = new Capture();
+    const eventBatchingConditionReference = new Capture();
+    Template.fromStack(stack).hasResourceProperties('AWS::Glue::Trigger', {
+      Type: 'EVENT',
+      WorkflowName: workflowReference,
+      Actions: [actionReference],
+      EventBatchingCondition: eventBatchingConditionReference,
+    });
+
+    expect(workflowReference.asObject()).toEqual(
+      expect.objectContaining({
+        Ref: 'Workflow193EF7C1',
+      }),
+    );
+
+    expect(actionReference.asObject()).toEqual(
+      expect.objectContaining({
+        JobName: {
+          Ref: 'JobB9D00F9F',
+        },
+      }),
+    );
+
+    expect(eventBatchingConditionReference.asObject()).toEqual(
+      expect.objectContaining({
+        BatchSize: 10,
+        BatchWindow: 300,
+      }),
+    );
+  });
+});
+
+describe('.fromWorkflowAttributes()', () => {
+  let stack: cdk.Stack;
+
+  beforeEach(() => {
+    stack = new cdk.Stack();
+  });
+
+  test('with required attrs only', () => {
+    const workflowName = 'my-existing-workflow';
+    const importedWorkflow = glue.Workflow.fromWorkflowAttributes(stack, 'ImportedWorkflow', { workflowName });
+
+    expect(importedWorkflow.workflowName).toEqual(workflowName);
+    expect(importedWorkflow.workflowArn).toEqual(stack.formatArn({
+      service: 'glue',
+      resource: 'workflow',
+      resourceName: workflowName,
+    }));
+  });
+});
\ No newline at end of file
diff --git a/packages/@aws-cdk/cli-lib-alpha/THIRD_PARTY_LICENSES b/packages/@aws-cdk/cli-lib-alpha/THIRD_PARTY_LICENSES
index df020abc2aac7..8e08e6431cad8 100644
--- a/packages/@aws-cdk/cli-lib-alpha/THIRD_PARTY_LICENSES
+++ b/packages/@aws-cdk/cli-lib-alpha/THIRD_PARTY_LICENSES
@@ -693,6 +693,7 @@ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLI
 
 ** cdk-from-cfn@0.162.0 - https://www.npmjs.com/package/cdk-from-cfn/v/0.162.0 | MIT OR Apache-2.0
 
+
 ----------------
 
 ** chalk@4.1.2 - https://www.npmjs.com/package/chalk/v/4.1.2 | MIT