feat(glue): add ExternalTable for use with connections (#24753)
This change restructures the Glue table constructs around a new `TableBase` abstract class, from which tables for different data sources can be derived. Initially there are two subclasses, `S3Table` and `ExternalTable`.

- `S3Table`: The existing table construct, as used in previous versions of the CDK.
- `ExternalTable`: A new Glue table that stores metadata about external data sources. This subclass has an `externalDataLocation` property that explicitly sets the `Location` property of the underlying `CfnTable` L1 construct.
- `Table`: Now `@deprecated`, to shift usage towards `S3Table`.
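
The hierarchy described above can be pictured with a simplified, CDK-free sketch. The class names mirror the PR, but the fields, constructor signatures, and the `location()` helper are assumptions made purely for illustration; the real constructs take a scope, an id, and a props object, and render a `CfnTable`.

```ts
// Simplified stand-in for the hierarchy; NOT the actual CDK implementation.
abstract class TableBase {
  public readonly tableName: string;

  constructor(tableName: string) {
    this.tableName = tableName;
  }

  // Hypothetical helper: each subclass decides what ends up in `Location`.
  abstract location(): string;
}

// Data lives in an S3 bucket under an optional prefix.
class S3Table extends TableBase {
  private readonly bucketName: string;
  private readonly s3Prefix: string;

  constructor(tableName: string, bucketName: string, s3Prefix: string = '') {
    super(tableName);
    this.bucketName = bucketName;
    this.s3Prefix = s3Prefix;
  }

  location(): string {
    return `s3://${this.bucketName}/${this.s3Prefix}`;
  }
}

// Data lives in an external source reached through a Glue connection.
class ExternalTable extends TableBase {
  public readonly connectionName: string;
  private readonly externalDataLocation: string;

  constructor(tableName: string, connectionName: string, externalDataLocation: string) {
    super(tableName);
    this.connectionName = connectionName;
    this.externalDataLocation = externalDataLocation;
  }

  location(): string {
    return this.externalDataLocation;
  }
}

// Both kinds of table can be handled through the shared base type.
const tables: TableBase[] = [
  new S3Table('events', 'my-bucket', 'my-table/'),
  new ExternalTable('orders', 'my-connection', 'default_db_public_example'),
];
const locations = tables.map((t) => t.location());
```

Code that only needs the shared behaviour can accept a `TableBase`, while code that needs the bucket or the connection can ask for the specific subclass.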

Closes #24741.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Rizxcviii committed Sep 11, 2023
1 parent 23fba8a commit 1c03cb3
Showing 28 changed files with 4,396 additions and 330 deletions.
50 changes: 36 additions & 14 deletions packages/@aws-cdk/aws-glue-alpha/README.md
@@ -211,7 +211,7 @@ A Glue table describes a table of data in S3: its structure (column names and ty

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
database: myDatabase,
columns: [{
name: 'col1',
@@ -230,7 +230,7 @@ By default, a S3 bucket will be created to store the table's data but you can ma
```ts
declare const myBucket: s3.Bucket;
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
bucket: myBucket,
s3Prefix: 'my-table/',
// ...
@@ -247,7 +247,7 @@ Glue tables can be configured to contain user-defined properties, to describe th

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
storageParameters: [
glue.StorageParameter.skipHeaderLineCount(1),
glue.StorageParameter.compressionType(glue.CompressionType.GZIP),
@@ -269,7 +269,7 @@ To improve query performance, a table can specify `partitionKeys` on which data

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
database: myDatabase,
columns: [{
name: 'col1',
@@ -300,7 +300,7 @@ property:

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
database: myDatabase,
columns: [{
name: 'col1',
@@ -337,7 +337,7 @@ If you have a table with a large number of partitions that grows over time, cons

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
database: myDatabase,
columns: [{
name: 'col1',
@@ -355,6 +355,28 @@ new glue.Table(this, 'MyTable', {
});
```

### Glue Connections

Glue connections allow external data connections to third-party databases and data warehouses. These connections can also be assigned to Glue tables, allowing you to query external data sources using the Glue Data Catalog.

Whereas `S3Table` will point to (and, if needed, create) a bucket to store the table's data, `ExternalTable` will point to an existing table in a data source. For example, to create a table in Glue that points to a table in Redshift:

```ts
declare const myConnection: glue.Connection;
declare const myDatabase: glue.Database;
new glue.ExternalTable(this, 'MyTable', {
  connection: myConnection,
  externalDataLocation: 'default_db_public_example', // A table in Redshift
  // ...
  database: myDatabase,
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }],
  dataFormat: glue.DataFormat.JSON,
});
```
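
To make the difference from `S3Table` concrete, here is a minimal, CDK-free sketch of where the two `ExternalTable`-specific props end up in the rendered table input. It is modelled on the constructor in `external-table.ts` from this commit. The `buildTableInputSketch` function and its interfaces are hypothetical names used only for illustration, and the real construct sets more fields (data format, partition keys, storage parameters).

```ts
// Hypothetical, CDK-free shapes used only for this illustration.
interface ColumnSketch {
  name: string;
  type: string;
}

interface ExternalTableSketchProps {
  tableName: string;
  connectionName: string;
  externalDataLocation: string;
  columns: ColumnSketch[];
}

// Shows, in simplified form, where the connection name and the
// external data location land in the table input.
function buildTableInputSketch(props: ExternalTableSketchProps) {
  return {
    name: props.tableName,
    tableType: 'EXTERNAL_TABLE',
    parameters: {
      // The connection is referenced by name in the table parameters.
      connectionName: props.connectionName,
    },
    storageDescriptor: {
      // `externalDataLocation` is passed through as the `Location`.
      location: props.externalDataLocation,
      columns: props.columns,
    },
  };
}

const tableInput = buildTableInputSketch({
  tableName: 'my_table',
  connectionName: 'my-connection',
  externalDataLocation: 'default_db_public_example',
  columns: [{ name: 'col1', type: 'string' }],
});
```

The location string is passed through unchanged, so it is up to the caller to supply a value the target data source understands.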

## [Encryption](https://docs.aws.amazon.com/athena/latest/ug/encryption.html)

You can enable encryption on a Table's data:
@@ -363,7 +385,7 @@ You can enable encryption on a Table's data:

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.S3_MANAGED,
// ...
database: myDatabase,
@@ -380,7 +402,7 @@ new glue.Table(this, 'MyTable', {
```ts
declare const myDatabase: glue.Database;
// KMS key is created automatically
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.KMS,
// ...
database: myDatabase,
@@ -392,7 +414,7 @@ new glue.Table(this, 'MyTable', {
});

// with an explicit KMS key
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.KMS,
encryptionKey: new kms.Key(this, 'MyKey'),
// ...
@@ -409,7 +431,7 @@ new glue.Table(this, 'MyTable', {

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.KMS_MANAGED,
// ...
database: myDatabase,
@@ -426,7 +448,7 @@ new glue.Table(this, 'MyTable', {
```ts
declare const myDatabase: glue.Database;
// KMS key is created automatically
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
// ...
database: myDatabase,
@@ -438,7 +460,7 @@ new glue.Table(this, 'MyTable', {
});

// with an explicit KMS key
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
encryption: glue.TableEncryption.CLIENT_SIDE_KMS,
encryptionKey: new kms.Key(this, 'MyKey'),
// ...
@@ -451,15 +473,15 @@ new glue.Table(this, 'MyTable', {
});
```

-*Note: you cannot provide a `Bucket` when creating the `Table` if you wish to use server-side encryption (`KMS`, `KMS_MANAGED` or `S3_MANAGED`)*.
+*Note: you cannot provide a `Bucket` when creating the `S3Table` if you wish to use server-side encryption (`KMS`, `KMS_MANAGED` or `S3_MANAGED`)*.

## Types

A table's schema is a collection of columns, each of which has a `name` and a `type`. Types are recursive structures, consisting of primitive and complex types:

```ts
declare const myDatabase: glue.Database;
-new glue.Table(this, 'MyTable', {
+new glue.S3Table(this, 'MyTable', {
columns: [{
name: 'primitive_column',
type: glue.Schema.STRING,
171 changes: 171 additions & 0 deletions packages/@aws-cdk/aws-glue-alpha/lib/external-table.ts
@@ -0,0 +1,171 @@
import { CfnTable } from 'aws-cdk-lib/aws-glue';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';
import { IConnection } from './connection';
import { Column } from './schema';
import { PartitionIndex, TableBase, TableBaseProps } from './table-base';

export interface ExternalTableProps extends TableBaseProps {
  /**
   * The connection the table will use when performing reads and writes.
   */
  readonly connection: IConnection;

  /**
   * The data source location of the Glue table (e.g. `default_db_public_example` for Redshift).
   *
   * If this property is set, it will override both `bucket` and `s3Prefix`.
   */
  readonly externalDataLocation: string;
}

/**
 * A Glue table that targets an external data location (e.g. a table in a Redshift cluster).
 */
export class ExternalTable extends TableBase {
  /**
   * Name of this table.
   */
  public readonly tableName: string;

  /**
   * ARN of this table.
   */
  public readonly tableArn: string;

  /**
   * The connection associated with this table.
   */
  public readonly connection: IConnection;

  /**
   * This table's partition indexes.
   */
  public readonly partitionIndexes?: PartitionIndex[];

  protected readonly tableResource: CfnTable;

  constructor(scope: Construct, id: string, props: ExternalTableProps) {
    super(scope, id, props);
    this.connection = props.connection;
    this.tableResource = new CfnTable(this, 'Table', {
      catalogId: props.database.catalogId,

      databaseName: props.database.databaseName,

      tableInput: {
        name: this.physicalName,
        description: props.description || `${this.physicalName} generated by CDK`,

        partitionKeys: renderColumns(props.partitionKeys),

        parameters: {
          'classification': props.dataFormat.classificationString?.value,
          'has_encrypted_data': true,
          'partition_filtering.enabled': props.enablePartitionFiltering,
          'connectionName': props.connection.connectionName,
        },
        storageDescriptor: {
          location: props.externalDataLocation,
          compressed: this.compressed,
          storedAsSubDirectories: props.storedAsSubDirectories ?? false,
          columns: renderColumns(props.columns),
          inputFormat: props.dataFormat.inputFormat.className,
          outputFormat: props.dataFormat.outputFormat.className,
          serdeInfo: {
            serializationLibrary: props.dataFormat.serializationLibrary.className,
          },
          parameters: props.storageParameters ? props.storageParameters.reduce((acc, param) => {
            if (param.key in acc) {
              throw new Error(`Duplicate storage parameter key: ${param.key}`);
            }
            const key = param.key;
            acc[key] = param.value;
            return acc;
          }, {} as { [key: string]: string }) : undefined,
        },

        tableType: 'EXTERNAL_TABLE',
      },
    });

    this.tableName = this.getResourceNameAttribute(this.tableResource.ref);
    this.tableArn = this.stack.formatArn({
      service: 'glue',
      resource: 'table',
      resourceName: `${this.database.databaseName}/${this.tableName}`,
    });
    this.node.defaultChild = this.tableResource;

    // Partition index creation relies on created table.
    if (props.partitionIndexes) {
      this.partitionIndexes = props.partitionIndexes;
      this.partitionIndexes.forEach((index) => this.addPartitionIndex(index));
    }
  }

  /**
   * Grant read permissions to the table
   *
   * @param grantee the principal
   */
  public grantRead(grantee: iam.IGrantable): iam.Grant {
    return this.grant(grantee, readPermissions);
  }

  /**
   * Grant write permissions to the table
   *
   * @param grantee the principal
   */
  public grantWrite(grantee: iam.IGrantable): iam.Grant {
    return this.grant(grantee, writePermissions);
  }

  /**
   * Grant read and write permissions to the table
   *
   * @param grantee the principal
   */
  public grantReadWrite(grantee: iam.IGrantable): iam.Grant {
    return this.grant(grantee, [...readPermissions, ...writePermissions]);
  }
}

const readPermissions = [
  'glue:BatchGetPartition',
  'glue:GetPartition',
  'glue:GetPartitions',
  'glue:GetTable',
  'glue:GetTables',
  'glue:GetTableVersion',
  'glue:GetTableVersions',
];

const writePermissions = [
  'glue:BatchCreatePartition',
  'glue:BatchDeletePartition',
  'glue:CreatePartition',
  'glue:DeletePartition',
  'glue:UpdatePartition',
];

function renderColumns(columns?: Column[]) {
  if (columns === undefined) {
    return undefined;
  }
  return columns.map(column => {
    return {
      name: column.name,
      type: column.type.inputString,
      comment: column.comment,
    };
  });
}
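
The constructor above folds `storageParameters` into a plain key/value map and rejects duplicate keys. The following standalone sketch isolates that step so it can be run without the CDK. `StorageParameterSketch` and `renderStorageParameters` are hypothetical names, and the keys in the usage lines are illustrative values rather than a statement of what `glue.StorageParameter` emits.

```ts
// Minimal stand-in for a storage parameter: just a key/value pair.
interface StorageParameterSketch {
  key: string;
  value: string;
}

// Folds the parameters into a map, throwing on the first repeated key.
// Returns undefined when no parameters were supplied at all.
function renderStorageParameters(
  params?: StorageParameterSketch[],
): { [key: string]: string } | undefined {
  if (!params) {
    return undefined;
  }
  return params.reduce((acc, param) => {
    if (param.key in acc) {
      throw new Error(`Duplicate storage parameter key: ${param.key}`);
    }
    acc[param.key] = param.value;
    return acc;
  }, {} as { [key: string]: string });
}

// Distinct keys are collected into one object.
const rendered = renderStorageParameters([
  { key: 'skip.header.line.count', value: '1' },
  { key: 'compression_type', value: 'gzip' },
]);

// A repeated key is reported as an error instead of silently overwriting.
let duplicateError = '';
try {
  renderStorageParameters([
    { key: 'compression_type', value: 'gzip' },
    { key: 'compression_type', value: 'snappy' },
  ]);
} catch (e) {
  duplicateError = (e as Error).message;
}
```

Failing fast here surfaces a conflicting configuration at synthesis time rather than letting the later value win unnoticed.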
5 changes: 4 additions & 1 deletion packages/@aws-cdk/aws-glue-alpha/lib/index.ts
@@ -5,9 +5,12 @@ export * from './connection';
export * from './data-format';
export * from './data-quality-ruleset';
export * from './database';
+export * from './external-table';
export * from './job';
export * from './job-executable';
+export * from './s3-table';
export * from './schema';
export * from './security-configuration';
export * from './storage-parameter';
-export * from './table';
+export * from './table-base';
+export * from './table-deprecated';