feat(redshift): column compression encodings and comments can now be customised #23597

Merged 26 commits from feature/commentting-encoding into main on Feb 10, 2023. The diffs shown below cover changes from 11 of those commits.

Commits
0f50d3f
addition: initial testing suite
Rizxcviii Jan 6, 2023
20fe7fa
addition: initial column encoding methods
Rizxcviii Jan 6, 2023
7517f8d
addition: docstring for ColumnEncoding
Rizxcviii Jan 6, 2023
8669223
addition: assigning enums to string variables
Rizxcviii Jan 6, 2023
38cfb4a
addition: adding encoding on creation of table
Rizxcviii Jan 6, 2023
656a307
addition: updates on column encoding
Rizxcviii Jan 6, 2023
7f2c6b9
addition: table comment and column comment
Rizxcviii Jan 6, 2023
5a72700
modification, addition
Rizxcviii Jan 6, 2023
77fe42c
Merge branch 'main' into feature/commentting-encoding
Rizxcviii Jan 6, 2023
68d8107
modification: integ test
Rizxcviii Jan 6, 2023
8689510
addition: docuementation for encoding and commentting
Rizxcviii Jan 6, 2023
73366fd
addition, modification:
Rizxcviii Jan 23, 2023
6f73d7a
modification: removing table comments code
Rizxcviii Jan 26, 2023
6ec1f26
modification: removing table comments code
Rizxcviii Jan 26, 2023
033bdc7
modification: integ test snapshot
Rizxcviii Jan 26, 2023
6a8ede8
modification: reverting import cleanup
Rizxcviii Jan 26, 2023
c8ac796
modification: removing table comments from README
Rizxcviii Jan 26, 2023
6ce5fa9
modification: bugfix, nested and incorrect test
Rizxcviii Jan 26, 2023
a96840c
modification: using private enum
Rizxcviii Jan 26, 2023
c4582e9
addition: line break on EOF
Rizxcviii Jan 26, 2023
2c13a23
Merge branch 'main' into feature/commentting-encoding
Rizxcviii Feb 6, 2023
07b6126
modification: using an actual compression encoding used by VARCHAR
Rizxcviii Feb 9, 2023
452d8ea
modification: rosetta fixing, was probably not run using the yarn com…
Rizxcviii Feb 9, 2023
461a93d
removal: lock file
Rizxcviii Feb 9, 2023
c78e737
modification: typo
Rizxcviii Feb 9, 2023
a626917
Merge branch 'main' into feature/commentting-encoding
mergify[bot] Feb 10, 2023
48 changes: 43 additions & 5 deletions packages/@aws-cdk/aws-redshift/README.md
@@ -54,15 +54,15 @@ import * as ec2 from '@aws-cdk/aws-ec2';
import * as s3 from '@aws-cdk/aws-s3';

const vpc = new ec2.Vpc(this, 'Vpc');
-const bucket = s3.Bucket.fromBucketName(stack, 'bucket', 'logging-bucket');
+const bucket = s3.Bucket.fromBucketName(this, 'bucket', 'logging-bucket');

const cluster = new Cluster(this, 'Redshift', {
masterUser: {
masterUsername: 'admin',
},
vpc,
loggingProperties: {
-loggingBucket = bucket,
+loggingBucket: bucket,
loggingKeyPrefix: 'prefix',
}
});
@@ -200,6 +200,35 @@ new Table(this, 'Table', {
});
```

Both the table and its individual columns can be configured with comments:

```ts fixture=cluster
new Table(this, 'Table', {
tableColumns: [
{ name: 'col1', dataType: 'varchar(4)', comment: 'This is a comment' },
{ name: 'col2', dataType: 'float', comment: 'This is another comment' },
],
cluster: cluster,
databaseName: 'databaseName',
comment: 'This is a comment',
});
```

Table columns can be configured to use a specific compression encoding:

```ts fixture=cluster
import { ColumnEncoding } from '@aws-cdk/aws-redshift';

new Table(this, 'Table', {
tableColumns: [
{ name: 'col1', dataType: 'varchar(4)', encoding: ColumnEncoding.DELTA },
{ name: 'col2', dataType: 'float', encoding: ColumnEncoding.DELTA32K },
],
cluster: cluster,
databaseName: 'databaseName',
});
```

### Granting Privileges

You can give a user privileges to perform certain actions on a table by using the
@@ -305,7 +334,9 @@ cluster.addRotationMultiUser('MultiUserRotation', {
You can add a parameter to a parameter group with `ClusterParameterGroup.addParameter()`.

```ts
-const params = new ClusterParameterGroup(stack, 'Params', {
+import { ClusterParameterGroup } from '@aws-cdk/aws-redshift';
+
+const params = new ClusterParameterGroup(this, 'Params', {
description: 'desc',
parameters: {
require_ssl: 'true',
@@ -318,6 +349,8 @@ params.addParameter('enable_user_activity_logging', 'true');
Additionally, you can add a parameter to the cluster's associated parameter group with `Cluster.addToParameterGroup()`. If the cluster does not have an associated parameter group, a new parameter group is created.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';
declare const vpc: ec2.Vpc;

const cluster = new Cluster(this, 'Cluster', {
@@ -336,9 +369,11 @@ cluster.addToParameterGroup('enable_user_activity_logging', 'true');
If you configure your cluster to be publicly accessible, you can optionally select an *elastic IP address* to use for the external IP address. An elastic IP address is a static IP address that is associated with your AWS account. You can use an elastic IP address to connect to your cluster from outside the VPC. An elastic IP address gives you the ability to change your underlying configuration without affecting the IP address that clients use to connect to your cluster. This approach can be helpful for situations such as recovery after a failure.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';
declare const vpc: ec2.Vpc;

-new Cluster(stack, 'Redshift', {
+new Cluster(this, 'Redshift', {
masterUser: {
masterUsername: 'admin',
masterPassword: cdk.SecretValue.unsafePlainText('tooshort'),
@@ -352,6 +387,7 @@ new Cluster(stack, 'Redshift', {
If the cluster is in a VPC and you want to connect to it using the private IP address from within the cluster, it is important to enable *DNS resolution* and *DNS hostnames* in the VPC config. If these parameters are not set, connections from within the VPC will connect to the elastic IP address and not the private IP address.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
const vpc = new ec2.Vpc(this, 'VPC', {
enableDnsSupport: true,
enableDnsHostnames: true,
@@ -373,9 +409,11 @@ In some cases, you might want to associate the cluster with an elastic IP address
When you use Amazon Redshift enhanced VPC routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your virtual private cloud (VPC) based on the Amazon VPC service. By using enhanced VPC routing, you can use standard VPC features, such as VPC security groups, network access control lists (ACLs), VPC endpoints, VPC endpoint policies, internet gateways, and Domain Name System (DNS) servers, as described in the Amazon VPC User Guide. You use these features to tightly manage the flow of data between your Amazon Redshift cluster and other resources. When you use enhanced VPC routing to route traffic through your VPC, you can also use VPC flow logs to monitor COPY and UNLOAD traffic.

```ts
import * as ec2 from '@aws-cdk/aws-ec2';
import * as cdk from '@aws-cdk/core';
declare const vpc: ec2.Vpc;

-new Cluster(stack, 'Redshift', {
+new Cluster(this, 'Redshift', {
masterUser: {
masterUsername: 'admin',
masterPassword: cdk.SecretValue.unsafePlainText('tooshort'),
@@ -2,7 +2,7 @@
import * as AWSLambda from 'aws-lambda';
import { Column } from '../../table';
import { executeStatement } from './redshift-data';
-import { ClusterProps, TableAndClusterProps, TableSortStyle } from './types';
+import { ClusterProps, TableAndClusterProps, TableSortStyle, ColumnEncoding } from './types';
import { areColumnsEqual, getDistKeyColumn, getSortKeyColumns } from './util';

export async function handler(props: TableAndClusterProps, event: AWSLambda.CloudFormationCustomResourceEvent) {
@@ -40,7 +40,7 @@ async function createTable(
tableAndClusterProps: TableAndClusterProps,
): Promise<string> {
const tableName = tableNamePrefix + tableNameSuffix;
-const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}`).join();
+const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}${getEncodingColumnString(column)}`).join();

let statement = `CREATE TABLE ${tableName} (${tableColumnsString})`;

@@ -60,6 +60,16 @@
}

await executeStatement(statement, tableAndClusterProps);

if (tableAndClusterProps.comment) {
await executeStatement(`COMMENT ON TABLE ${tableName} IS '${tableAndClusterProps.comment}'`, tableAndClusterProps);
}

for (const column of tableColumns) {
if (column.comment) {
await executeStatement(`COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`, tableAndClusterProps);
}
}
return tableName;
}
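
To make the statement flow above concrete, here is a minimal standalone sketch of how `createTable` assembles its SQL — `SimpleColumn` and `buildCreateStatements` are illustrative names, not part of the handler:

```ts
// Sketch of createTable's statement assembly; names are illustrative.
interface SimpleColumn {
  name: string;
  dataType: string;
  encoding?: string;
  comment?: string;
}

function buildCreateStatements(tableName: string, columns: SimpleColumn[], tableComment?: string): string[] {
  // Each column renders as "<name> <dataType>[ ENCODE <encoding>]";
  // join() uses its default ',' separator, as in the handler above.
  const columnsString = columns
    .map(col => `${col.name} ${col.dataType}${col.encoding ? ` ENCODE ${col.encoding}` : ''}`)
    .join();
  const statements = [`CREATE TABLE ${tableName} (${columnsString})`];
  // Table and column comments are issued as separate COMMENT ON
  // statements once the table exists.
  if (tableComment) {
    statements.push(`COMMENT ON TABLE ${tableName} IS '${tableComment}'`);
  }
  for (const col of columns) {
    if (col.comment) {
      statements.push(`COMMENT ON COLUMN ${tableName}.${col.name} IS '${col.comment}'`);
    }
  }
  return statements;
}

// buildCreateStatements('MyTable',
//   [{ name: 'col1', dataType: 'varchar(4)', encoding: 'LZO', comment: 'First column' }],
//   'My table')
// => [ 'CREATE TABLE MyTable (col1 varchar(4) ENCODE LZO)',
//      "COMMENT ON TABLE MyTable IS 'My table'",
//      "COMMENT ON COLUMN MyTable.col1 IS 'First column'" ]
```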

Expand Down Expand Up @@ -87,6 +97,11 @@ async function updateTable(
return createTable(tableNamePrefix, tableNameSuffix, tableColumns, tableAndClusterProps);
}

const oldComment = oldResourceProperties.comment;
if (tableAndClusterProps.comment !== oldComment) {
alterationStatements.push(`COMMENT ON TABLE ${tableName} IS ${tableAndClusterProps.comment ? `'${tableAndClusterProps.comment}'` : 'NULL'}`);
}

const oldTableColumns = oldResourceProperties.tableColumns;
const columnDeletions = oldTableColumns.filter(oldColumn => (
tableColumns.every(column => oldColumn.name !== column.name)
@@ -143,6 +158,36 @@ async function updateTable(
}
}

const oldEncodingColumns = oldTableColumns.filter(column => column.encoding);
const newEncodingColumns = tableColumns.filter(column => column.encoding);
if (!areColumnsEqual(oldEncodingColumns, newEncodingColumns)) {
// Check for any new columns that need to be encoded.
const encodingColumnAdditions = newEncodingColumns.filter(column => {
return !oldEncodingColumns.some(oldColumn => column.name === oldColumn.name && column.encoding === oldColumn.encoding);
}).map(column => `ALTER TABLE ${tableName} ALTER COLUMN ${column.name} ENCODE ${column.encoding}`);
alterationStatements.push(...encodingColumnAdditions);
// Check for any old columns that need to be reverted.
const encodingColumnDeletions = oldEncodingColumns.filter(column => {
return !newEncodingColumns.some(newColumn => column.name === newColumn.name && column.encoding === newColumn.encoding);
}).map(column => `ALTER TABLE ${tableName} ALTER COLUMN ${column.name} ENCODE ${ColumnEncoding.AUTO}`);
alterationStatements.push(...encodingColumnDeletions);
}

const oldCommentedColumns = oldTableColumns.filter(column => column.comment);
const newCommentedColumns = tableColumns.filter(column => column.comment);
if (!areColumnsEqual(oldCommentedColumns, newCommentedColumns)) {
// Check for any new columns that need to be commented.
const commentColumnAdditions = newCommentedColumns.filter(column => {
return !oldCommentedColumns.some(oldColumn => column.name === oldColumn.name && column.comment === oldColumn.comment);
}).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`);
alterationStatements.push(...commentColumnAdditions);
// Check for any old columns that need to be reverted.
const commentColumnDeletions = oldCommentedColumns.filter(column => {
return !newCommentedColumns.some(newColumn => column.name === newColumn.name && column.comment === newColumn.comment);
}).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS NULL`);
alterationStatements.push(...commentColumnDeletions);
}

await Promise.all(alterationStatements.map(statement => executeStatement(statement, tableAndClusterProps)));

return tableName;
@@ -151,3 +196,10 @@ async function updateTable(
function getSortKeyColumnsString(sortKeyColumns: Column[]) {
return sortKeyColumns.map(column => column.name).join();
}

function getEncodingColumnString(column: Column): string {
if (column.encoding) {
return ` ENCODE ${column.encoding}`;
}
return '';
}
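
The update path reconciles by diffing the old and new column lists: an encoding that disappears (or changes) is explicitly reset to `AUTO`, and a comment that disappears is set to `NULL`, since omitting the clause would silently keep the previous value. Below is a minimal sketch of the encoding half, under the same caveat that `EncodedColumn` and `diffEncodingStatements` are made-up names:

```ts
// Sketch of the encoding reconciliation performed by updateTable.
interface EncodedColumn {
  name: string;
  encoding?: string;
}

function diffEncodingStatements(tableName: string, oldColumns: EncodedColumn[], newColumns: EncodedColumn[]): string[] {
  const statements: string[] = [];
  const oldEncoded = oldColumns.filter(c => c.encoding);
  const newEncoded = newColumns.filter(c => c.encoding);
  // Apply encodings that are new or have changed.
  for (const col of newEncoded) {
    if (!oldEncoded.some(o => o.name === col.name && o.encoding === col.encoding)) {
      statements.push(`ALTER TABLE ${tableName} ALTER COLUMN ${col.name} ENCODE ${col.encoding}`);
    }
  }
  // Reset encodings that were removed back to AUTO.
  for (const col of oldEncoded) {
    if (!newEncoded.some(n => n.name === col.name && n.encoding === col.encoding)) {
      statements.push(`ALTER TABLE ${tableName} ALTER COLUMN ${col.name} ENCODE AUTO`);
    }
  }
  return statements;
}

// diffEncodingStatements('MyTable', [{ name: 'col1', encoding: 'DELTA' }], [{ name: 'col1' }])
// => [ 'ALTER TABLE MyTable ALTER COLUMN col1 ENCODE AUTO' ]
```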
@@ -24,3 +24,113 @@ export enum TableSortStyle {
*/
INTERLEAVED = 'INTERLEAVED',
}

/**
* The compression encoding of a column.
* This has been duplicated here to avoid exporting private types.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
*/
export enum ColumnEncoding {
/**
* Amazon Redshift assigns an optimal encoding based on the column data.
* This is the default.
*/
AUTO = 'AUTO',

/**
* The column is not compressed.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Raw_encoding.html
*/
RAW = 'RAW',

/**
* The column is compressed using the AZ64 algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/az64-encoding.html
*/
AZ64 = 'AZ64',

/**
* The column is compressed using a separate dictionary for each block of column values on disk.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html
*/
BYTEDICT = 'BYTEDICT',

/**
* The column is compressed based on the difference between values in the column.
* This records differences as 1-byte values.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
*/
DELTA = 'DELTA',

/**
* The column is compressed based on the difference between values in the column.
* This records differences as 2-byte values.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
*/
DELTA32K = 'DELTA32K',

/**
* The column is compressed using the LZO algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/lzo-encoding.html
*/
LZO = 'LZO',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 1 byte.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY8 = 'MOSTLY8',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 2 bytes.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY16 = 'MOSTLY16',

/**
* The column is compressed to a smaller storage size than the original data type.
* The compressed storage size is 4 bytes.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
*/
MOSTLY32 = 'MOSTLY32',

/**
* The column is compressed by recording the number of occurrences of each value in the column.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html
*/
RUNLENGTH = 'RUNLENGTH',

/**
* The column is compressed by recording the first 245 unique words and then using a 1-byte index to represent each word.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
*/
TEXT255 = 'TEXT255',

/**
* The column is compressed by recording the first 32K unique words and then using a 2-byte index to represent each word.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
*/
TEXT32K = 'TEXT32K',

/**
* The column is compressed using the ZSTD algorithm.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/zstd-encoding.html
*/
ZSTD = 'ZSTD',
}
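
Because each member's value is the literal Redshift keyword, a `ColumnEncoding` can be interpolated straight into DDL, which is exactly what the handler above does. A small example (the table and column names are made up):

```ts
const ddl = `ALTER TABLE my_table ALTER COLUMN my_col ENCODE ${ColumnEncoding.ZSTD}`;
// => 'ALTER TABLE my_table ALTER COLUMN my_col ENCODE ZSTD'
```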
@@ -20,6 +20,7 @@ export interface TableHandlerProps {
readonly tableColumns: Column[];
readonly distStyle?: TableDistStyle;
readonly sortStyle: TableSortStyle;
readonly comment?: string;
}

export interface TablePrivilege {