adds support for querying Redshift Serverless clusters #32785

Merged: 1 commit merged into apache:main from 32280-add-support-redshift-serverless on Jul 24, 2023

Conversation

ivica-k
Contributor

@ivica-k ivica-k commented Jul 23, 2023

Adds support for querying Redshift Serverless clusters using the RedshiftDataOperator. Serverless clusters require the workgroup_name to be specified and do not require the cluster_identifier.
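
The rule described above (a workgroup name for Serverless, a cluster identifier otherwise) can be sketched as a small helper that builds the keyword arguments for the Redshift Data API `execute_statement` call. This is an illustration only, not the code in this PR: the helper name `build_execute_statement_kwargs` and its validation rules are assumptions, while `Database`, `Sql`, `ClusterIdentifier`, and `WorkgroupName` are request parameters of the Data API.

```python
from __future__ import annotations

from typing import Any


def build_execute_statement_kwargs(
    database: str,
    sql: str,
    cluster_identifier: str | None = None,
    workgroup_name: str | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: pick the right target argument for execute_statement."""
    # Assumption: exactly one of the two targets must be given.
    if cluster_identifier and workgroup_name:
        raise ValueError("cluster_identifier and workgroup_name are mutually exclusive")
    if not cluster_identifier and not workgroup_name:
        raise ValueError("either cluster_identifier or workgroup_name is required")

    kwargs: dict[str, Any] = {"Database": database, "Sql": sql}
    if workgroup_name:
        # Redshift Serverless: address the workgroup.
        kwargs["WorkgroupName"] = workgroup_name
    else:
        # Provisioned cluster: address the cluster.
        kwargs["ClusterIdentifier"] = cluster_identifier
    return kwargs
```

The resulting dictionary could then be passed to a boto3 `redshift-data` client as `client.execute_statement(**kwargs)`.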

Closes #32280

Tested with this DAG:

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_data import RedshiftDataOperator

default_args = {
    "start_date": "2023-03-01",
}

dag = DAG(
    "test_redshift_serverless",
    default_args=default_args,
    schedule="0 0 1,15 * *",
    catchup=False
)

# Query a Redshift Serverless workgroup: workgroup_name is passed
# instead of cluster_identifier.
rd = RedshiftDataOperator(
    task_id="run_this",
    dag=dag,
    database="dev",
    workgroup_name="wg-ivica",
    sql="select current_user;",
    aws_conn_id="aws_default",
    wait_for_completion=True,
    return_sql_result=True,
)

Using my personal AWS credentials, the result showed that the current user is IAM:ikolenkas, which is correct.

{
	"ColumnMetadata": [{
		"isCaseSensitive": true,
		"isCurrency": false,
		"isSigned": false,
		"label": "current_user",
		"length": 0,
		"name": "current_user",
		"nullable": 1,
		"precision": 63,
		"scale": 0,
		"schemaName": "",
		"tableName": "",
		"typeName": "bpchar"
	}],
	"Records": [
		[{
			"stringValue": "IAM:ikolenkas"
		}]
	],
	"TotalNumRows": 1,
	"ResponseMetadata": {
		"RequestId": "6ec07443-80c7-4681-823d-fa7689e85e7a",
		"HTTPStatusCode": 200,
		"HTTPHeaders": {
			"x-amzn-requestid": "6ec07443-80c7-4681-823d-fa7689e85e7a",
			"content-type": "application/x-amz-json-1.1",
			"content-length": "289",
			"date": "Sun, 23 Jul 2023 13:06:12 GMT"
		},
		"RetryAttempts": 0
	}
}
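
For readers who want to pull the value out of a response shaped like the one above, here is a minimal sketch. The helper name `rows_as_dicts` is hypothetical, and it assumes each field carries either `isNull` or exactly one typed value key such as `stringValue`.

```python
from __future__ import annotations

from typing import Any


def rows_as_dicts(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: turn a statement result into a list of row dicts."""
    # Column names come from the metadata, in the same order as the record fields.
    names = [col["name"] for col in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        values = [
            # A null field is marked with isNull; otherwise take the single typed value.
            None if field.get("isNull") else next(iter(field.values()))
            for field in record
        ]
        rows.append(dict(zip(names, values)))
    return rows


# Trimmed copy of the response shown above.
sample = {
    "ColumnMetadata": [{"name": "current_user", "typeName": "bpchar"}],
    "Records": [[{"stringValue": "IAM:ikolenkas"}]],
    "TotalNumRows": 1,
}
rows = rows_as_dicts(sample)  # [{'current_user': 'IAM:ikolenkas'}]
```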


@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Jul 23, 2023
@ivica-k ivica-k force-pushed the 32280-add-support-redshift-serverless branch 2 times, most recently from ed74f82 to 7cc9dae Compare July 23, 2023 16:04
@ivica-k ivica-k force-pushed the 32280-add-support-redshift-serverless branch from 7cc9dae to 56cc6d8 Compare July 24, 2023 14:46
) -> str:
"""
Execute a statement against Amazon Redshift.

:param database: the name of the database
:param sql: the SQL statement or list of SQL statement to run
:param cluster_identifier: unique identifier of a cluster
:param workgroup_name: name of the Redshift Serverless workgroup. Mutually exclusive with
Member

nitpick: i think we should reorder the docstring as well

Contributor Author

Absolutely :) addressed in 71bea87

@@ -40,6 +40,9 @@ class RedshiftDataOperator(BaseOperator):
:param database: the name of the database
:param sql: the SQL statement or list of SQL statement to run
:param cluster_identifier: unique identifier of a cluster
:param workgroup_name: name of the Redshift Serverless workgroup. Mutually exclusive with
Member

nitpick: same here

Contributor Author

Same here: addressed in 71bea87 :)

@@ -55,6 +58,7 @@ class RedshiftDataOperator(BaseOperator):

template_fields = (
"cluster_identifier",
"workgroup_name",
Member

not sure whether this will need to be reordered

Contributor Author

Also not sure, but for consistency addressed in 71bea87

@vincbeck
Contributor

Besides the nits mentioned by @Lee-W, it looks good to me :) Good job!

@ivica-k ivica-k force-pushed the 32280-add-support-redshift-serverless branch from 56cc6d8 to 3a1189e Compare July 24, 2023 15:18
serverless clusters require the `workgroup_name` to be specified and do not
require the `cluster_identifier`
@ivica-k ivica-k force-pushed the 32280-add-support-redshift-serverless branch from 3a1189e to 71bea87 Compare July 24, 2023 15:20
@vincbeck vincbeck merged commit 8012c9f into apache:main Jul 24, 2023
42 checks passed
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Development

Successfully merging this pull request may close these issues.

RedshiftDataOperator: Add support for Redshift serverless clusters
3 participants