Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Redshift unload query when select query contains single quotes #30300

Closed

Conversation

hussein-awala
Copy link
Member

closes: #30287


This PR fixes the Redshift unload query when the select query contains quotes, by replacing ' by $$. There is an alternative solution described in Redshift unload doc, which is duplicating the single quotes:

If your query contains quotation marks (for example to enclose literal values), put the literal between two sets of single quotation marks—you must also enclose the query between single quotation marks:
('select * from venue where venuestate=''NV''')

But I'm using $$ in my queries and it works fine

@hussein-awala
Copy link
Member Author

The test failure is not related to this change:

airflow-worker_1     | [2023-03-26 11:15:34,422: ERROR/ForkPoolWorker-2] Failed to bag_dag: /home/airflow/.local/lib/python3.7/site-packages/airflow/example_dags/example_params_ui_tutorial.py
airflow-worker_1     | Traceback (most recent call last):
airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 437, in _process_modules
airflow-worker_1     |     dag.validate()
airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 668, in validate
airflow-worker_1     |     self.params.validate()
airflow-worker_1     |   File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/param.py", line 243, in validate
airflow-worker_1     |     raise ParamValidationError(f"Invalid input for param {k}: {ve}") from None
airflow-worker_1     | airflow.exceptions.ParamValidationError: Invalid input for param date_time: '2023-03-26 12:17:00' is not a 'date-time'
airflow-worker_1     | 
airflow-worker_1     | Failed validating 'format' in schema:
airflow-worker_1     |     {'format': 'date-time', 'title': 'Date-Time Picker', 'type': 'string'}
airflow-worker_1     | 
airflow-worker_1     | On instance:
airflow-worker_1     |     '2023-03-26 12:17:00'

It was fixed by @potiuk in #29395, but it seems like prod image doesn't consider this fix.

@potiuk
Copy link
Member

potiuk commented Mar 27, 2023

Looks like a flaky test actually.

@vincbeck
Copy link
Contributor

How this change would behave with queries already using double quotes? I know there had been many questions about this issue on Slack and the usual recommendation is to use double quotes. My guess is it wont work (I might be wrong). One solution would to detect if the string uses double quotes and if it does, no need to use $$. It would also be great to see this use case in the unit tests :)

Besides that, thanks for trying to find a solution to this problem because, as mentioned, it had been brought up a lot on Slack

@hussein-awala
Copy link
Member Author

@vincbeck I'll test it with double quotes and if that doesn't work I'll implement your interesting suggestion.

@hussein-awala
Copy link
Member Author

@vincbeck I tested it and I had a problem, so I added a check as you proposed with a warning, we can remove this check in the next major release, can you check it now?

@@ -143,8 +144,17 @@ def __init__(
def _build_unload_query(
self, credentials_block: str, select_query: str, s3_key: str, unload_options: str
) -> str:
# TODO: remove in provider 8.0.0
if "''" in select_query:
logging.warning(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to remove this check at the next major release? I dont understand. Users, at any point in time, can use this operator and specify a query with already double quoted strings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Users may stumble on this in the future after you have removed the check.

Also is it possible the string contains some double quotes but not everywhere? Do we want to try handle these half working situations? This type of string sanitization code can often get very hairy very quick. Is there a thirdparty library we can use instead which specializes in this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know any library doing that but that'd be cool. Otherwise, I am in favor of NOT managing these kind of use cases. These use cases (mix of quotes escaped and some unescaped) are not valid and users should fix it. The current implementation would not alter the string which, I think, is the correct behavior

@vincbeck
Copy link
Contributor

@vincbeck I tested it and I had a problem, so I added a check as you proposed with a warning, we can remove this check in the next major release, can you check it now?

I just added a comment about it. Something is not clear to me

@github-actions
Copy link

github-actions bot commented Jun 2, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 2, 2023
@github-actions github-actions bot closed this Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RedshiftToS3 Operator Wrapping Query in Quotes Instead of $$
4 participants