
Deprecate delimiter param and source object's wildcards in GCS, introduce match_glob param. #31261

Merged (4 commits) on Jun 30, 2023

Conversation

shahar1
Contributor

@shahar1 shahar1 commented May 12, 2023

closes: #29115

Important notes

  1. Deprecating the delimiter parameter and wildcards in source objects is essential. Neither feature is native to the GCS API; both were implemented as a workaround that heavily misuses the delimiter parameter, probably because the match_glob parameter did not exist when they were first written. Using match_glob instead of delimiter resolves the original issue.
  2. Unfortunately, the official GCS Python client does not support the match_glob param today. As a workaround, I copied list_blobs() from its source code and patched it directly into the hook. I'm unsure whether this is acceptable in terms of licensing and maintainability on our side. If it is, let me know whether I need to add any licensing comments; otherwise, we'll have to wait for the official release (according to GCP's comment on my issue, it should be around Q3).
  3. The modified transfer operators (GCSToGCSOperator, GCSToSFTPOperator, GCSToGoogleDriveOperator) contain internal logic that handles wildcards in the source object(s) and calls the list() method with the delimiter param. To avoid any chance of breaking existing behavior, I intentionally did not use the match_glob param there. I added comments describing what should change when the deprecation is finalized.
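
To illustrate the difference between the two parameters, here is a minimal, self-contained sketch that works on an in-memory list of object names rather than on GCS. Everything in it is hypothetical: `glob_to_regex` and `list_names` are illustrative helpers, not part of the hook or the GCS client. The legacy delimiter behavior is modeled, as an assumption, as a plain suffix filter, and the glob translation is a simplified approximation (`*` stays within one path segment, `**` may cross `/`), not the exact GCS matching rules.

```python
import re
import warnings
from typing import Iterable, List, Optional


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex (illustrative approximation only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # '*' stays within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # '?' matches one non-'/' character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def list_names(
    names: Iterable[str],
    prefix: str = "",
    delimiter: Optional[str] = None,
    match_glob: Optional[str] = None,
) -> List[str]:
    """Hypothetical stand-in for a listing call, filtering in memory."""
    if delimiter is not None:
        # Assumption: the deprecated path warns and acts as a suffix filter.
        warnings.warn(
            "delimiter is deprecated, use match_glob instead",
            DeprecationWarning,
            stacklevel=2,
        )
    selected = [n for n in names if n.startswith(prefix)]
    if delimiter is not None:
        selected = [n for n in selected if n.endswith(delimiter)]
    if match_glob is not None:
        regex = glob_to_regex(match_glob)
        selected = [n for n in selected if regex.fullmatch(n)]
    return selected


NAMES = ["data/a.csv", "data/b.json", "data/sub/c.csv", "logs/d.csv"]

# Glob limited to one path segment: nested objects are excluded.
print(list_names(NAMES, prefix="data/", match_glob="data/*.csv"))

# Recursive glob: nested objects under the prefix are included.
print(list_names(NAMES, prefix="data/", match_glob="**/*.csv"))
```

In this sketch the glob expresses directly whether nested objects should match, which a suffix-only filter cannot do.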


@shahar1 shahar1 requested review from eladkal and o-nikolas as code owners May 12, 2023 18:48
@boring-cyborg boring-cyborg bot added area:providers area:system-tests kind:documentation provider:amazon-aws AWS/Amazon - related issues provider:google Google (including GCP) related issues labels May 12, 2023
@shahar1 shahar1 force-pushed the gcs-match-glob branch 4 times, most recently from 47de2b3 to 4f5b4f2 Compare May 12, 2023 20:23
@shahar1 shahar1 force-pushed the gcs-match-glob branch 2 times, most recently from c623f5d to 7c9669a Compare May 22, 2023 18:12
@eladkal eladkal requested a review from potiuk June 27, 2023 07:36
Development

Successfully merging this pull request may close these issues.

GCSToGCSOperator delimiter bug
2 participants