Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iterate through blobs before checking prefixes #36202

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

Lee-W
Copy link
Member

@Lee-W Lee-W commented Dec 13, 2023

After reading https://cloud.google.com/storage/docs/json_api/v1/objects/list, https://github.com/googleapis/python-storage/blob/v2.14.0/google/cloud/storage/client.py#L1143-L1145C35 and https://github.com/apache/airflow/blob/providers-google/10.13.0rc2/airflow/providers/google/cloud/hooks/gcs.py#L732, I suspect we might not use the delimiter the right way. But it's going to be deprecated. So it might be better for us to keep the original behavior.

According to https://github.com/googleapis/python-storage/blob/v2.14.0/google/cloud/storage/client.py#L1213-L1217, the prefixes are not returned until the blobs are consumed. Thus, to keep the behavior of checking whether prefixes exists and decide the extended content, we'll need to consume blobs first

Related: #36130


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Dec 13, 2023
@Lee-W Lee-W force-pushed the fix-gcs-hook-blob-iterator-consuming branch from ef32ffd to f5899d8 Compare December 13, 2023 11:31
Copy link
Collaborator

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phanikumv
Copy link
Contributor

This PR will potentially fix the error on the google provider rc. Could you kindly include it in the next rc for google provider.

cc @potiuk

Copy link
Contributor

@phanikumv phanikumv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests

@Lee-W Lee-W force-pushed the fix-gcs-hook-blob-iterator-consuming branch from 40b22ed to 1ea2c4c Compare December 13, 2023 13:03
@Lee-W Lee-W force-pushed the fix-gcs-hook-blob-iterator-consuming branch from 1ea2c4c to 0e77b7e Compare December 14, 2023 01:55
@Lee-W Lee-W force-pushed the fix-gcs-hook-blob-iterator-consuming branch from 0e77b7e to 8fe8f77 Compare December 14, 2023 02:16
@potiuk potiuk merged commit e83a986 into apache:main Dec 14, 2023
50 checks passed
@potiuk
Copy link
Member

potiuk commented Dec 14, 2023

Just in time for RC3

@pankajastro pankajastro deleted the fix-gcs-hook-blob-iterator-consuming branch December 14, 2023 19:13
@shahar1
Copy link
Contributor

shahar1 commented Dec 15, 2023

Looks great! Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants