Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow storage options to be passed #35820

Merged
merged 11 commits into from
Dec 8, 2023

Conversation

bolkedebruin
Copy link
Contributor

This allows storage options to be passed on to the underlying implementation. It does require a release of the providers as the signature of get_fs has changed.

@kaxil @uranusjr @Taragolis


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues labels Nov 23, 2023
airflow/providers/amazon/aws/fs/s3.py Outdated Show resolved Hide resolved
airflow/providers/google/cloud/fs/gcs.py Outdated Show resolved Hide resolved
airflow/providers/microsoft/azure/fs/adls.py Outdated Show resolved Hide resolved
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bolkedebruin
Copy link
Contributor Author

Note: something went wrong with rebasing, fixing it.

@@ -32,21 +32,25 @@
if TYPE_CHECKING:
from fsspec import AbstractFileSystem

from airflow.io.typedef import Properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need a dedicated type alias? Do we plan to make it stricter with say TypedDict? It feels to me it’s better to use dict[str, str] instead; slightly repetitive but cognatively easier for readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the 'signal' of Properties that it automatically says I can only pass [str, str] instead. To me it feels cognitive easier to see Properties than dict[str, str] which has no semantic meaning and less error prone. But I am not really tied up on it.

And yes maybe we could make it into a TypedDict

what do you say?

airflow/io/__init__.py Outdated Show resolved Hide resolved
Comment on lines -137 to +149
_STORE_CACHE[alias] = store = ObjectStore(protocol=protocol, conn_id=conn_id, fs=fs)
_STORE_CACHE[alias] = store = ObjectStore(protocol=protocol, conn_id=conn_id, fs=fs, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me uncomfortable that storage options are only effective in the first call to attach; in later calls they will be silently ignored, even when options in the existing storage object does not match. It would be more explicit (thus less prone to user errors) if storage creation and retrieval should be splitted into two functions instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you have a point. Do you have a suggestion?

bolkedebruin and others added 10 commits December 8, 2023 11:16
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
This enables an alternative ObjectStoragePath init syntax, using the
auth section in the URI to supply conn ID instead of a separate keyword
argument. The explicit keyword argument is honored if supplied.
@bolkedebruin bolkedebruin merged commit aba58ad into apache:main Dec 8, 2023
50 checks passed
@bolkedebruin
Copy link
Contributor Author

Merging this - if we want to adjust the typing and split the attach function we can do that in the follow ups.

@bolkedebruin bolkedebruin deleted the pass_storage_options branch December 8, 2023 12:35
@eladkal eladkal added this to the Airflow 2.8.1 milestone Dec 9, 2023
ephraimbuddy pushed a commit that referenced this pull request Dec 11, 2023
This allows storage options to be passed to the fsspec backend as part
of the kwargs to ObjectStorage.

---------

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit aba58ad)
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) AIP-58 labels Dec 11, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Dec 12, 2023
The released providers added support to previous version of the
`airflow.io` - where options were not passed to `get_fs` method
that provides Fsspec compatible FileSystem. However apache#35820 added
positional "options" parameter when the method is called and it
broke already released providers.

This PR dynamically inspects signature of the get_fs method
and when one parameter is detected, it will skip passing options
to get_fs method call.
ephraimbuddy pushed a commit that referenced this pull request Dec 12, 2023
…6186)

The released providers added support to previous version of the
`airflow.io` - where options were not passed to `get_fs` method
that provides Fsspec compatible FileSystem. However #35820 added
positional "options" parameter when the method is called and it
broke already released providers.

This PR dynamically inspects signature of the get_fs method
and when one parameter is detected, it will skip passing options
to get_fs method call.
ephraimbuddy pushed a commit that referenced this pull request Dec 12, 2023
…6186)

The released providers added support to previous version of the
`airflow.io` - where options were not passed to `get_fs` method
that provides Fsspec compatible FileSystem. However #35820 added
positional "options" parameter when the method is called and it
broke already released providers.

This PR dynamically inspects signature of the get_fs method
and when one parameter is detected, it will skip passing options
to get_fs method call.

(cherry picked from commit 7799c51)
tatiana pushed a commit to astronomer/astronomer-cosmos that referenced this pull request Nov 20, 2024
During the work on PR
#1297, an issue
arose where the Azure remote manifest task began failing after
installing providers and packages with constraints. To allow other
tests, which were running successfully, to proceed, the task was
temporarily disabled. Upon reviewing the GitHub Actions logs from
previous successful runs, it was
[observed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11573670971/job/32216315282#step:6:250)
that the Azure provider version installed was 10.5.1. However, after the
refactoring introduced in the PR, the failing actions
[showed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:6:474)
azure provider==8.4.0 being installed with the constraints file.

To investigate, I tested locally to identify a working version. While
the task failed with 8.4.0, it succeeded with 8.5.0. Analyzing the
failure
[logs](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:7:467)
and reviewing the Azure provider changelog hints that
apache/airflow#35820 is potentially the fix for
the failure with our connection setup in the CI that was released in
8.5.0.

Therefore, I propose using azure provider>=8.5.0. However, the Airflow
[constraints
file](https://raw.githubusercontent.com/apache/airflow/constraints-2.8.0/constraints-3.8.txt)
for version 2.8 specifies azure provider==8.4.0, which conflicts with
this requirement. To address this, I am making changes to the
pre-install script in our CI to install azure provider>=8.5.0 without
relying on the constraints file, citing the reasons above.

closes: #1304
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-58 area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:amazon-aws AWS/Amazon - related issues provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants