-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow storage options to be passed #35820
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
db56a46
to
1a56be8
Compare
Note: something went wrong with rebasing, fixing it. |
@@ -32,21 +32,25 @@ | |||
if TYPE_CHECKING: | |||
from fsspec import AbstractFileSystem | |||
|
|||
from airflow.io.typedef import Properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need a dedicated type alias? Do we plan to make it stricter with say TypedDict? It feels to me it’s better to use dict[str, str]
instead; slightly repetitive but cognatively easier for readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the 'signal' of Properties that it automatically says I can only pass [str, str] instead. To me it feels cognitive easier to see Properties than dict[str, str] which has no semantic meaning and less error prone. But I am not really tied up on it.
And yes maybe we could make it into a TypedDict
what do you say?
_STORE_CACHE[alias] = store = ObjectStore(protocol=protocol, conn_id=conn_id, fs=fs) | ||
_STORE_CACHE[alias] = store = ObjectStore(protocol=protocol, conn_id=conn_id, fs=fs, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me uncomfortable that storage options are only effective in the first call to attach
; in later calls they will be silently ignored, even when options in the existing storage object does not match. It would be more explicit (thus less prone to user errors) if storage creation and retrieval should be splitted into two functions instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you have a point. Do you have a suggestion?
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
This enables an alternative ObjectStoragePath init syntax, using the auth section in the URI to supply conn ID instead of a separate keyword argument. The explicit keyword argument is honored if supplied.
8b939ba
to
15a34f9
Compare
Merging this - if we want to adjust the typing and split the attach function we can do that in the follow ups. |
This allows storage options to be passed to the fsspec backend as part of the kwargs to ObjectStorage. --------- Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is> Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com> (cherry picked from commit aba58ad)
The released providers added support to previous version of the `airflow.io` - where options were not passed to `get_fs` method that provides Fsspec compatible FileSystem. However apache#35820 added positional "options" parameter when the method is called and it broke already released providers. This PR dynamically inspects signature of the get_fs method and when one parameter is detected, it will skip passing options to get_fs method call.
…6186) The released providers added support to previous version of the `airflow.io` - where options were not passed to `get_fs` method that provides Fsspec compatible FileSystem. However #35820 added positional "options" parameter when the method is called and it broke already released providers. This PR dynamically inspects signature of the get_fs method and when one parameter is detected, it will skip passing options to get_fs method call.
…6186) The released providers added support to previous version of the `airflow.io` - where options were not passed to `get_fs` method that provides Fsspec compatible FileSystem. However #35820 added positional "options" parameter when the method is called and it broke already released providers. This PR dynamically inspects signature of the get_fs method and when one parameter is detected, it will skip passing options to get_fs method call. (cherry picked from commit 7799c51)
During the work on PR #1297, an issue arose where the Azure remote manifest task began failing after installing providers and packages with constraints. To allow other tests, which were running successfully, to proceed, the task was temporarily disabled. Upon reviewing the GitHub Actions logs from previous successful runs, it was [observed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11573670971/job/32216315282#step:6:250) that the Azure provider version installed was 10.5.1. However, after the refactoring introduced in the PR, the failing actions [showed](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:6:474) azure provider==8.4.0 being installed with the constraints file. To investigate, I tested locally to identify a working version. While the task failed with 8.4.0, it succeeded with 8.5.0. Analyzing the failure [logs](https://github.com/astronomer/astronomer-cosmos/actions/runs/11911545582/job/33193301710#step:7:467) and reviewing the Azure provider changelog hints that apache/airflow#35820 is potentially the fix for the failure with our connection setup in the CI that was released in 8.5.0. Therefore, I propose using azure provider>=8.5.0. However, the Airflow [constraints file](https://raw.githubusercontent.com/apache/airflow/constraints-2.8.0/constraints-3.8.txt) for version 2.8 specifies azure provider==8.4.0, which conflicts with this requirement. To address this, I am making changes to the pre-install script in our CI to install azure provider>=8.5.0 without relying on the constraints file, citing the reasons above. closes: #1304
This allows storage options to be passed on to the underlying implementation. It does require a release of the providers as the signature of
get_fs
has changed.@kaxil @uranusjr @Taragolis
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.