Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate usage of extra[host] in AWS's connection #25494

Merged
merged 1 commit into from
Aug 8, 2022

Conversation

gmcrocetti
Copy link
Contributor

Deprecate extra's host argument to favor connection's host.
closes: #17833

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Aug 3, 2022
@Taragolis
Copy link
Contributor

My personal thought both of Connection.host and Connection.extra['host'] not a good candidate for endpoint_url.

I would rather deprecate Connection.extra['host'] by Connection.extra['endpoint_url']:

  1. In this case it would be easier implement The AWS Secrets Backends (SM and SSM) do not allow configuration of Assume Role Methods via Backend Kwargs #25326
  2. keep same name as it use in boto3.client and boto3.resource

Please also note that connection which read from URI (Environment Variables, SSM backend and might be other sources) would always contain non-None host. In other hand It might be an issue if we pass endpoint_url='' rather than endpoint_url=None to boto3

from airflow.models.connection import Connection

conn = Connection(uri="aws://")
assert conn.host is None, f"Expected None but got {conn.host!r}"

Also feel free to join discussion in this PR #25416 which basically hide unused connections fields from UI, since it not used in AwsConnectionWrapper and we do not provide host attribute as compatibility for Connection object.

@gmcrocetti
Copy link
Contributor Author

gmcrocetti commented Aug 3, 2022

Hi @Taragolis. I do agree that endpoint_url is way more intuitive than host, as thus we should deprecated both host and extra["host"] toward a more lean alternative. The problem is this is my biased developer POV and I don't know how airflow's community will react to that (they're used with the host field after all).
I'll mark @potiuk to ask for some guidance. I'm ok with either closing this PR (and the related issue) or working in the endpoint_url suggestion, just need a sync.

@Taragolis
Copy link
Contributor

You've never know end users reaction. Someone frustrate when they see Deprecation and User Warnings and tried to fix asap, some other still use operators/sensors/hooks from airflow.contrib 😃

@gmcrocetti
Copy link
Contributor Author

You've never know end users reaction. Someone frustrate when they see Deprecation and User Warnings and tried to fix asap, some other still use operators/sensors/hooks from airflow.contrib smiley

I think that renaming host to endpoint_url and extra[host] to extra[endpoint_url] covers all cases, WDYT @Taragolis ?
as suggested by @josh-fell

@Taragolis
Copy link
Contributor

I'm still worry of fact that 'host' has additional logic on URI parse which could add additional complexity - in the fact it only happen if user do not follow documentation and do not URL-encoded their values

def _parse_netloc_to_hostname(uri_parts):
"""Parse a URI string to get correct Hostname."""
hostname = unquote(uri_parts.hostname or '')
if '/' in hostname:
hostname = uri_parts.netloc
if "@" in hostname:
hostname = hostname.rsplit("@", 1)[1]
if ":" in hostname:
hostname = hostname.split(":", 1)[0]
hostname = unquote(hostname)
return hostname

And a little clarification just for sure that we on the same page. Not just renaming, we deprecate value just for sure that users have a time to change their connections before actually this parameter would be removed. Something like that

self.endpoint_url = extra.get("endpoint_url")
if not self.endpoint_url and "host" in extra:
    self.endpoint_url = conn.host
    warnings.warn(
        "some deprecation warning",
        DeprecationWarning,
        stacklevel=2,
    )

And also it would be nice to add this in to the documentation

@Taragolis
Copy link
Contributor

BTW, I think specify endpoint_url into boto3.client and boto3.resource only required for MinIO, Aliyun OSS users and might be for LocalStack and moto in standalone mode.

@gmcrocetti gmcrocetti force-pushed the 17833-aws-connection branch 2 times, most recently from f12f477 to 1d3a3a9 Compare August 6, 2022 18:39
@gmcrocetti
Copy link
Contributor Author

gmcrocetti commented Aug 6, 2022

The original goal of this PR was to allow one using Connection.host, as described in #17833, but after discussions it occurred me host is not semantically correct for the proposed use case.
As we all know, host is just one part of a URL in the HTTP scheme. As boto expects a complete HTTP url, host is not the most correct description for this parameter and as a consequence the reason I'm proposing renaming it to endpoint_url.
Please, let me know if that doesn't make any sense.

@gmcrocetti gmcrocetti marked this pull request as ready for review August 6, 2022 19:02
@gmcrocetti gmcrocetti changed the title AwsBaseHook not using Connection's host Deprecate usage of extra[host] in AWS's connection Aug 6, 2022
@Taragolis
Copy link
Contributor

@gmcrocetti

Since #25416 merged would be nice also update placeholder for UI:

"aws_session_token": "AQoDYXdzEJr...EXAMPLETOKEN",
"host": "http://localhost:4566",
},

but after discussions it occurred me host is not semantically correct for the proposed use case.

I just thought that we could inform users if they somehow use Connection.host (might be defined in current version of provider) that this option won't work. Same as we inform that if they use profile which related to deprecated value, and do not related to actual profile_name but it just an idea.

https://github.com/apache/airflow/blob/2e2e86d9e43989ed039afc07fa8efe29bf5d170c/airflow/providers/amazon/aws/utils/connection_wrapper.py#L138-L147

@potiuk
Copy link
Member

potiuk commented Aug 7, 2022

I just thought that we could inform users if they somehow use Connection.host (might be defined in current version of provider) that this option won't work. Same as we inform that if they use profile which related to deprecated value, and do not related to actual profile_name but it just an idea.

Yeah. Might be a good idea to add warning if host is set Host {} specifed in connection.host is not used - the AWS host should be bassed by "endpoint_url" extra ... Or smth.

@potiuk
Copy link
Member

potiuk commented Aug 8, 2022

I want to make provider's release today - woudl be great to get this in.

@@ -154,6 +154,16 @@ def __post_init__(self, conn: "Connection"):

self.endpoint_url = extra.get("host")

if self.endpoint_url:
warnings.warn(
"extra['host'] is deprecated and will be removed in a future release."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potuik If the Amazon provider is going to be a major version bump this month, is the deprecation message even required? The provider could just stop supporting the host connection field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That's a bit aggressive, but we could raise plain error here. I do not see this as being a major problem. But removing it altogether is not a good idea because it would likely stop doing that it was doing before without even a warning in some cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. So more of a release flow of feature -> deprecate/message about not using feature -> remove feature regardless of major version bump along the way.

@gmcrocetti
Copy link
Contributor Author

Just sent a patch addressing both @Taragolis and @potiuk comments

…nnection. Depreciation is happening in favor of 'endpoint_url' in extra.

The Connection.host column is discarded due to lack of semantics as it's easier and more intuitive to pass the whole url than filling its pieces in the UI.
Update documentation stating 'host' is deprecated in favor of 'endpoint_url'. Also updated placeholders on the UI.
Usage of Connection.host display a deprecation warning.
@gmcrocetti
Copy link
Contributor Author

The very same check that failed here is failing on main. Can we move forward with this one or fix main before ?

@potiuk potiuk merged commit a0212a3 into apache:main Aug 8, 2022
@potiuk
Copy link
Member

potiuk commented Aug 8, 2022

Yeah I think there was some broken image generation for breeze. Will fix it in a moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AwsBaseHook isn't using connection.host, using connection.extra.host instead
4 participants