-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bind log server on worker to IPv6 address (#24755) #24846
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
# Gunicorn can bind multiple addresses, see https://docs.gunicorn.org/en/stable/settings.html#bind. | ||
options = [ | ||
GunicornOption("bind", f"0.0.0.0:{worker_log_server_port}"), | ||
GunicornOption("bind", f"[::]:{worker_log_server_port}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause any problems if ipv6 isn't enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it breaks the gunicorn startup with a failure. Below more details #24846 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. looked up some issues and seems that this benoitc/gunicorn#1138 could lead to some problems in the past (but likely it's solved already as it is very old). So worth double checking if this works in different combos of v4/v6.
Awesome work, congrats on your first merged pull request! |
Hello! We have Airflow version 2.3.4 and this is not working if in the machine IPv6 is not available or disabled. We end up having an error during worker startup and the IPv4 bind doesn't even start because the entire gunicorn process fails. The bind should be configurable and not hardcoded in the code.
|
Feel free to make a PR to change it @saimon46. It is in Looking forward to your PR - happy to review it. |
@potiuk I perfectly agree on this. I'm going to work on a PR but before I start I'm writing here 2 possible solutions for this problem:
What do you thing would be the best? The first gives more flexibility to the sysadmin same as the second, the third is automatic yes, but still you loose in configuration flexibility. |
The best will be handle error gracefully simply rather than fail (this was the original intention and we believed it worked this way).. Adding new configuration should only be neeeded if absolutely necessary. |
There is no particular reason why you should configure it - you can configure it by enabling/disabling the IPV6/IPV4 in the deployment of yours and Airflow should use whatever is available. There is no point in having two places where you can configure it (deployment and airflow). |
FYI also affected by this. @potiuk Are you suggesting a preferred PR would be to catch this exception and try binding to IPv4 only? To me it looks like Gunicorn only throws a Possible to catch |
Then it needs proper fix and detection. Just make it work automatically (whatever it takes) |
One approach I guess is to test if it's possible to bind to IPv6? I played around with the socket library and came up with this test (tested on both machines that can bind and can not bind to IPv6): import socket
def ipv6_bindable(port=8000):
# Couldn't find a machine where "socket.has_ipv6" returns False
# but thought it worth adding as if it did return False it may throw
# a different exception when trying to open a socket with family=socket.AF_INET6
if not socket.has_ipv6:
return False
try:
socket.socket(family=socket.AF_INET6).bind(('::', port))
except OSError as e:
if e.args[0] == 97: # Address family not supported by protocol
return False
raise
return True |
Just make a PR :) discusing over the code where we can comment it directly and you can iterate is way better than disucssing copy&pasted code without context. The worst that is going to happen, we will agree to change the approach, but by doing "real code" change you will see immediately if it works, how it fits in etc. |
I confirm that the test to check that IPv6 is bindable is the best approach. In our case the function that @notatallshaw-work posted works fine. Around that function we should add or not the IPv6 bind in the gunicorn options. |
I am discussing with my company on the process to contribute to open source projects, we would like to but we just need to make sure we get the policy right on our end. It may be a week or so until we have it ironed out, feel free to use any code I've written here in any way to submit a PR yourself if you would like it to land quicker than I can provide one. Also I was playing around with a more generic version, with the idea that someone in the future might have only IPv6 enabled and IPv4 disabled: import socket
def get_bindable_addresses(port=8000):
address_list = []
addrinfo = socket.getaddrinfo(None, port, 0, socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
for addr in addrinfo:
family = addr[0]
test_address = addr[-1][:2]
try:
socket.socket(family=family).bind(test_address)
except OSError as e:
print(f'WARN: Failed to bind {test_address} using IP family {family}: {e!s}')
continue
if family == socket.AF_INET:
address_list.append(f'{test_address[0]}:{test_address[1]}')
elif family == socket.AF_INET6:
address_list.append(f'[{test_address[0]}]:{test_address[1]}')
else:
print(f'WARN: Unrecognized IP address family {family!r}')
return address_list
bindable_addresses = get_bindable_addresses()
if not bindable_addresses:
raise RuntimeError(...)
... But maybe that's overkill, whatever you think is best if you submit a PR. |
No worries. It can wait :). One benefit of the "auto-configuration" is that it might be treated as a bug-fix so it might be added any time (and I anticipate one-two bugfix releases following 2.4.0 in a few weeks time. I believe when you bind to IPv6 address and IPV4 is there, the OS will automatically map IPV4 port to IPV6 port (unless you specify So I think we can simplify it all by simply binding to IPV6 if it is bindable and IPV4 if not. I think we cannot "just" bind to IPV6 because if - for whatever reason - IPV6 support is not enabled at all, it will fail. |
This is all new to me, but yes I believe that's the case as long as the OS supports dualstack and IPV6_V6ONLY has not been set on the socket options (which searching through Gunicorns repo I never see this option):
Thus it would be actually simpler to change the code to: if socket.has_dualstack_ipv6():
bind_option = GunicornOption("bind", f"[::]:{worker_log_server_port}")
else:
bind_option = GunicornOption("bind", f"0.0.0.0:{worker_log_server_port}")
options = [
bind_option,
GunicornOption("workers", 2),
] Note though that |
Bind log server on worker to IPv6 address (#24755)
The worker(s) run a gunicorn web server that serves task logs,
which are displayed on the web-ui (the
Logs
).Until now the log server did bind the address
0.0.0.0:<port>
, seeairflow/utils/serve_logs.py
.In a Kubernetes cluster that allows IPv6 traffic only, the worker logs are not reachable,
because the log server does not listen on
[::]:<port>
.Therefore we bind another address and listen to both IPv4 and IPv6 traffic.
closes: #24755