Merge connection info into existing connection file if it already exists #1133
Before ipykernel 6.23.3, i.e., before #1127, a kernel manager could specify a channel port of 0, and ipykernel would pick a random port and rewrite the connection file with the actual port used. This provided a nice way to address the natural race condition between a kernel manager picking a port and ipykernel actually connecting to it and using it. This unit test tests that this port 0 connection file behavior works, and also tests that existing information in the connection file is not overwritten.
This lets a connection file specify a port as 0 and then be updated when ipykernel picks a port
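The port 0 convention mentioned above relies on ordinary operating-system behavior: binding to port 0 asks the OS for any free port, and the process can then read back which port it received. The sketch below illustrates that with a plain TCP socket from the standard library. It is not ipykernel code (ipykernel binds ZeroMQ sockets), and the helper name `bind_to_free_port` is made up for this illustration.

```python
from __future__ import annotations

import socket


def bind_to_free_port(ip: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Bind a TCP socket to port 0 and report the port the OS assigned."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((ip, 0))  # port 0 means "pick any free port for me"
    actual_port = sock.getsockname()[1]  # the port that was really bound
    return sock, actual_port


sock, port = bind_to_free_port()
print(f"requested port 0, bound to port {port}")
sock.close()
```

Because the process that binds the socket is the one that learns the port, there is no window in which another process can grab a port that a kernel manager chose ahead of time. The remaining step is getting the chosen port back to the kernel manager, which is what rewriting the connection file does.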
… errors in CI zmq.error.ZMQError: Permission denied (addr='tcp://0.0.0.0:53555')
for more information, see https://pre-commit.ci
@blink1073 - pinging you for review, since you also reviewed #1127. I'm not sure if this use case of setting the port number to 0 and having the kernel rewrite the connection file with the actual port was considered in the original discussion on #1127. We discussed this briefly in the server/kernels meeting today; the reaction from people there was generally positive, and there were no objections to the idea of updating the connection file.
Ping also @fecet, the original author of #1127. This PR implements your idea at #1127 (comment):
The tests_check CI test is failing because the prerelease test timed out. The prerelease test timed out 99% of the way through, so perhaps it is just at the borderline of the timeout (i.e., it looks to me like this failure is not particular to this PR).
Now that JEP 66 has been approved, will you still need the ability to overwrite the connection file once ipykernel implements the handshake pattern? The latter seems more efficient to me, since it does not require polling the file system.
This PR changes ipykernel's behavior with respect to the mechanism for port selection.
JEP 66 just went through, addressing this very issue. Could we work together on implementing the JEP rather than adding a special workaround in ipykernel?
Great to hear that!
I don't know - but that is a hypothetical at this point since it isn't implemented, and the change in 6.23.3 (accidentally) broke long-standing behavior and is breaking actual systems in use right now that rely on that behavior.
The way I see it, there was an accidental regression introduced in #1127, released in a patch release, that changed long-standing behavior that people rely on right now in ipykernel. This is breaking systems that rely on ipykernel right now. We should have brought this regression to @fecet's attention in #1127, but hey, systems are complicated, and it's not clear anyone recognized this regression at the time, so here we are. This PR does not change ipykernel's behavior; rather, it restores long-standing behavior to what it was before the regression. Of course, another route would be to revert #1127, but I think this PR preserves the intent of #1127 without this regression, and happily aligns with one of @fecet's original suggestions for #1127. I think once we've resolved this regression, we can have a discussion about whether this port 0 behavior should be superseded by JEP 66's ideas, but of course that succession should happen deliberately, with a period for migration, only after JEP 66 has been implemented.
Let's discuss how we can resolve the issue caused by the change of behavior in #1127.
The idea of not passing ports and polling the file system to wait for the info to be populated by the kernel was addressed in the discussion about the JEP.
Thank you. I agree - I think this PR discussion should be about fixing the regression in behavior introduced in #1127. Here are, for example, two paths forward:
I prefer merging this PR, as it builds on #1127 (which did solve an important issue). Which do you prefer, or do you see another path forward that fixes the behavior change in #1127 in the short term?
Great! I think there are other places to discuss JEP 66 - let's not rehash a JEP 66 discussion here.
I'm afk until next Monday, feel free to do whatever you all think is best with this PR.
Given this analysis of the situation that Jason provided, I agree that we should either merge this PR or revert #1127 (preferable?) and release a new bugfix version to restore the previous behavior. Just my 2 cents.
If #1127 is reverted, I hope jupyter/notebook#6936 can also be considered.
@SylvainCorlay - after this discussion, do you still have an objection to this PR going in?
I'm going to merge this to fix the current bug; we can address JEP 66 as a follow-up.
I have a reservation with respect to the special-casing of the port number "0" to signal that ports have to be chosen (this behavior is not really part of the protocol). If this fixes your scenario, I guess this is fine as a temporary fix.
Great question about the port 0 convention. I looked into it in preparation for making this PR to see how widespread that convention is and if it looked like this behavior in ipykernel was intentional. Here's what I found:
This PR lets a connection file specify a port as 0 and then be updated when ipykernel picks a port, as a follow-up to #1127.
In #1127, ipykernel was changed to not overwrite a kernel connection file when it already exists. This broke a pattern we used at Databricks (thanks to @MrBago) that addresses the same handshaking issues as jupyter/jupyter_client#487 and jupyter/enhancement-proposals#66 (i.e., JEP 66). Our pattern was this:
connect.json like the following, where all the port numbers are zero:
ip interface address it actually connected to
In #1127, it was noticed that this overwrote the kernel_name as well. In essence, ipykernel was taking over the file that someone else had created. The solution in #1127 was to not touch the connection file if it exists, but that breaks the use case of letting the kernel pick ports and interfaces.
This PR instead merges the new information from ipykernel into the connection file. If the file would not change from the updated information, then we don't write to the connection file. In short, with this PR, we get the following in the last step above:
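The merge behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `merge_connection_info` is hypothetical, and the port value is invented. The field names (`kernel_name`, `ip`, `shell_port`) are standard Jupyter connection file keys.

```python
import json
import os
import tempfile


def merge_connection_info(path, new_info):
    """Merge new_info into the JSON connection file at path.

    Existing keys that new_info does not mention are preserved.
    Returns True if the file was written, False if nothing would change.
    """
    existing = None
    if os.path.exists(path):
        with open(path) as f:
            existing = json.load(f)
    merged = {**(existing or {}), **new_info}
    if merged == existing:
        return False  # nothing new to record, so leave the file untouched
    with open(path, "w") as f:
        json.dump(merged, f, indent=2)
    return True


# Hypothetical usage: a kernel manager writes port 0, the kernel fills it in.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "connect.json")
    with open(path, "w") as f:
        json.dump({"kernel_name": "python3", "ip": "127.0.0.1", "shell_port": 0}, f)
    wrote = merge_connection_info(path, {"shell_port": 53555})
    wrote_again = merge_connection_info(path, {"shell_port": 53555})
    with open(path) as f:
        result = json.load(f)

print(wrote, wrote_again, result)
```

In this sketch the first call rewrites the file with the actual port while keeping `kernel_name`, and the second call does not write at all, because the merged contents equal what is already on disk.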