Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge connection info into existing connection file if it already exists #1133

Merged
merged 6 commits into from
Jul 25, 2023

Conversation

jasongrout
Copy link
Member

@jasongrout jasongrout commented Jul 13, 2023

This PR lets a connection file specify a port as 0 and then be updated when ipykernel picks a port, as a follow-up to #1127.

In #1127, ipykernel was changed to not overwrite a kernel connection file when it already exists. This broke a pattern we used at Databricks (thanks to @MrBago) that addresses the same handshaking issues as jupyter/jupyter_client#487 and jupyter/enhancement-proposals#66 (i.e., JEP 66). Our pattern was this:

  1. The kernel client writes a connection file connect.json like the following, where all the port numbers are zero:
     {
       "shell_port": 0,
       "iopub_port": 0,
       "stdin_port": 0,
       "control_port": 0,
       "hb_port": 0,
       "ip": "*",
       "key": "abc123",
       "transport": "tcp",
       "signature_scheme": "hmac-sha256",
       "kernel_name": "My Kernel"
     }
  2. ipykernel is started, pointing at that connection file.
    python -m ipykernel -f ./connect.json
    
  3. ipykernel/zmq picks random ports for each of the channels
  4. ipykernel (before Check existence of connection_file before writing #1127) overwrites the connection file with the correct port numbers and ip interface address it actually connected to
  5. We read the connection file to see what ports the client needs to use to communicate with ipykernel.
    $ cat connect.json 
    {
      "shell_port": 62757,
      "iopub_port": 62761,
      "stdin_port": 62758,
      "control_port": 62759,
      "hb_port": 62763,
      "ip": "0.0.0.0",
      "key": "abc123",
      "transport": "tcp",
      "signature_scheme": "hmac-sha256",
      "kernel_name": ""
    }
    

In #1127, it was noticed that this overwrote the kernel_name as well. In essence, ipykernel was taking over the file that someone else had created. The solution in #1127 was to not touch the connection file if it exists, but that breaks the usecase of letting the kernel pick ports and interfaces.

This PR instead merges the new information from ipykernel into the connection file. If the file would not change from the updated information, then we don't write to the connection file. In short, with this PR, we get the following in the last step above:

$ cat connect.json 
{
  "shell_port": 62914,
  "iopub_port": 62918,
  "stdin_port": 62915,
  "control_port": 62916,
  "hb_port": 62920,
  "ip": "0.0.0.0",
  "key": "abc123",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "My Kernel"
}

@jasongrout jasongrout added the bug label Jul 13, 2023
@jasongrout jasongrout force-pushed the connection_file branch 2 times, most recently from 66566c8 to e4929fa Compare July 13, 2023 11:15
@jasongrout jasongrout changed the title Merge info into existing connection files if they already exist WIP: Merge info into existing connection files if they already exist Jul 13, 2023
@jasongrout jasongrout force-pushed the connection_file branch 3 times, most recently from edf708e to bb61b95 Compare July 13, 2023 11:24
@ipython ipython deleted a comment from jasongrout-db Jul 13, 2023
@jasongrout jasongrout changed the title WIP: Merge info into existing connection files if they already exist WIP: Merge connection info into existing connection file if it already exists Jul 13, 2023
Before ipykernel 6.23.3, i.e., before #1127, a kernel manager could specify a channel port of 0, and ipykernel would pick a random port and rewrite the connection file with the actual port used. This provided a nice way to address the natural race condition between a kernel manager picking a port and ipykernel actually connecting to it and using it.

This unit test tests that this port 0 connection file behavior works, and also tests that existing information in the connection file is not overwritten.
This lets a connection file specify a port as 0 and then be updated when ipykernel picks a port
@jasongrout jasongrout changed the title WIP: Merge connection info into existing connection file if it already exists Merge connection info into existing connection file if it already exists Jul 20, 2023
jasongrout and others added 2 commits July 20, 2023 13:24
… errors in CI

zmq.error.ZMQError: Permission denied (addr='tcp://0.0.0.0:53555')
@jasongrout jasongrout marked this pull request as ready for review July 20, 2023 19:27
@jasongrout jasongrout requested a review from blink1073 July 20, 2023 19:27
@jasongrout
Copy link
Member Author

@blink1073 - pinging you for review, since you also reviewed #1127. I'm not sure if this usecase of setting the port number to 0 and having the kernel rewrite the connection file with the actual port was considered in the original discussion on #1127.

We discussed this briefly in the server/kernels meeting today, and we had a generally positive reaction from people there today, and no objections to the idea of updating the connection file.

@jasongrout
Copy link
Member Author

Ping also @fecet, the original author of #1127. This PR implements your idea at #1127 (comment):

... Or can we just read the kernel file if exists, and only update the difference? ....

@jasongrout
Copy link
Member Author

The tests_check CI test is failing because the prerelease test timed out. The prerelease test timed out 99% of the way through, so perhaps it is just at the borderline of the timeout (i.e., it looks to me like this failure is not particular to this PR).

@JohanMabille
Copy link
Contributor

JohanMabille commented Jul 20, 2023

Now that JEP 66 has been approved, will you still need the ability to overwrite the connection file when ipykernel implements the handshake pattern? This latter seems more efficient to me since it does not require to poll the file system.

Copy link
Member

@SylvainCorlay SylvainCorlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes ipykernel's behavior with respect to the mechanism for port selection.
JEP 66 just went through, addressing this very issue. Could we work together on implementing the JEP rather than adding a special workaround in ipykernel?

@fecet
Copy link
Contributor

fecet commented Jul 21, 2023

Ping also @fecet, the original author of #1127. This PR implements your idea at #1127 (comment):

... Or can we just read the kernel file if exists, and only update the difference? ....

great to hear that!

@jasongrout
Copy link
Member Author

jasongrout commented Jul 21, 2023

Now that JEP 66 has been approved, will you still need the ability to overwrite the connection file when ipykernel implements the handshake pattern? This latter seems more efficient to me since it does not require to poll the file system.

I don't know - but that is a hypothetical at this point since it isn't implemented, and the change in 6.23.3 (accidentally) broke long-standing behavior and is breaking actual systems in use right now that rely on that behavior.

This PR changes ipykernel's behavior with respect to the mechanism for port selection.
JEP 66 just went through, addressing this very issue. Could we work together on implementing the JEP rather than adding a special workaround in ipykernel?

The way I see it, there was an accidental regression introduced in #1127, released in a patch release, that changed long-standing behavior that people rely on right now in ipykernel. This is breaking systems that rely on ipykernel right now. We should have brought this regression to @fecet's attention in #1127, but hey, systems are complicated, and it's not clear anyone recognized this regression at the time, so here we are.

This PR does not change ipykernel's behavior, rather it restores long-standing behavior to what it was before the regression. Of course, another route would be to revert #1127, but I think this PR preserves the intent of #1127 without this regression, and happily aligns with one of @fecet's original suggestions for #1127.

I think once we've resolved this regression, we can have a discussion about whether this port 0 behavior should be superseded by JEP 66's ideas, but of course that succession should happen deliberately, with a period for migration, only after JEP 66 has been implemented.

@SylvainCorlay
Copy link
Member

Let's discuss how we can resolve the issue caused by the change of behavior in #1127.

I think once we've resolved this regression, we can have a discussion about whether this port 0 behavior should be superseded by JEP 66's ideas, but of course that succession should happen deliberately, with a period for migration, only after JEP 66 has been implemented.

The idea of not passing ports and polling the file system to wait for the info to be populated by the kernel was addressed in the discussion about the JEP.

@jasongrout
Copy link
Member Author

Let's discuss how we can resolve the issue caused by the change of behavior in #1127.

Thank you. I agree - I think this PR discussion should be about fixing the regression in behavior introduced in #1127. Here are, for example, two paths forward:

  1. Revert Check existence of connection_file before writing #1127 (if this is going to be a long discussion blocking merging of this PR, I'd advocate for reverting to unbreak things in the short term)
  2. Merge this PR, which achieves the aims of Check existence of connection_file before writing #1127 while preserving long-standing ipykernel behavior

I prefer merging this PR, as it builds on #1127 (which did solve an important issue). Which do you prefer, or do you see another path forward that fixes the behavior change in #1127 in the short term?

The idea of not passing ports and polling the file system to wait for the info to be populated by the kernel was addressed in the discussion about the JEP.

Great! I think there are other places to discuss JEP 66 - let's not rehash a JEP 66 discussion here.

@blink1073
Copy link
Contributor

I'm afk until next Monday, feel free to do whatever you all think is best with this PR.

@ccordoba12
Copy link
Member

The way I see it, there was an accidental regression introduced in #1127, released in a patch release, that changed long-standing behavior that people rely on right now in ipykernel. This is breaking systems that rely on ipykernel right now. We should have brought this regression to @fecet's attention in #1127, but hey, systems are complicated, and it's not clear anyone recognized this regression at the time, so here we are.

This PR does not change ipykernel's behavior, rather it restores long-standing behavior to what it was before the regression

Given this analysis of the situation that Jason provided, I agree that we should either merge this PR or revert #1127 (preferable?) and release a new bugfix version to restore the previous behavior.

Just my 2 cents.

@fecet
Copy link
Contributor

fecet commented Jul 21, 2023

If #1127 be revoked I hope this jupyter/notebook#6936 can also be considered.

@jasongrout
Copy link
Member Author

@SylvainCorlay - after this discussion, do you still have an objection to this PR going in?

@blink1073
Copy link
Contributor

I'm going to merge this to fix the current bug, we can address JEP 66 as a follow-up.

@blink1073 blink1073 merged commit 09c3c35 into main Jul 25, 2023
@blink1073 blink1073 deleted the connection_file branch July 25, 2023 16:11
@SylvainCorlay
Copy link
Member

I have a reservation with respect to the special casing of the port number "0" to signal that ports have to be chosen. (this behavior is not really part of the protocol. If this fixes your scenario, I guess this is fine as a temporary fix.

@jasongrout-db
Copy link

jasongrout-db commented Jul 26, 2023

I have a reservation with respect to the special casing of the port number "0" to signal that ports have to be chosen. (this behavior is not really part of the protocol. If this fixes your scenario, I guess this is fine as a temporary fix.

Great question about the port 0 convention. I looked into it in preparation for making this PR to see how widespread that convention is and if it looked like this behavior in ipykernel was intentional. Here's what I found:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants