[DISCUSS] Add a Session Management infrastructure for extension developers #122

Open
echarles opened this issue Oct 21, 2019 · 11 comments

@echarles
Member

For various use cases (real-time collaboration, multi-user, Kernel Gateway HTTP Personality), we will at some point need a Session object bound to the HTTP client (browser or software).

In other languages, this is typically done with a session cookie.

Not sure how Tornado implements this.

https://pypi.org/project/torndsession/
https://github.com/cole/tornado-sessions

@echarles
Member Author

echarles commented Oct 21, 2019

Background discussions

HTTP Personality for Enterprise Gateway jupyter-server/enterprise_gateway#734 (comment)
@kevin-bates mentioned the need for a session object during a Jupyter Server meeting
@Zsailer mentioned his aim to one day have a multi-user server

@rolweber
Contributor

rolweber commented Oct 21, 2019

During the Jupyter server meeting in May, I suggested labels as a way to enable collaboration between members of a trusted team on a shared Jupyter server. The idea is that the UI (client side) could label the kernels and/or kernel sessions with the user that created them. This would suffice for use cases like "stop all my kernels". A true multi-user server with isolation between users is a much harder nut to crack.

Could you be a bit more specific about use cases that require the server to track browser/client sessions?

@rolweber
Contributor

With labels, I mean a generic name-value mechanism, like in Kubernetes.
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
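
To make the label idea concrete, here is a minimal sketch of name-value labels attached to managed kernels, with a selector that matches on them. `ManagedKernel` and `select` are hypothetical names for illustration only; nothing like this exists in Jupyter Server as far as this thread states.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManagedKernel:
    """Hypothetical stand-in for a kernel tracked by the server."""
    kernel_id: str
    labels: Dict[str, str] = field(default_factory=dict)


def select(kernels: List[ManagedKernel], **selector: str) -> List[ManagedKernel]:
    """Return the kernels whose labels match every name=value pair given."""
    return [
        k for k in kernels
        if all(k.labels.get(name) == value for name, value in selector.items())
    ]


kernels = [
    ManagedKernel("k1", {"owner": "alice", "project": "demo"}),
    ManagedKernel("k2", {"owner": "bob"}),
    ManagedKernel("k3", {"owner": "alice"}),
]

# "Stop all my kernels" reduces to selecting by the owner label.
mine = select(kernels, owner="alice")
print([k.kernel_id for k in mine])  # prints ['k1', 'k3']
```

The server only stores and matches the labels; what they mean (owner, project, read-only) is left to the clients.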

@echarles
Member Author

@rolweber A web session would make it possible, for example, to support other auth paradigms (see the discussion on #50). Especially with OAuth and OIDC, a session is useful for storing intermediate tokens and state. Rather than an ad hoc solution like labels for RTC, it would provide a stronger foundation for any multi-user case.

@rolweber
Contributor

OAuth doesn't require a session. The authorization token itself is sent with every request, and can be validated on its own. OIDC builds on OAuth, and I'm pretty sure it doesn't require a session either. Those modern protocols are designed for scalability, and sessions are an antipattern for scalability. REST mandates stateless processes. 12 Factor mandates stateless processes.

I know that our situation is a bit different, because kernels are stateful by definition. And the Jupyter Server needs to keep track of the running kernels, and maybe kernel sessions. But this management information can be externalized, to keep the Jupyter Server process itself stateless and allow for crash recovery. Labels attached to the managed objects can be externalized in the same fashion. Open WebSockets are a kind of state too, but those can fail and the client will simply re-connect, possibly through a restarted Jupyter Server.

Client sessions in the Jupyter Server are a different matter. They add state that would need to be externalized separately, independent of the objects that the Jupyter Server has to manage already. Can you propose a use case that is more specific than "might be useful"?

I don't have a problem with an optional auth handler that implements a token cache for its purposes, and maybe even sets a cookie if that is needed on top of the token. Anyone would be free to configure a different auth handler and run a Jupyter Server that works without cookies. But you seem to be asking for a generic mechanism for session management, based on the assumption that "it will be needed at some point". On that generic level, my counter argument is YAGNI. Sessions are an antipattern, so we should strive to avoid them, rather than implement a generic mechanism that will tempt developers to use it just "because it's there".

@echarles
Member Author

For security reasons, once a user is authenticated, you can keep their profile in the server-side session (you may not want to send the complete profile to the client). You can also persist there the allowed actions derived from that profile, and filter on the server side according to the action the user requests.

For performance reasons, you can pull a larger set of information from an external source (e.g. the 1000 latest comments by a user) and work with it in memory to deliver the comments that fulfill the search criteria.

For customizability reasons, users could have service instances created specifically for them, which would be available in their session object.

For collaboration purposes, a session is ideal for adding users and keeping state on their connections (read-only, etc.). I have read your proposal to annotate the kernel sessions, but a server can and should be able to live without kernels, e.g. for the content API.

@rolweber
Contributor

I'm not saying we should prevent extensions from managing sessions, if they have a case for it. And if it's a feature found generally useful for extensions, then adding it to Jupyter Server might be a good idea. So, are there extensions, existing or in development, that require sessions? In particular, sessions tracked by the server with a cookie?

Managing user profiles doesn't sound like something Jupyter Server should be doing out of the box. And if it does, the profiles should be managed by user, not by session. Same for your comment search example - why should that be managed as part of a session, instead of a cache by user? If you're creating customized service instances, then keep the custom information with the service instance, not in a session.

If you want to make a case for multi-user support, then maybe we should discuss that under the topic of multi-user support, and decide later what kind of session tracking addresses the requirements best. But as far as I know, Jupyter Server as yet has no means to isolate users. By default, it still starts kernels on the same node, with the same operating system user as the server itself. And then lets users send code for execution, which can connect to all ports on the node, and mess with all processes running as the same operating system user. If that is still the case, I see two ways of working with Jupyter Server in a multi-user scenario:

  1. Start a separate, isolated server for each user. (JupyterHub)
  2. Run a Jupyter Server for a group of users that trust each other completely.

In the first case, Jupyter Server doesn't have to distinguish between users, because it's running for only one. User-specific information can be managed as global for each server instance.
In the second case, Jupyter Server should not pretend to distinguish between users, because it has no safe and secure way to isolate them. The clients should distinguish between users, and between sessions of the same user. The server can support such use cases by implementing resources that represent users and sessions. Instead of quietly setting a cookie, let the clients request a session if they need one, and make them send the session ID explicitly in subsequent requests.
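
A minimal sketch of what such an explicit session resource could look like, kept in memory for brevity. The class `SessionRegistry`, the endpoint paths in the docstrings, and the `X-Client-Session` header are all hypothetical; they are not part of any existing Jupyter Server API.

```python
import secrets
from typing import Dict, Optional


class SessionRegistry:
    """Hypothetical in-memory store for explicitly requested client sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def create(self, user: str) -> str:
        """Would back something like POST /api/client-sessions; returns the new ID."""
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = {"user": user, "data": {}}
        return session_id

    def get(self, session_id: Optional[str]) -> Optional[dict]:
        """Look up the session the client named explicitly, if any."""
        if session_id is None:
            return None
        return self._sessions.get(session_id)

    def delete(self, session_id: str) -> bool:
        """Would back something like DELETE /api/client-sessions/<id>."""
        return self._sessions.pop(session_id, None) is not None


registry = SessionRegistry()
sid = registry.create("alice")

# The client sends the ID explicitly on later requests, here as a header.
request_headers = {"X-Client-Session": sid}
session = registry.get(request_headers.get("X-Client-Session"))
```

Clients that never ask for a session never get one, so the server stays usable without cookies.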

But that's just my opinion. I'll wait for others to share theirs.

@echarles
Member Author

I have rephrased the title from "Add a Session" to "Add a Session Management infrastructure for extension developers".

echarles changed the title from "[DISCUSS] Add a Session" to "[DISCUSS] Add a Session Management infrastructure for extension developers" on Oct 22, 2019

@kevin-bates
Member

> let the clients request a session if they need one, and make them send the session ID explicitly in subsequent requests.

👍 isolation is a tough nut to crack, but until we have the ability to distinguish activities tied to a client we can't really organize multi-tenant support.

In either case, we should try to define what kinds of "services" are available in headless operations. If we wanted to expose Content Services, we'd probably need to have a "manager" introduced that spans Content Manager instances - similar to the MappingKernelManager. My thought was that these kinds of mapping managers would get indexed by something like a session id (although kernel-id is used in the MappingKernelManager).
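
A rough sketch of that idea, assuming one contents manager per session ID. `MappingContentsManager` and `ContentsManagerStub` are hypothetical names chosen by analogy with MappingKernelManager; no such class is claimed to exist.

```python
from typing import Callable, Dict, List


class ContentsManagerStub:
    """Hypothetical stand-in for a real contents manager rooted at a directory."""

    def __init__(self, root_dir: str) -> None:
        self.root_dir = root_dir


class MappingContentsManager:
    """Hypothetical manager that spans one contents manager per session ID."""

    def __init__(self, factory: Callable[[str], ContentsManagerStub]) -> None:
        self._factory = factory
        self._managers: Dict[str, ContentsManagerStub] = {}

    def get_manager(self, session_id: str) -> ContentsManagerStub:
        """Return the manager for this session, creating it on first use."""
        if session_id not in self._managers:
            self._managers[session_id] = self._factory(session_id)
        return self._managers[session_id]

    def remove_manager(self, session_id: str) -> None:
        """Forget the manager for a session that has ended."""
        self._managers.pop(session_id, None)

    def list_session_ids(self) -> List[str]:
        return sorted(self._managers)


# Assumption: each session gets its own root directory under /srv/contents.
mapping = MappingContentsManager(lambda sid: ContentsManagerStub(f"/srv/contents/{sid}"))
first = mapping.get_manager("session-a")
again = mapping.get_manager("session-a")
```

The index key is the only design decision shown here; whether it should be a session ID, a user, or something else is exactly what the thread is debating.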

@vidartf
Member

vidartf commented Oct 23, 2019

> But that's just my opinion. I'll wait for others to share theirs.

I haven't thought fully through all these cases, but I agree that:

  1. We should start with identifying the goals (RTC, Multiuser, etc.).
  2. Identifying any shared requirements between them (do any actually exist, or are they all actually different even if they share some overlap?).
  3. Identifying the best way to fulfill those requirements on the jupyter server platform.
  4. Implementing needed parts in jupyter server and/or as extensions.

I feel the initial issue was a little thin on some of these (especially pt2). This might be obvious to some, but people like me might need these things spelled out 😉

Note that some of this also overlaps with the discussion in jupyterlab/frontends-team-compass#11. I feel like maybe this issue was meant to sidestep that discussion?

@Zsailer
Member

Zsailer commented Oct 23, 2019

Great discussion here. Thank you @echarles for opening up the conversation.

I agree with @vidartf's breakdown. I (personally) need to reason through the various goals/use-cases a lot more before forming any strong opinions about how to properly manage user identity, authorization, sessions, etc. in the jupyter_server.

I mentioned this briefly in our last Jupyter Server meeting—I had previously been noodling around with a Jupyter Server implementation that includes authorization.

  • It's a Jupyter Server with only the Contents API.
  • I've patched in an authorization layer for each action in the Contents API.
  • It's been adapted into a JupyterHub-managed service.
  • User authentication is handled by the Hub.
  • A user's authorization policy is sourced from a policy file (but will eventually come from the JupyterHub authenticators).
  • Based on the user's allowed actions (or role) the server will allow or block requests.
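
To illustrate the last two bullets, here is a minimal sketch of a role-based check that allows or blocks a request. The policy contents, the action names, and the mapping from HTTP methods to actions are all assumptions for this example; they do not describe the actual experimental server.

```python
from typing import Dict, Set

# Assumption: a policy file already parsed into role -> allowed actions.
POLICY: Dict[str, Set[str]] = {
    "reader": {"contents:read"},
    "editor": {"contents:read", "contents:write"},
}

# Assumption: user -> role, as it might be supplied after Hub authentication.
USER_ROLES: Dict[str, str] = {"alice": "editor", "bob": "reader"}

# Assumption: how HTTP methods on the Contents API map to actions.
METHOD_ACTIONS: Dict[str, str] = {
    "GET": "contents:read",
    "PUT": "contents:write",
    "POST": "contents:write",
    "PATCH": "contents:write",
    "DELETE": "contents:write",
}


def is_allowed(user: str, method: str) -> bool:
    """Allow the request only if the user's role grants the matching action."""
    role = USER_ROLES.get(user)
    action = METHOD_ACTIONS.get(method)
    if role is None or action is None:
        return False  # unknown users and unmapped methods are denied
    return action in POLICY.get(role, set())


def status_for(user: str, method: str) -> int:
    """HTTP status a handler might return before doing any real work."""
    return 200 if is_allowed(user, method) else 403
```

The check depends only on the authenticated user and the policy, not on any per-browser session state.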

You can think of this server as a "shared drive" for users inside JupyterHub. I'm planning to expand this service to check authorization for other services (i.e. kernels, terminals, etc.).

This provides a mechanism for multi-user access to a jupyter server. How this translates to RTC, I'm not sure. This is really experimental right now, but I could see this thin authorization layer making it into Jupyter Server in the future. It will probably require a JEP though.
