Jupyter Enterprise gateway #25
Merged
140 changes: 140 additions & 0 deletions
...er-enterprise-gateway-incorporation/jupyter-enterprise-gateway-incorporation.md
@@ -0,0 +1,140 @@
# Jupyter Enterprise Gateway Incorporation Proposal

## Problem

Founded in academia, the Jupyter projects provide a rich and highly popular set of applications for
interacting with and iterating on large and complex applications. They have been truly ground-breaking.
However, when we first attempted to build a Notebook service that could enable a large number of data
scientists to run frequent and large workloads against a large Apache Spark cluster, we identified
several requirements that were not available in the Jupyter open source ecosystem at the time. We tried
to use the Jupyter Kernel Gateway project, but we quickly realized that the JKG server became the
bottleneck because the co-located Spark driver applications for these kinds of workloads (in this case,
the kernel processes running on behalf of notebook cells) were extremely resource intensive. In organizations
with multiple data scientists, these workloads can quickly saturate the compute resources of the Kernel Gateway server.

Jupyter Enterprise Gateway enables Jupyter Notebook to launch and manage remote kernels in a distributed cluster,
including Apache Spark managed by YARN, IBM Spectrum Conductor or Kubernetes. New platforms can be added via an
extensibility layer that handles the specific capabilities of the underlying cluster manager.

## Proposed Enhancement

The incubating [Jupyter Enterprise Gateway project](https://github.com/jupyter-incubator/enterprise_gateway) has
matured to the point where it addresses the issues noted above, and others. It should be considered for incorporation
into the main Jupyter organization as an official Subproject.

## Detailed Explanation

Detailed project information follows.

### Current and Potential Use Cases

* Provision and manage kernels in a remote cluster.
* Launch kernels as a given user, enabling multi-tenancy.

### Current Features

Jupyter Enterprise Gateway is a web server that provides headless access to Jupyter kernels within
an enterprise. Built directly upon Jupyter Kernel Gateway, Jupyter Enterprise Gateway leverages all
of the Kernel Gateway functionality in addition to the following:

* Adds support for remote kernels hosted throughout the enterprise where kernels can be launched in
  the following ways:
    * Local to the Enterprise Gateway server (today's Kernel Gateway behavior)
    * On specific nodes of the cluster utilizing a round-robin algorithm
    * On nodes identified by an associated resource manager
* Provides support for Apache Spark managed by YARN, IBM Spectrum Conductor or Kubernetes out of the box.
  Others can be configured via Enterprise Gateway's extensible framework.
* Secure communication from the client, through the Enterprise Gateway server, to the kernels
* Multi-tenant capabilities
* Ability to associate profiles consisting of configuration settings to a kernel for a given user
* Persistent kernel sessions

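The round-robin placement mentioned in the list above can be sketched as follows. This is a minimal illustration of the general technique, not Enterprise Gateway's actual implementation; the `RoundRobinHosts` class and the host names are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator


class RoundRobinHosts:
    """Hands out hosts in a fixed rotation (hypothetical helper, not Enterprise Gateway code)."""

    def __init__(self, hosts: Iterable[str]) -> None:
        self._hosts = list(hosts)
        if not self._hosts:
            raise ValueError("at least one host is required")
        # cycle() repeats the host list endlessly, in order.
        self._rotation: Iterator[str] = cycle(self._hosts)

    def next_host(self) -> str:
        """Return the host that should receive the next kernel launch."""
        return next(self._rotation)


# Hypothetical node names, for illustration only.
hosts = RoundRobinHosts(["node-1", "node-2", "node-3"])
placements = [hosts.next_host() for _ in range(5)]
print(placements)
```

Each launch request simply takes the next host in the rotation, which spreads kernels evenly by count without consulting a resource manager about actual load.
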
Below are some more details on the supported cluster platforms and specific capabilities:

#### Distributed Kernels in Apache Spark

Jupyter Enterprise Gateway enables Jupyter Notebook to launch and manage remote kernels in a distributed cluster. It
leverages different resource managers to enable distributed kernels in Apache Spark clusters. The example below
shows kernels being launched in YARN cluster mode across all nodes of a cluster.

![Jupyter Enterprise Gateway leverages Apache Spark resource managers to distribute kernels](jupyter_enterprise_gateway.gif)

Jupyter Enterprise Gateway also provides other value-added capabilities, such as enhanced security and multiuser support with user impersonation.

![Jupyter Enterprise Gateway provides Enhanced Security and Multiuser support with user Impersonation](jupyter_enterprise_gateway_on_yarn.png)

#### Distributed Kernels in Kubernetes

Jupyter Enterprise Gateway support for Kubernetes enables decoupling the Jupyter Notebook Server and its kernels into multiple pods. This allows Notebook server pods to run with only the minimum resources necessary for the workload being processed.

![Jupyter Enterprise Gateway enables remote kernels on a Kubernetes cluster](jupyter_enterprise_gateway_on_kubernetes.png)

## Criteria for Incorporation

### Have an active developer community that offers a sustainable model for future development.

The enterprise gateway reuses and extends classes from the Jupyter `kernel_gateway` and `notebook` Python packages. By virtue of this implementation, it is largely sustained by development of the `jupyter/notebook` project. Minimal maintenance is required to ensure the enterprise gateway codebase continues to interoperate with future releases of the `notebook` package.

### Have an active user community.

Enterprise gateway is a fundamental component in multiple IBM Cloud offerings, and has also been adopted by a few large companies that provide analytics and/or AI platforms to their internal and external customers.

In addition, below are some statistics collected from the Jupyter Enterprise Gateway GitHub repository, from October 14th, 2017 to the present:

- 7 releases
- 10 contributors
- 5 different organizations (based on current employment)
- 205 commits (16,551 additions, 9,616 removals)
- 60 Stars
- 26 Forks
- 10K+ pulls of primary docker image

### Use solid software engineering with documentation and tests hosted with appropriate technologies.

The Enterprise Gateway has a suite of unit and integration tests that are run automatically on Travis on every PR and on any commits to master.

The Jupyter Enterprise Gateway community provides multiple resources that both users and contributors can use:

- Source Code available at GitHub: https://github.com/jupyter-incubator/enterprise_gateway
- Documentation available at ReadTheDocs: http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
- Automated builds available at Travis CI: https://travis-ci.org/jupyter-incubator/enterprise_gateway
- Releases available at PyPI: https://pypi.org/project/jupyter_enterprise_gateway/
- Releases available at Conda Forge: https://github.com/conda-forge/jupyter_enterprise_gateway-feedstock
- Related Docker Images available at Elyra organization at DockerHub: https://hub.docker.com/u/elyra/dashboard/

### Demonstrate continued growth and development.

See "Have an active developer community that offers a sustainable model for future development" and "Have an active user community" sections above.

### Integrate well with other official Subprojects.

The Enterprise Gateway is a `jupyter/jupyter_core#Application` that uses programmatic APIs from `jupyter/notebook` to enable communication with Jupyter kernels like `ipython/ipykernel`. By definition, it integrates with other official Subprojects.

We are also investigating deeper integration with `JupyterHub` to decouple the kernel instances into specific pods in a Kubernetes environment.

### Be developed according to the Jupyter governance and contribution model.

The Enterprise Gateway has been in the Jupyter Incubator, and under the Jupyter governance and contribution model, since its inception.

### Have a well-defined scope.

Jupyter Enterprise Gateway enables Jupyter Notebook to launch remote kernels in a distributed cluster, including Apache Spark managed by YARN, IBM Spectrum Conductor or Kubernetes.

### Be packaged using appropriate technologies such as pip, conda, npm, bower, docker, etc.

The Enterprise Gateway is packaged using setuptools, released in both source and wheel format on PyPI, and installable using `pip`. It is also available on conda-forge.

## Pros and Cons

Pro: Extends the Jupyter stack to support distributed/remote kernels

Pro: The runtime is easily extensible to support new cluster resource managers

Con: Still requires a couple of extensions (e.g. NB2KG) to connect to the gateway

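To make the client-to-gateway connection concrete, here is a minimal sketch of the kind of HTTP request a client could issue to start a kernel. It assumes the gateway exposes the Jupyter kernels REST endpoint (`/api/kernels`) that Kernel Gateway serves in its default mode; the gateway address, the kernelspec name, and the `build_start_kernel_request` helper are hypothetical, and the request is only built, not sent.

```python
import json
from urllib.request import Request


def build_start_kernel_request(gateway_url: str, kernel_name: str) -> Request:
    """Build (but do not send) a POST to the gateway's kernels endpoint."""
    body = json.dumps({"name": kernel_name}).encode("utf-8")
    return Request(
        url=gateway_url.rstrip("/") + "/api/kernels",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical gateway address and kernelspec name, for illustration only.
request = build_start_kernel_request("http://gateway.example.com:8888", "python3")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; a real deployment would also need whatever authentication the gateway is configured to require.
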
## Interested Contributors

@parente, @rgbkrk

Binary file added: jupyter-enterprise-gateway-incorporation/jupyter_enterprise_gateway.gif (+916 KB)
Binary file added: ...r-enterprise-gateway-incorporation/jupyter_enterprise_gateway_on_kubernetes.png (+35.9 KB)
Binary file added: jupyter-enterprise-gateway-incorporation/jupyter_enterprise_gateway_on_yarn.png (+69.4 KB)
I think this section is more about -- how is the contributor base expanding beyond that of the initiators of the project? How has that been going? It's a different question than number of users and variance of users.
Thanks for the feedback, @rgbkrk. I have pushed a new commit expanding on that section; please let me know if you think we should add any more details.
It's looking great, thanks for coming back to it.