Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add optional-components/jupyterhub-stop-idle #389

Closed
wants to merge 1 commit into from

Conversation

fmigneault
Copy link
Collaborator

Overview

Jupyter instances that have been started by users are simply left in a running status indefinitely if not manually logged of.
This causes dockers jupyter-<user> to remain active, using resources unnecessarily.

This feature adds a new component that helps quickly detecting idle jupyter servers, and stopping them after a given inactivity timeout. The logic behind it depends on what JupyterHub reports on its activity API endpoint for the given user.

To have the utility available, the following requirements must be applied: Ouranosinc/jupyterhub#21
To test quickly, simply build the Docker on this branch and override JUPYTERHUB_DOCKER and JUPYTER_VERSION accordingly, and enable optional-components/jupyterhub-stop-idle in EXTRA_CONF_DIRS. Setting JUPYTERHUB_STOP_IDLE_TIMEOUT can be set to adjust the timeout interval.

Changes

Non-breaking changes

  • Adds optional-components/jupyterhub-stop-idle allowing culling of idle jupyter servers of users.

Breaking changes

  • n/a

Related Issue / Discussion

To Do

  • update changelog
  • update readme about new variables and how to enable the component

Still investigating

The timeout frequency is properly applied and docker logs show that activity status checks are performed, but servers are still considerd in a "running" status for some reason, even if nothing is running on them, nor any tab refreshing them being open.
I will look more into that, but I want to open this PR right away to gather early feedback.

@fmigneault fmigneault self-assigned this Oct 11, 2023
@github-actions github-actions bot added the component/jupyterhub Related to JupyterHub as development frontend with notebooks label Oct 11, 2023
@crim-jenkins-bot
Copy link
Collaborator

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2153/
Result : failure

BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-stop-idle
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results

Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1365/

NOTEBOOK TEST RESULTS
    
[2023-10-11T23:48:51.688Z] ============================= test session starts ==============================
[2023-10-11T23:48:51.688Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-11T23:48:51.688Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2023-10-11T23:48:51.688Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-11T23:48:51.688Z] collected 254 items
[2023-10-11T23:48:51.688Z] 
[2023-10-11T23:48:57.001Z] notebooks-auth/geoserver.ipynb ..........F..FFF.                         [  6%]
[2023-10-11T23:49:05.077Z] notebooks-auth/test_thredds.ipynb ...........                            [ 11%]
[2023-10-11T23:49:13.673Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb .......        [ 13%]
[2023-10-11T23:49:22.719Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ......         [ 16%]
[2023-10-11T23:49:32.382Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F......       [ 19%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............        [ 24%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-11T23:57:26.374Z] ...............                                                          [ 30%]
[2023-10-11T23:57:35.410Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F..    [ 32%]
[2023-10-11T23:57:43.001Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb FFFFFF            [ 34%]
[2023-10-11T23:57:59.813Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ......           [ 37%]
[2023-10-11T23:58:01.721Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb .       [ 37%]
[2023-10-11T23:58:10.173Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb .......            [ 40%]
[2023-10-11T23:58:17.042Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb .....       [ 42%]
[2023-10-12T00:02:41.777Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-12T00:03:52.019Z] .............                                                            [ 53%]
[2023-10-12T00:03:57.276Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb ....             [ 54%]
[2023-10-12T00:04:00.878Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-12T00:04:19.351Z] .................                                                        [ 64%]
[2023-10-12T00:04:26.293Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ......          [ 66%]
[2023-10-12T00:04:28.206Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-12T00:04:29.244Z] .FFFFFFFF                                                                [ 70%]
[2023-10-12T00:04:39.744Z] finch-master/docs/source/notebooks/dap_subset.ipynb ...........          [ 75%]
[2023-10-12T00:04:49.084Z] finch-master/docs/source/notebooks/finch-usage.ipynb ......              [ 77%]
[2023-10-12T00:04:50.467Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-12T00:04:53.541Z] ......                                                                   [ 80%]
[2023-10-12T00:05:01.701Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-12T00:05:19.387Z] .............                                                            [ 85%]
[2023-10-12T00:05:29.380Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-12T00:06:08.407Z] ....s.                                                                   [ 88%]
[2023-10-12T00:06:18.401Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-12T00:06:32.529Z] ...                                                                      [ 90%]
[2023-10-12T00:06:44.750Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-12T00:07:07.332Z] ......                                                                   [ 92%]
[2023-10-12T00:07:24.773Z] notebooks/hummingbird.ipynb ............                                 [ 97%]
[2023-10-12T00:10:10.357Z] notebooks/stress-tests.ipynb ......                                      [100%]
[2023-10-12T00:10:10.357Z] 
[2023-10-12T00:10:10.357Z] =================================== FAILURES ===================================
    
  

Copy link
Collaborator

@mishaschwartz mishaschwartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few documentation and clean-up comments below:

${JUPYTERHUB_STOP_IDLE_CONFIG}
"

#export DELAYED_EVAL="
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code please

# idle timeout of a user jupyterhub server to be shut down automatically (default: 1 day)
export JUPYTERHUB_STOP_IDLE_TIMEOUT=${JUPYTERHUB_STOP_IDLE_TIMEOUT:-86400}

# maximum age before stoping user servers (includiung active/running ones, regardless of last activity status)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this maximum age in seconds? Pleas specify the unit of time.

Copy link
Collaborator

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing optional-components/README.rst and CHANGES.md update.

Also an existing idle culling mechanism was already possible via env.local, see

#export JUPYTERHUB_CONFIG_OVERRIDE="
#
# Sample below will shutdown idle server after 3 days and idle kernel after 1 day.
#
#c.Spawner.args.extend([
## Shut down the server after N seconds with no kernels or terminals running and no activity.
#'--NotebookApp.shutdown_no_activity_timeout={}'.format(3*24*60*60) , # 3 days
## Timeout (in seconds) after which a kernel is considered idle and ready to be culled.
#'--MappingKernelManager.cull_idle_timeout={}'.format(24*60*60), # 1 day
## Culling kernels which have one or more connections for idle but open notebooks.
## Otherwise, browser have to be closed for culling to work.
#'--MappingKernelManager.cull_connected=True',
#])

Not clear to me how better or different this external idle culling differ but I wonder if this is some changes between Jupyterhub v1 vs v4. I would say to keep this external idle culler in case the built-in idle culler do not work anymore in v4.

Please in the optional-components/README.rst, mention the existing internal idle culler documented in env.local.example, to give users an option if one of the 2 works better or just plain do not work anymore.

Has this newer culler been tested by lowering the timeout to very short amount?

@@ -187,4 +187,5 @@ blocked_users = {'authtest', '${CATALOG_USERNAME}', 'anonymous'}
c.Authenticator.blacklist = blocked_users # v0.9+
c.Authenticator.blocked_users = blocked_users # v1.2+

${JUPYTERHUB_CONFIG_EXTENDED} # noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not re-use the existing JUPYTERHUB_CONFIG_OVERRIDE?

If you want to keep both, please leave 2 lines between the 2 variables so that when both are actually in use, they do not look mixed up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing that, but given that some instances already use export JUPYTERHUB_CONFIG_OVERRIDE=... in their env.local, which is sourced when runninig pavics-compose.sh, any overrides applied along the way get ignored.

Because of how JUPYTERHUB_CONFIG_EXTENDED gets defined using extra newlines, they are already separated by multiple extra lines. I could add more though, or even add comments in the template to separate them explicitly.

# Any component extending this configuration using this variable *MUST* include it back to allow further
# components to extend it as well. Also, they *MUST* extend the variable with each new definitions applied on
# a new line (without indents) to ensure that Python configurations will be executable without syntax error.
# This variable differs from 'JUPYTERHUB_CONFIG_OVERRIDE' that does not require users to include it back to make
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "include it back"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing something in the form:

export JUPYTERHUB_CONFIG_EXTENDED="
${JUPYTERHUB_CONFIG_EXTENDED}
<extra python here>
"

@tlvu
Copy link
Collaborator

tlvu commented Oct 12, 2023

The README of the external jupyterhub-idle-culler also mention about the JupyterHub can already cull idle jupyter itself without explaining why would someone need to use this external version https://github.com/jupyterhub/jupyterhub-idle-culler/blob/737dfa155b809453e6e795dd9de42d6a926fd4a0/README.md?plain=1#L263-L274

This is rather confusing to me.

@mishaschwartz what do you think about keeping 2 different culling options? Would this confuse more the users?

@mishaschwartz
Copy link
Collaborator

@mishaschwartz what do you think about keeping 2 different culling options? Would this confuse more the users?

Yeah, I'd rather keep one version or the other. It will definitely confuse users.

Can we test out if the old version still works and if not, remove the comment from env.local.
If it does still work then we should use that one instead unless there is some major advantage to the jupyterhub-idle-culler method. Is there @fmigneault ?

@fmigneault
Copy link
Collaborator Author

Can we test out if the old version still works and if not, remove the comment from env.local.

Yes. I would like to test existing approach, and if it works, only keep that one.
However, I would like to make it more easily/directly configurable by some optional variable ratger than asking of users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE.
I did not even realize this code existed before being pointed out by @tlvu

@tlvu
Copy link
Collaborator

tlvu commented Oct 12, 2023

However, I would like to make it more easily/directly configurable by some optional variable ratger than asking of users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE.

Yes, true, anything in production will need this to avoid long running idle jupyter containers wasting ram and cpu.

Then I would suggest moving that snippet of code into the default jupyterhub config and just make those timeout limits config variables.

I did not even realize this code existed before being pointed out by @tlvu

Being a production server with many users, we faced this issue #67 and although this culling might not fixed the root cause, it probably helped.

@fmigneault
Copy link
Collaborator Author

@tlvu @mishaschwartz
I found this thread that seems to explain properly the differences: https://discourse.jupyter.org/t/which-is-the-correct-way-to-cull-idle-kernels-and-notebook/8123/16

From my reading, it would seem the internal culler is sufficient for our use case.
However, I found in the comments that there is also a terminal culler that could keep the session/server active even if the kernel is idle, if only a terminal is keep open. (jupyter-server/jupyter_server#438)

So the config might need some extra parameters, but I would stick with only the internal parameters if those work by themselves.

fmigneault added a commit that referenced this pull request Oct 16, 2023
## Overview

Add new variables to easily configure idle jupyter user instances.

## Changes

**Non-breaking changes**
- Jupyterhub configurable idle server culling.
  - Add optional variables `JUPYTER_IDLE_SERVER_CULL_TIMEOUT`, `JUPYTER_IDLE_KERNEL_CULL_TIMEOUT` and
    `JUPYTER_IDLE_KERNEL_CULL_INTERVAL` that allows fined-grained configuration of user-kernel and server-wide
    docker image culling when their activity status reached a certain idle timeout threshold.
  - Enable idle kernel culling by default with a timeout of 1 day, and user server culling with timeout of 3 days.
  - Avoids the need for custom `JUPYTERHUB_CONFIG_OVERRIDE` specifically for idle server culling.
    If similar argument parameters should be defined using an older `JUPYTERHUB_CONFIG_OVERRIDE` definition,
    the new configuration strategy can be skipped by setting `JUPYTER_IDLE_KERNEL_CULL_TIMEOUT=0`.

**Breaking changes**
- n/a

## Related Issue / Discussion

- Closes #389 (replaces)
- Closes Ouranosinc/jupyterhub#21 (not required anymore)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/jupyterhub Related to JupyterHub as development frontend with notebooks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants