add 'safe' slug scheme #744

Merged
merged 22 commits into from
Aug 1, 2024
7bac57c
add 'safe' slug scheme
minrk Jun 8, 2023
d6be9f5
Merge branch 'main' into wip-slug
minrk Jun 12, 2024
bdf0f37
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 12, 2024
a1b524d
toward compatible transition for safe slugs
minrk Jun 12, 2024
c0b79af
Merge from main
minrk Jun 13, 2024
058afd5
revert use_legacy_labels config
minrk Jun 17, 2024
6968b68
let safe slug scheme live side-by-side with escape scheme
minrk Jun 17, 2024
45299ce
add multi_slug mechanism for multi-word slugs (username--servername--…
minrk Jun 17, 2024
e24b1f6
sub '-' for any sequence of unsafe characters
minrk Jun 26, 2024
cbd983b
restore trailing hyphen logic
minrk Jun 26, 2024
15ead3c
track kubespawner version in state, annotations
minrk Jun 26, 2024
fc4f628
allow opting out of persisted pvc name with remember_pvc_name = False
minrk Jun 26, 2024
fd4135f
document new template scheme and upgrade notes
minrk Jun 26, 2024
3ece569
update some test expectations
minrk Jun 26, 2024
f96ab38
exercise pvc_name upgrade cases
minrk Jun 26, 2024
09ade1c
Fix markdown table formatting
consideRatio Jul 20, 2024
10ba1fb
Document escaped_username and escaped_servername to be added in v7
consideRatio Jul 20, 2024
d4c2308
clearer comment about values being loaded by get_state
minrk Jul 30, 2024
f44b178
Merge branch 'main' into wip-slug
minrk Jul 30, 2024
97399f0
remove hardcoded safe slug scheme from namespace
minrk Jul 30, 2024
6232c4b
only handle legacy pvc name when remember_pvc_name is true
minrk Jul 31, 2024
845f3d8
Sync with main
minrk Aug 1, 2024
3 changes: 2 additions & 1 deletion docs/source/index.md
@@ -13,7 +13,7 @@ management of containerized applications. If you want to run a JupyterHub
setup that needs to scale across multiple nodes (anything with over ~50
simultaneous users), Kubernetes is a wonderful way to do it. Features include:

- Easily and elasticly run anywhere between 2 and thousands of nodes with the
- Easily and elastically run anywhere between 2 and thousands of nodes with the
same set of powerful abstractions. Scale up and down as required by simply
adding or removing nodes.

@@ -81,5 +81,6 @@ utils
```{toctree}
:maxdepth: 2
:caption: Reference
templates
changelog
```
6 changes: 3 additions & 3 deletions docs/source/ssl.md
@@ -8,16 +8,16 @@ If enabled, the Kubespawner will mount the internal_ssl certificates as Kubernet

To enable, use the following settings:

```
```python
c.JupyterHub.internal_ssl = True

c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
```

Further configuration can be specified with the following (listed with their default values):

```
c.KubeSpawner.secret_name_template = "jupyter-{username}{servername}"
```python
c.KubeSpawner.secret_name_template = "{pod_name}"

c.KubeSpawner.secret_mount_path = "/etc/jupyterhub/ssl/"
```
157 changes: 157 additions & 0 deletions docs/source/templates.md
@@ -0,0 +1,157 @@
(templates)=

# Templated fields

Several fields in KubeSpawner can be resolved as string templates,
so each user server can get distinct values from the same configuration.

String templates use the Python formatting convention of `f"{fieldname}"`,
so for example the default `pod_name_template` of `"jupyter-{user_server}"` will produce:

| username | server name | pod name |
| ---------------- | ----------- | ---------------------------------------------- |
| `user` | `''` | `jupyter-user` |
| `user` | `server` | `jupyter-user--server` |
| `user@email.com` | `Some Name` | `jupyter-user-email-com--some-name---0c1fe94b` |
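
As a rough illustration of how such a template resolves, the sketch below uses plain `str.format`. `expand_template` is a hypothetical helper written for this example, not part of KubeSpawner's API, and the `user_server` values are copied from the table above rather than computed.

```python
def expand_template(template: str, **fields: str) -> str:
    """Fill a template with already-computed field values (hypothetical helper)."""
    return template.format(**fields)


pod_name_template = "jupyter-{user_server}"

# Field values are taken from the table above; computing them from raw
# usernames and server names is the job of the configured slug scheme.
print(expand_template(pod_name_template, user_server="user"))
print(expand_template(pod_name_template, user_server="user--server"))
```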

## Templated properties

Some common templated fields:

- [pod_name_template](#KubeSpawner.pod_name_template)
- [pvc_name_template](#KubeSpawner.pvc_name_template)
- [working_dir](#KubeSpawner.working_dir)

## Fields

The following fields are available in templates:

| field | description |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `{username}` | the username passed through the configured slug scheme |
| `{servername}` | the name of the server passed through the configured slug scheme (`''` for the user's default server) |
| `{user_server}` | the username and servername together as a single slug. This should be used most places for a unique string for a given user's server (new in kubespawner 7). |
| `{unescaped_username}` | the actual username without escaping (no guarantees about value, except as enforced by your Authenticator) |
| `{unescaped_servername}` | the actual server name without escaping (no guarantees about value) |
| `{pod_name}` | the resolved pod name, often a good choice if you need a starting point for other resources (new in kubespawner 7) |
| `{pvc_name}` | the resolved PVC name (new in kubespawner 7) |
| `{namespace}` | the kubernetes namespace of the server (new in kubespawner 7) |
| `{hubnamespace}` | the kubernetes namespace of the Hub |

Because there are two escaping schemes for `username`, `servername`, and `user_server`, you can explicitly select one or the other on a per-template-field basis with the prefix `safe_` or `escaped_`:

| field | description |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `{escaped_username}` | the username passed through the old 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_servername}` | the server name passed through the 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_user_server}` | the username and servername together as a single slug, identical to `"{escaped_username}--{escaped_servername}".rstrip("-")` (new in kubespawner 7) |
| `{safe_username}` | the username passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_servername}` | the server name passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_user_server}` | the username and server name together as a 'safe' slug (new in kubespawner 7) |

These may be useful while transitioning a deployment from an earlier version of kubespawner.

The value of the unprefixed `username`, etc. is governed by the [](#KubeSpawner.slug_scheme) configuration, and always matches exactly one of these values.

## Template tips

In general, these guidelines should help you pick fields to use in your template strings:

- use `{user_server}` when a string should be unique _per server_ (e.g. pod name)
- use `{username}` when it should be unique per user, but shared across named servers (sometimes chosen for PVCs)
- use the `escaped_` prefix (e.g. `{escaped_username}`) if you need to keep certain values unchanged in a deployment upgrading from kubespawner \< 7
- `{pod_name}` can be re-used anywhere you want to create more resources associated with a given pod,
to avoid repeating yourself
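
A small sketch of the first two tips, assuming illustrative template values (`claim-{username}` is an example chosen here, not necessarily a default) and field values that are already valid slugs:

```python
templates = {
    # unique per server: include the combined user/server slug
    "pod_name": "jupyter-{user_server}",
    # shared by all of a user's servers: include only the username
    "pvc_name": "claim-{username}",
}

# Two servers belonging to the same user: the default one and one named "gpu".
servers = [
    {"username": "user", "servername": "", "user_server": "user"},
    {"username": "user", "servername": "gpu", "user_server": "user--gpu"},
]

resolved = [
    {key: template.format(**fields) for key, template in templates.items()}
    for fields in servers
]

for names in resolved:
    print(names)
```

With these templates, the two servers get different pod names but resolve to the same PVC name.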

## Changing template configuration

Changing configuration should not generally affect _running_ servers.
However, when changing a property that may need to persist across user server restarts, special consideration may be required.
For example, changing `pvc_name` or `working_dir` could result in disconnecting a user's server from data loaded in previous sessions.
This may be your intention or not! KubeSpawner cannot know.

`pvc_name` is handled specially, to avoid losing access to data.
If `KubeSpawner.remember_pvc_name` is True, once a server has started, a server's PVC name cannot be changed by configuration.
Any future launch will use the previous `pvc_name`, regardless of change in configuration.
If you _want_ to change the names of mounted PVCs, you can set

```python
c.KubeSpawner.remember_pvc_name = False
```

This handling applies only to the default `pvc_name`, not to PVCs in general.
If you have defined your own volumes, you need to handle changes to these yourself.
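
The behaviour described above can be sketched roughly as follows. `resolve_pvc_name` and the `saved_state` dict are hypothetical stand-ins for this example and do not mirror KubeSpawner's actual code.

```python
def resolve_pvc_name(template_result, saved_state, remember_pvc_name=True):
    """Pick the PVC name for a launch (hypothetical sketch).

    saved_state stands in for whatever was persisted by a previous launch.
    """
    saved = saved_state.get("pvc_name")
    if remember_pvc_name and saved:
        # a name was recorded on an earlier launch: configuration changes are ignored
        return saved
    # first launch, or remembering is disabled: use the current template result
    return template_result


print(resolve_pvc_name("claim-new", {"pvc_name": "claim-old"}))
print(resolve_pvc_name("claim-new", {"pvc_name": "claim-old"}, remember_pvc_name=False))
```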

## Upgrading from kubespawner \< 7

Prior to kubespawner 7, an escaping scheme was used that ensured values were _unique_,
but did not always ensure fields were _valid_.
In particular:

- start/end rules were not enforced
- length was not enforced

This meant that e.g. usernames that started with a capital letter or were very long could result in servers failing to start because the escaping scheme produced an invalid label.
To solve this, a new 'safe' scheme has been added in kubespawner 7 for computing template strings,
which aims to always produce valid object names and labels.
The new scheme is the default in kubespawner 7.

You can select the scheme with:

```python
c.KubeSpawner.slug_scheme = "escape" # no changes from kubespawner 6
c.KubeSpawner.slug_scheme = "safe" # default for kubespawner 7
```

The new scheme has the following rules:

- the length of any _single_ template field is limited to 48 characters (the total length of the string is not enforced)
- the result will only contain lowercase ascii letters, numbers, and `-`
- it will always start and end with a letter or number
- if a name is 'safe', it is used unmodified
- if any escaping is required, a truncated safe subset of characters is used, followed by `---{hash}` where `{hash}` is a checksum of the original input string
- `-` never occurs more than once in a row, except where inserted by the escaping mechanism
- if no safe characters are present, 'x' is used for the 'safe' subset
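
As a sanity check on the character, start/end, and length rules above, here is a minimal validator sketch. `follows_basic_rules` is a hypothetical name for this example, and it deliberately ignores the consecutive-hyphen rule, since `--` and `---` are inserted by the scheme itself.

```python
import re

# lowercase alphanumerics and '-', starting and ending with an alphanumeric
_SLUG_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def follows_basic_rules(slug: str, max_length: int = 48) -> bool:
    """Check length, allowed characters, and start/end characters (sketch only)."""
    return len(slug) <= max_length and _SLUG_RE.fullmatch(slug) is not None


for candidate in ["username", "capital---1a1cf792", "-43apital", "Capital"]:
    print(candidate, follows_basic_rules(candidate))
```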

Since length requirements are applied on a per-field basis, a new `{user_server}` field is added,
which computes a single valid slug that follows the above rules and is unique for a given user server.
The general form is:

```
{username}--{servername}---{hash}
```

where

- `--{servername}` is only present for non-empty server names
- `---{hash}` is only present if escaping is required for _either_ username or servername, and hashes the combination of user and server.
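
The general form can be assembled as in the sketch below. `user_server_form` is a hypothetical helper: it assumes the name parts are already safe and that the hash, when needed, has been computed elsewhere.

```python
def user_server_form(username, servername="", name_hash=None):
    """Assemble {username}--{servername}---{hash} from ready-made parts (sketch)."""
    slug = username
    if servername:
        # '--{servername}' only appears for non-empty server names
        slug += f"--{servername}"
    if name_hash:
        # '---{hash}' only appears when escaping was required
        slug += f"---{name_hash}"
    return slug


print(user_server_form("user"))
print(user_server_form("user", "server"))
# parts and hash copied from the pod name table earlier in this document
print(user_server_form("user-email-com", "some-name", "0c1fe94b"))
```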

This `{user_server}` is the recommended value to use in pod names, etc.
In the escape scheme, `{user_server}` is identical to the previous value used in default templates: `{username}--{servername}`,
so it should be safe to upgrade previous templates using `{username}--{servername}` to `{user_server}` or `{escaped_user_server}`.

In the vast majority of cases (where no escaping is required), the 'safe' scheme produces identical results to the 'escape' scheme.
Probably the most common case where the two differ is in the presence of single `-` characters, which the `escape` scheme escaped to `-2d`, while the 'safe' scheme does not.

Examples:

| name | escape scheme | safe scheme |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `username` | `username` | `username` |
| `has-hyphen` | `has-2dhyphen` | `has-hyphen` |
| `Capital` | `-43apital` (error) | `capital---1a1cf792` |
| `user@email.com` | `user-40email-2ecom` | `user-email-com---0925f997` |
| `a-very-long-name-that-is-too-long-for-sixty-four-character-labels` | `a-2dvery-2dlong-2dname-2dthat-2dis-2dtoo-2dlong-2dfor-2dsixty-2dfour-2dcharacter-2dlabels` (error) | `a-very-long-name-that-is-too-long-for---29ac5fd2` |
| `ALLCAPS` | `-41-4c-4c-43-41-50-53` (error) | `allcaps---27c6794c` |
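
To make the 'escape' column concrete, the sketch below reproduces it using only the standard library. It is inferred from the examples in the table (each unsafe character becomes `-` followed by the lowercase hex of its UTF-8 bytes) and is not taken from kubespawner's source; `escape_sketch` is a hypothetical name.

```python
import string

_SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_sketch(name: str) -> str:
    """Approximate the 'escape' column: unsafe characters become -{hex byte}."""
    out = []
    for char in name:
        if char in _SAFE_CHARS:
            out.append(char)
        else:
            # one '-xx' group per UTF-8 byte of the unsafe character
            out.extend(f"-{byte:02x}" for byte in char.encode("utf8"))
    return "".join(out)


for name in ["username", "has-hyphen", "Capital", "user@email.com"]:
    print(name, "->", escape_sketch(name))
```

The output for `Capital` starts with `-`, which is why the table marks it as an error: the escaping is unique, but the result is not a valid name.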

Most changed names won't have a practical effect.
However, to avoid `pvc_name` changing even though KubeSpawner 6 didn't persist it,
on first launch (for each server) after upgrade KubeSpawner checks if:

1. `pvc_name_template` produces a different result with `scheme='escape'`
1. a pvc with the old 'escaped' name exists

and if such a pvc exists, the older name is used instead of the new one (it is then remembered for subsequent launches, according to `remember_pvc_name`).
This is an attempt to respect the `remember_pvc_name` configuration, even though the old name is not technically recorded.
We can infer the old value, as long as configuration has not changed.
This will only work if upgrading KubeSpawner does not _also_ coincide with a change in the `pvc_name_template` configuration.
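
The upgrade check can be pictured as the following sketch. `pick_pvc_name` is hypothetical, and `existing_pvcs` is a plain set standing in for a lookup against the cluster.

```python
def pick_pvc_name(new_name, legacy_name, existing_pvcs):
    """Choose between the new and the legacy 'escape' PVC name (hypothetical sketch)."""
    if legacy_name != new_name and legacy_name in existing_pvcs:
        # the old PVC still exists: keep using it so the user's data stays attached
        return legacy_name
    return new_name


# the two schemes disagree and the old PVC exists, so the old name wins
print(pick_pvc_name("claim-has-hyphen", "claim-has-2dhyphen", {"claim-has-2dhyphen"}))
# no PVC with the old name exists, so the new name is used
print(pick_pvc_name("claim-has-hyphen", "claim-has-2dhyphen", set()))
```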
192 changes: 192 additions & 0 deletions kubespawner/slugs.py
@@ -0,0 +1,192 @@
"""Tools for generating slugs like k8s object names and labels

Requirements:

- always valid for arbitrary strings
- no collisions
"""

import hashlib
import re
import string

_alphanum = tuple(string.ascii_letters + string.digits)
_alphanum_lower = tuple(string.ascii_lowercase + string.digits)
_lower_plus_hyphen = _alphanum_lower + ('-',)

# patterns _do not_ need to cover length or start/end conditions,
# which are handled separately
_object_pattern = re.compile(r'^[a-z0-9\.-]+$')
# '-' goes last in the class so it is a literal hyphen, not a '.'-to-'_' range
_label_pattern = re.compile(r'^[a-z0-9._-]+$', flags=re.IGNORECASE)

# match anything that's not lowercase alphanumeric (will be stripped, replaced with '-')
_non_alphanum_pattern = re.compile(r'[^a-z0-9]+')

# length of hash suffix
_hash_length = 8


def _is_valid_general(
    s, starts_with=None, ends_with=None, pattern=None, min_length=None, max_length=None
):
    """General is_valid check

    Checks rules:

    - min_length, max_length: bounds on len(s)
    - starts_with, ends_with: allowed first and last characters
    - pattern: compiled regex that must match s
    """
    if min_length and len(s) < min_length:
        return False
    if max_length and len(s) > max_length:
        return False
    if starts_with and not s.startswith(starts_with):
        return False
    if ends_with and not s.endswith(ends_with):
        return False
    if pattern and not pattern.match(s):
        return False
    return True


def is_valid_object_name(s):
    """is_valid check for object names"""
    # object rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        max_length=255,
        min_length=1,
    )

Review comment from the PR author (Member) on `is_valid_object_name`: This function ended up not being used because, instead of multiple is_valid args, is_valid_default is used, which validates the subset of object names and labels (same as object name, with a max length of 63 instead of 255).


def is_valid_label(s):
    """is_valid check for label values"""
    # label rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
    if not s:
        # empty strings are valid labels
        return True
    return _is_valid_general(
        s,
        starts_with=_alphanum,
        ends_with=_alphanum,
        pattern=_label_pattern,
        max_length=63,
    )


def is_valid_default(s):
    """Strict is_valid

    Returns True if it's valid for _all_ our known uses

    So we can more easily have a single is_valid check.

    - object names have stricter character rules, but have longer max length
    - labels have short max length, but allow uppercase
    """
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        min_length=1,
        max_length=63,
    )


def _extract_safe_name(name, max_length):
    """Generate safe substring of a name

    Guarantees:

    - always starts and ends with a lowercase letter or number
    - never more than one hyphen in a row (no '--')
    - only contains lowercase letters, numbers, and hyphens
    - length at least 1 ('x' if other rules strips down to empty string)
    - max length not exceeded
    """
    # compute safe slug from name (don't worry about collisions, hash handles that)
    # cast to lowercase
    # replace any sequence of non-alphanumeric characters with a single '-'
    safe_name = _non_alphanum_pattern.sub("-", name.lower())
    # truncate to max_length chars, strip '-' off ends
    safe_name = safe_name.lstrip("-")[:max_length].rstrip("-")
    if not safe_name:
        # make sure it's non-empty
        safe_name = 'x'
    return safe_name


def strip_and_hash(name, max_length=32):
    """Generate an always-safe, unique string for any input

    truncates name to max_length - len(hash_suffix) to fit in max_length
    after adding hash suffix
    """
    name_length = max_length - (_hash_length + 3)
    if name_length < 1:
        raise ValueError(f"Cannot make safe names shorter than {_hash_length + 4}")
    # quick, short hash to avoid name collisions
    name_hash = hashlib.sha256(name.encode("utf8")).hexdigest()[:_hash_length]
    safe_name = _extract_safe_name(name, name_length)
    # due to stripping of '-' in _extract_safe_name,
    # the result will always have _exactly_ '---', never '--' nor '----'
    # use '---' to avoid colliding with `{username}--{servername}` template join
    return f"{safe_name}---{name_hash}"


def safe_slug(name, is_valid=is_valid_default, max_length=None):
    """Always generate a safe slug

    is_valid should be a callable that returns True if a given string follows appropriate rules,
    and False if it does not.

    Given a string, if it's already valid, use it.
    If it's not valid, follow a safe encoding scheme that ensures:

    1. validity, and
    2. no collisions
    """
    if '--' in name:
        # don't accept any names that could collide with the safe slug
        return strip_and_hash(name, max_length=max_length or 32)
    # allow max_length override for truncated sub-strings
    if is_valid(name) and (max_length is None or len(name) <= max_length):
        return name
    else:
        return strip_and_hash(name, max_length=max_length or 32)


def multi_slug(names, max_length=48):
    """multi-component slug with single hash on the end

    same as strip_and_hash, but name components are joined with '--',
    so it looks like:

    {name1}--{name2}---{hash}

    In order to avoid hash collisions on boundaries, use `\\xFF` as delimiter
    """
    hasher = hashlib.sha256()
    hasher.update(names[0].encode("utf8"))
    for name in names[1:]:
        # \xFF can't occur as a start byte in UTF8
        # so use it as a word delimiter to make sure overlapping words don't collide
        hasher.update(b"\xFF")
        hasher.update(name.encode("utf8"))
    hash = hasher.hexdigest()[:_hash_length]

    name_slugs = []
    available_chars = max_length - (_hash_length + 1)
    # allocate equal space per name
    # per_name accounts for '{name}--', so really two less
    per_name = available_chars // len(names)
    name_max_length = per_name - 2
    if name_max_length < 2:
        raise ValueError(f"Not enough characters for {len(names)} names: {max_length}")
    for name in names:
        name_slugs.append(_extract_safe_name(name, name_max_length))

    # by joining names with '--', this cannot collide with single-hashed names,
    # which can only contain '-' and the '---' hash delimiter once
    return f"{'--'.join(name_slugs)}---{hash}"
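
The delimiter choice in `multi_slug` can be illustrated in isolation. The sketch below is standalone and does not import kubespawner; `digest` is a hypothetical helper that hashes name components with an optional delimiter byte between them.

```python
import hashlib


def digest(names, delimiter=b""):
    """Hash name components in order, optionally separated by a delimiter."""
    hasher = hashlib.sha256()
    hasher.update(delimiter.join(name.encode("utf8") for name in names))
    return hasher.hexdigest()


# Without a delimiter, different splits of the same characters feed
# identical bytes to the hash, so the digests are identical.
print(digest(["ab", "c"]) == digest(["a", "bc"]))

# With b"\xff" between components the hashed bytes differ,
# so the two splits no longer share a digest.
print(digest(["ab", "c"], b"\xff") == digest(["a", "bc"], b"\xff"))
```

Because `\xFF` cannot appear in UTF-8 encoded text, no choice of names can reproduce the delimiter byte itself, which is what makes the boundary unambiguous.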