add 'safe' slug scheme #744

Merged
merged 22 commits into from
Aug 1, 2024

Conversation

minrk
Member

@minrk minrk commented Jun 8, 2023

closes #737
closes #498

implements scheme described here.

  • always produces valid values
  • uses names, etc. as-is, when valid (no more -2d)
  • uses stripped name---hash when needed
  • new behavior is enabled by default, opt-out for backward compatibility via KubeSpawner.slug_scheme = 'escape'

Potential issues to address:

  • escaping is applied to the final template result, rather than individual strings. This ensures correct values, but means the hash is a function of the full template string, not just the username (or servername). Dealing with length is tricky if we need to hash bits at a time.
  • Not everything is a kubernetes field, e.g. working directory should probably get special handling? It has different rules than the label fields
  • a breaking change for users to change schemes (just like changing the template config), but at least we don't make the change for them

- always produces valid values
- uses names, etc. as-is, when valid
- uses stripped name--hash when needed
return True


def is_valid_object_name(s):
Member Author


This function ended up not being used because instead of multiple is_valid args, is_valid_default is used, which validates the subset of object names and labels (same as object name with max length of 63 instead of 255)

@Ph0tonic

Hi,
Is there any status update on this PR? It would be great to have this merged. Can I help with it?
Thanks


@Ph0tonic Ph0tonic left a comment


A few suggestions and questions.

@Ph0tonic

Ph0tonic commented Apr 9, 2024

Hi,
Any status update on this PR? May I help with anything?
cc @minrk

@consideRatio
Member

consideRatio commented May 4, 2024

@minrk I looked at this in order to help nudge it towards merge, but I see that it's a very breaking PR to enable for an existing hub, as for example pvc_name gets changed.

        if self.enable_user_namespaces:
            self.namespace = self._expand_user_properties(self.user_namespace_template)
        self.pod_name = self._expand_user_properties(self.pod_name_template)
        self.dns_name = self.dns_name_template.format(namespace=self.namespace, name=self.pod_name)
        self.secret_name = self._expand_user_properties(self.secret_name_template)
        self.pvc_name = self._expand_user_properties(self.pvc_name_template)
        if self.working_dir:
            self.working_dir = self._expand_user_properties(self.working_dir)

The changes to the labels are far less breaking, as they won't affect kubespawner/z2jh/binderhub/grafana-dashboards etc., and would only be breaking for external software that relies on the labels.


I'm not landing in a concrete action point here, just summarizing this realization.

@minrk
Member Author

minrk commented May 6, 2024

@consideRatio yeah, I think that's exactly why I didn't finish this. I think the new scheme is fine and works better, but figuring out the transition is the hard part.

@minrk
Member Author

minrk commented Jun 12, 2024

I'm going to try to pick this one up with #835 / #836 in mind as the transition path:

  • leave old labels as they are by default
  • make old labels opt-out
  • use safe scheme unconditionally in new labels
  • use new labels in all selectors, don't rely on old labels for anything

The hard part is still going to be transitioning things like PVC, pod name, etc. We need to make sure that once established, they don't change even if the scheme changes. I think this is true already for pod name. Persisting chosen values in Spawner state should do this for us, but will need to think it through and make sure everything is where it's supposed to be, and everything is persisted that should be. Looking at it, unfortunately pvc name is not persisted, so that's going to be hard.

We may have to keep both schemes alive and check at runtime:

if new name exists:
    use new name
else:
    if old name exists:
        use old name, warn
    else:
        use new name
It may turn out that the best we can do is a breaking change with a documented manual migration step.

PVCs can be renamed through the somewhat frightening process of:

  • get pv id from pvc
  • set volume retention to Retain
  • delete old pvc
  • patch claimRef on volume
  • create new pvc with specified volumeName

or use krew rename-pvc (which does the same thing, but has the benefit of being written by not us).
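
To make the order of operations concrete, the rename steps listed above can be written down as a plan of request bodies. This sketch does not talk to a cluster; `plan_pvc_rename` is a hypothetical helper, and clearing `uid` and `resourceVersion` with `None` assumes merge-patch semantics. Treat it as a reading aid, not a tested migration tool.

```python
import copy


def plan_pvc_rename(old_pvc, new_name):
    """Return ordered (action, kind, name, body) steps for renaming a PVC.

    `old_pvc` is a PVC manifest as a plain dict, for example the parsed
    output of `kubectl get pvc NAME -o json`.
    """
    namespace = old_pvc["metadata"]["namespace"]
    old_name = old_pvc["metadata"]["name"]
    pv_name = old_pvc["spec"]["volumeName"]  # get pv id from pvc

    # the new claim reuses the old spec and pins the existing volume
    new_spec = copy.deepcopy(old_pvc["spec"])
    new_spec["volumeName"] = pv_name
    new_pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": new_name, "namespace": namespace},
        "spec": new_spec,
    }
    claim_ref = {
        "name": new_name,
        "namespace": namespace,
        # assumption: null values drop the stale binding details
        "uid": None,
        "resourceVersion": None,
    }
    return [
        ("patch", "PersistentVolume", pv_name,
         {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}),
        ("delete", "PersistentVolumeClaim", old_name, None),
        ("patch", "PersistentVolume", pv_name, {"spec": {"claimRef": claim_ref}}),
        ("create", "PersistentVolumeClaim", new_name, new_pvc),
    ]
```

Setting the reclaim policy to Retain first is what keeps the volume and its data around when the old claim is deleted.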

- safe scheme by default
- preserve old scheme if found
- persist pvc_name and attempt to handle lack of persistence from before
- add handle_legacy_names and use_legacy_labels configuration for legacy labels
@yuvipanda
Collaborator

Thank you for picking this back up, @minrk!

@consideRatio
Member

Thank you for working on this @minrk!!

These points may already be known and considered, but I wanted to highlight them when considering the paths onward, just in case.

An external resource complexity to be aware of

A key complexity for any change to username expansion is external resources, such as a volume mount on the pod with a subPath that includes the username. We can't know whether the external NFS volume includes the old username expansion or not, so we can't determine whether it's OK to switch over.

KubeSpawner itself only uses the component label, I think

I think the only use of labels currently by kubespawner itself is to reduce the set of resources to look at by the reflector using the component label. This is a performance optimization: the actual resource names are then looked up within that set.

@minrk
Member Author

minrk commented Jun 13, 2024

Thanks for the notes!

I think the only use of labels currently by kubespawner itself is to reduce the set of resources to look at by the reflector using the component label.

Close. I think there's one more place: when services are enabled, labels are also used to route service traffic to the pod, which also uses the full label set (I'm not sure that's the right choice, but it should at least include user and server names).

Now that this mostly works, I think there are some problems in the switch to handling escape of whole strings at a time:

  • variable expansion is done all over the place, with wildly different rules:
    • pod/pvc_name use object naming rules
    • labels have different rules (it's fine to use a strict subset of objects and labels for both)
    • subdir mount paths (totally different)
    • pod_connect_ip where hostname rules apply, the pod name may want to be used, but the rest of the string definitely must not be escaped
  • some strings should include other strings, but whole-string escaping makes that impossible unless all expanded results can also be referenced by name:
    • pod_name_template "jupyter-{username}--{servername}" -> jupyter-escape--server--abc123
    • pod_connect_ip "jupyter-{username}--{servername}.svc.local" -> jupyter-escape--serversvclocal--def456 which should be jupyter-escape--server--abc123.svc.local

That said, we cannot guarantee correct strings if we don't handle the full slug. I think it's simply not feasible to have one expand_user_properties that's used all over the place for pod names, URLs, templates, paths and have it produce correct, consistent results.

I don't really know how to address this. I think it only makes sense to apply the 'safe' slug scheme to kubernetes resource names and labels, and not in other contexts. It usually doesn't make sense to apply the escaping to mounts or subdirectories, but sometimes it will. The rules, however, are completely different and not really knowable. Applying the new scheme only to resource names

I'm going to give this a read through, and instead of changing what expand_user_properties does globally, I'm going to try to:

  • only apply the new scheme to k8s resource names and labels (drastically reduce potential impact of the change)
  • make pod_name (or perhaps dedicated {k8s_name}) available in the template namespace, so it can be referenced by other template strings without modification
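
The whole-string escaping problem described above, and the proposed fix of referencing an already-expanded `pod_name`, can be illustrated with a toy example. `toy_slug` is a deliberately simplified stand-in written for this sketch (keep valid strings, otherwise strip invalid characters and append a hash); it is not the function added in this PR, and its output will not match kubespawner's.

```python
import hashlib
import re


def toy_slug(s):
    """Simplified stand-in for a 'safe' slug, for illustration only."""
    if len(s) <= 63 and re.fullmatch(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?", s):
        return s  # already valid: use as-is
    stripped = re.sub(r"[^a-z0-9-]", "", s.lower()).strip("-")[:40] or "x"
    digest = hashlib.sha256(s.encode("utf8")).hexdigest()[:8]
    return f"{stripped}---{digest}"


fields = {"username": "user", "servername": "server"}
pod_name_template = "jupyter-{username}--{servername}"
connect_template = "jupyter-{username}--{servername}.svc.local"

pod_name = toy_slug(pod_name_template.format(**fields))

# Escaping the whole expanded string strips the dots from the DNS suffix
whole_string = toy_slug(connect_template.format(**fields))

# Referencing the expanded pod_name leaves the rest of the string untouched
by_reference = "{pod_name}.svc.local".format(pod_name=pod_name)

print(pod_name)
print(whole_string)
print(by_reference)
```

Only the third value is usable as a hostname that starts with the pod name, which is the motivation for exposing `pod_name` to other templates instead of re-expanding `{username}` and `{servername}` there.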

@yuvipanda
Collaborator

Thank you for diligently working through this hard problem, @minrk.

Let's bump a major version whenever we release this.

no need to make legacy component labels optional
@minrk
Member Author

minrk commented Jun 17, 2024

Getting closer, but still tricky stuff to think about:

  • object names should be safe by default
  • it should be possible to force the old scheme for consistency (either a single global flag, or new template fields that keep the old value)
  • it should be possible to reference created resources where object rules may not apply (add pod_name, pvc_name, etc. to template namespace)
  • we should not lose track of pvcs across upgrade (pvc_name should have been in get_state all along)
  • applying appropriate context-dependent rules in blackboxes like in _expand_all is not feasible, so per-field escaping is really the only way to go

The {username}--{servername} collision problem changes slightly depending on how/when we apply the escape because e.g. username="üser", servername="" produces 'ser--73506260' which can also be produced with username="ser", servername="73506260", so we would need to pick a different delimiter and/or hash joiner to avoid that.
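
The collision can be reproduced with a toy slug function. Everything here is an assumption made for the demonstration: `toy_slug` and `user_server` are hypothetical simplifications that use `--` both as the hash joiner and as the user/server delimiter, and the hash value will differ from the one quoted above.

```python
import hashlib
import re


def toy_slug(name):
    """Toy slug: keep plain names, otherwise strip and append '--<hash>'."""
    if re.fullmatch(r"[a-z0-9]+", name):
        return name
    stripped = re.sub(r"[^a-z0-9]", "", name.lower())
    digest = hashlib.sha256(name.encode("utf8")).hexdigest()[:8]
    return f"{stripped}--{digest}"


def user_server(username, servername):
    """Join per-field slugs with the same '--' delimiter."""
    user = toy_slug(username)
    if not servername:
        return user
    return f"{user}--{toy_slug(servername)}"


# A username that needs escaping, with the default (empty) server name
first = user_server("üser", "")

# A second user whose server name happens to equal the first user's hash
digest = first.split("--")[1]
second = user_server("ser", digest)

print(first, second, first == second)
```

Because the hash joiner and the field delimiter are the same string, the two results are indistinguishable, which is why a different delimiter or joiner is needed.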

minrk added 2 commits June 17, 2024 11:17
- add user_server to template namespace
- add `safe_` prefixed fields to template namespace using new 'safe' slug scheme
- add pod_name, namespace, pvc_name to template namespace
@minrk
Member Author

minrk commented Jun 18, 2024

I think I'm narrowing it down to a scheme we can safely and comprehensibly switch to. I mainly need to hash out the transition process and write out some tests and docs for how that should work. In all cases, current behavior can be preserved unchanged explicitly by replacing {username} with {escaped_username} in templates, which will continue to be the old-style value with no plans to change. There is a new {safe_username} that is escaped with the new scheme, and a new {user_server} field that is the join of {username}--{servername} and can be used anywhere a unique slug for a given user+server should be used.

In general, we have:

  • username, servername, user_server - value determined by slug_scheme. Currently defaults to the new safe. Always match exactly one of:
  • escaped_ variants - current scheme with escapism. Not guaranteed to produce valid values, especially for labels. {escaped_user_server} is identical to {escaped_username}--{escaped_servername} to preserve backward compatibility
  • safe_ variants - new, safe scheme that always produces a safe label. safe_user_server is not always identical to {safe_username}--{safe_servername}, but it is in the situations where neither needs any escaping, which is most of the time.
  • pod_name, pvc_name, and namespace are available to reference in templates (I'm mostly thinking of extra_ resources that may refer to them).
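
The relationship between the unprefixed fields and the two prefixed families can be sketched as a small namespace builder. `build_template_namespace` is a hypothetical helper, not kubespawner code, and the `safe` values below are placeholders because the exact output depends on kubespawner's implementation.

```python
FIELDS = ("username", "servername", "user_server")


def build_template_namespace(slug_scheme, escaped, safe):
    """Expose escaped_*, safe_*, and unprefixed aliases chosen by slug_scheme."""
    if slug_scheme not in ("safe", "escape"):
        raise ValueError(f"unknown slug_scheme: {slug_scheme!r}")
    ns = {}
    for field in FIELDS:
        ns[f"escaped_{field}"] = escaped[field]
        ns[f"safe_{field}"] = safe[field]
        # the unprefixed name always matches exactly one of the two variants
        ns[field] = safe[field] if slug_scheme == "safe" else escaped[field]
    return ns


# Old-style values for username "user.name" and server "lab"
escaped = {
    "username": "user-2ename",
    "servername": "lab",
    "user_server": "user-2ename--lab",
}
# Placeholders standing in for whatever the safe scheme produces
safe = {
    "username": "<safe username>",
    "servername": "lab",
    "user_server": "<safe user_server>",
}

ns = build_template_namespace("safe", escaped, safe)
print("jupyter-{username}".format(**ns))          # follows slug_scheme
print("jupyter-{escaped_username}".format(**ns))  # pinned to the old scheme
```

A template that names `{escaped_username}` explicitly keeps its old value regardless of `slug_scheme`, which is the opt-out described above.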

I've also removed the config to make the labels in #835 optional, which I don't think we need as covered by @yuvipanda's comments over there.

I feel like I am getting a good handle on resource names (pod_name, pvc_name), but figuring out the right thing to do for work_dir is a good deal more complex. Part of the trouble is that resource names are a safer transition because we can check if a resource at the old name exists and use it. We can also use state persistence to keep things sticky across config changes. We can't check if a filesystem path exists in time to make a choice.

Filesystems also tend to have very different, much more permissive, rules than k8s object names, but they can vary quite a bit! So what should we do for work_dir?

  • use new scheme and document how to keep old scheme with template config
  • continue to use old scheme by default, because it's much less likely to have caused any problems for paths
  • somehow try to identify if old scheme has been used, and use new if not

@manics
Member

manics commented Jun 18, 2024

If it helps we can potentially force admins to make a deliberate decision when upgrading Z2JH. If we add a new Z2JH config value to the schema and deliberately set it to an invalid value in values.yaml an admin must set the config. It's inconvenient for anyone expecting a seamless upgrade, but the schema validation should happen before anything is deployed so it shouldn't break anything.

- upgrade from 6.x after default slug scheme change preserves pvc attachment
- old name check doesn't happen if loaded state from more kubespawner 7
@minrk minrk changed the title [WIP] add 'safe' slug scheme add 'safe' slug scheme Jun 26, 2024
@minrk minrk marked this pull request as ready for review June 26, 2024 12:51
@minrk
Member Author

minrk commented Jun 26, 2024

I wouldn't call this "done" by any means, but think it's now at least "complete" in that I have a pretty good understanding of what's changed, what works on upgrade and what requires user intervention, it's documented more thoroughly than before, and the new upgrade logic is tested and explained.

So I think this is ready for review!

@consideRatio
Member

consideRatio commented Jul 20, 2024

Amazing effort into this @minrk!!! I'm looking to help drive review effort and one of the key things I knew I wanted to look into was how this affected providing user storage via NFS, so this comment got dedicated to that as a first step.

NFS storage considerations

When using NFS storage, one may have a single PVC configured with a subPath referencing "{username}".

Click to see example z2jh / kubespawner config

This is z2jh Helm chart config:

jupyterhub:
  singleuser:
    storage:
      type: static
      static:
        pvcName: home-nfs
        subPath: "{username}"

Via a z2jh provided jupyterhub_config.py, this leads to KubeSpawner config:

c.KubeSpawner.volumes = [
    {
        "name": "home",
        "persistentVolumeClaim": {
            "claimName": "home-nfs",
        },
    },
]
c.KubeSpawner.volume_mounts = [
    {
        "mountPath": "/home/jovyan",
        "name": "home",
        "subPath": "{username}",  # this is what's important
    },
]

In such situations, {username} has been the same as {escaped_username}, but will transition to being {safe_username} in KubeSpawner v7. This will lead to mounting a new folder for usernames containing characters such as ., -, @, or capital letters.

A quick but essential config change before upgrading to z2jh v4 / kubespawner v7 would be to do a change like this:

jupyterhub:
  singleuser:
    storage:
      type: static
      static:
        pvcName: home-nfs
-        subPath: "{username}"
+        subPath: "{escaped_username}"

For new hubs though, it would make sense to do "subPath": "{username}". The z2jh default of singleuser.storage.static.subPath is "{username}", so for z2jh this is a key breaking change to point out.

Action points

  • z2jh v4, which includes kubespawner v7, needs to highlight that existing deployments using singleuser.storage.type=static must adjust subPath to "{escaped_username}" to avoid re-creating users' home directories

@consideRatio
Member

consideRatio commented Jul 20, 2024

Migration consideration notes

In case one wants to migrate some resource, for example NFS storage folder names, one may need to identify how {username} has changed for a set of users before and after. This comment explores some steps one might need to take in such a migration.

Expand to read more

Detecting if {username} changed between v6 and v7 for specific usernames

import string

import escapism
from kubespawner.slugs import safe_slug

test_username = "some-unescaped-username"

escaped_safe_chars = set(string.ascii_lowercase + string.digits)
escaped_username = escapism.escape(test_username, safe=escaped_safe_chars, escape_char='-').lower()
safe_username = safe_slug(test_username)
if safe_username != escaped_username:
    print(f"safe_username != escaped_username: {safe_username} != {escaped_username}")

Getting all usernames via JupyterHub API

import requests

def iter_usernames(hub_url, token):
    """generator yielding all usernames for a given hub

    requires a token with `list:users` scope.
    """
    next_url = hub_url + "/hub/api/users"
    s = requests.Session()
    s.headers["Authorization"] = f"Bearer {token}"
    s.headers["Accept"] = "application/jupyterhub-pagination+json"
    while next_url:
        print(f"Fetching {next_url}")
        r = s.get(next_url)
        r.raise_for_status()
        page = r.json()
        for user in page["items"]:
            yield user["name"]
        if page["_pagination"]["next"]:
            next_url = page["_pagination"]["next"]["url"]
        else:
            next_url = None

Creating a map of old -> new {username} expansion

TODO
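
The map is left as a TODO in the comment above. One possible shape for it, as a sketch only, is a helper that takes the old and new expansion functions as arguments. `map_changed_usernames` is a hypothetical name; in practice `old_expand` and `new_expand` would be the escapism-based expression and `safe_slug` from the snippet above, and the lambdas here are crude stand-ins so the example runs without those packages.

```python
def map_changed_usernames(usernames, old_expand, new_expand):
    """Return {username: (old, new)} for names whose expansion differs."""
    changed = {}
    for name in usernames:
        old, new = old_expand(name), new_expand(name)
        if old != new:
            changed[name] = (old, new)
    return changed


# Stand-in expansions, NOT the real schemes: they only differ on "."
old_expand = lambda name: name.replace(".", "-2e")
new_expand = lambda name: name.replace(".", "")

changed = map_changed_usernames(["alice", "bob.smith"], old_expand, new_expand)
for name, (old, new) in changed.items():
    print(f"{name}: {old} -> {new}")
```

The usernames could come from the `iter_usernames` generator above, and the resulting pairs are what a folder-rename script would iterate over.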

Comment on lines 1386 to 1395
handle_legacy_names = Bool(
True,
config=True,
help="""handle legacy names and labels

kubespawner 7 changed the scheme for computing names and labels to be more reliably valid.
In order to preserve backward compatibility, the old names must be handled in some places.

You can safely disable this if no PVCs were created or running servers were started
before upgrading to kubespawner 7.
Member

@consideRatio consideRatio Jul 21, 2024


The config handle_legacy_names is only used to handle the transition to safe name expansions for PVCs, not for anything else, unlike what the leading help text "handle legacy names and labels" suggests.

I think it makes sense to update this to describe that it's just about managing PVC naming, rather than trying to handle anything beyond PVC naming.

Member Author


Yes, I intended to have this for a catch-all config "check if the upgraded defaults might have detached something without user input," but PVCs were the only thing I found where there was something to be done. Maybe namespace should be handled as well?

Member Author


namespaces don't need to be handled because they were persisted in 6.x, so I think this is okay as-is.

@consideRatio
Member

consideRatio commented Jul 31, 2024

I chatted with @minrk about this just now, here is some planning made explicit:

  1. @minrk does a final pass on this PR, possibly with no changes
  2. We go for a kubespawner 7 beta release
  3. We bump to kubespawner 7 beta in z2jh to get a dev release out
  4. I'll do some test upgrades to verify things seem ok
    • test the proposed migration in add 'safe' slug scheme #744 (comment)
    • test a user with username something.something
    • test a user with a pvc created already
      EDIT: I've not done this, and I'm giving up trying to get it done, but it seems users have tested this via a z2jh beta with a kubespawner beta, and it seems OK! There was an issue for a user who had some custom logic in a hook, but besides that no users have reported issues about this.
    • test a user with a pvc not created already
      EDIT: Same note as the PVC test above.

Notes for the future:

since it is an attempt to 'remember' an old name from kubespawner 6 which never remembers
@minrk
Member Author

minrk commented Jul 31, 2024

I determined that handle_legacy_names should indeed be subordinate to remember_pvc_name, since it is itself an extended implementation of remembering the previously used pvc_name. Both are true by default.
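
One way to read "subordinate" is that the legacy lookup only runs when remembering is enabled at all. The sketch below shows that gating; `resolve_pvc_name`, the `state` dict, and the `pvc_exists` callback are hypothetical, and the order of checks is an assumption, not a description of the merged code.

```python
def resolve_pvc_name(new_name, legacy_name, state, pvc_exists,
                     remember_pvc_name=True, handle_legacy_names=True):
    """Decide which PVC name to use, honoring both config flags."""
    if not remember_pvc_name:
        # nothing is remembered, so the legacy lookup is skipped as well
        return new_name
    remembered = state.get("pvc_name")
    if remembered:
        # a name persisted in spawner state always wins
        return remembered
    if handle_legacy_names and legacy_name != new_name and pvc_exists(legacy_name):
        # no persisted name: look for a PVC created under the old scheme
        return legacy_name
    return new_name


# Made-up names; the set stands in for PVCs present in the cluster
cluster = {"claim-user-2ename"}
print(resolve_pvc_name("claim-new-style", "claim-user-2ename", {}, cluster.__contains__))
```

Disabling `remember_pvc_name` in this sketch turns off both behaviors, while disabling only `handle_legacy_names` keeps persisted names but skips the check for old-scheme PVCs.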

@minrk
Member Author

minrk commented Aug 1, 2024

Naturally, tests have started to fail for unrelated reasons on the last commit. Looking into it.

@minrk
Member Author

minrk commented Aug 1, 2024

Fixed tests with #846

@consideRatio consideRatio merged commit f77cfc3 into jupyterhub:main Aug 1, 2024
9 checks passed
@consideRatio
Member

Thank you for the amazing effort on this @minrk!!

I'm on vacation for two weeks now, but hope to find time to contribute towards release of beta and testing anyhow

5 participants