Long namespace/ingress names cause collisions with auto-created resources on GKE #537

NickLavrov · 2018-11-05T23:56:43Z

I noticed this after creating one ingress that worked fine, then creating another ingress which caused the first ingress to behave incorrectly (default backend 404, etc).

As far as I can tell, at least these resources and annotations are added to the ingress by GKE:

ingress.kubernetes.io/forwarding-rule
ingress.kubernetes.io/target-proxy
ingress.kubernetes.io/url-map

If my namespace was called xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (32 chars) and my ingress was named xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, I'd see resources with names like:
k8s-um-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxx0

This is fine if there's only one ingress, but since my other ingress also started with the same set of characters, there would be a collision. I've looked at the resources/annotations it creates when using shorter names, and I notice that it means to append some random characters at the end. Here, it looks like I'm hitting the 64 character limit before it gets to that random string.

Maybe removing the 64 character limit or adding documentation about how these names are generated would help with this. It took me a while to look into this annotation as the root of my issues.

NOTE: This is the file that contains the naming logic https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go


freehan · 2018-11-06T23:42:48Z

Thanks for the bug report. Will try to fix this problem. The key is to maintain backward compatibility.

fejta-bot · 2019-02-07T18:47:19Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

bowei · 2019-02-07T23:36:45Z

/lifecycle frozen

freehan · 2019-02-27T18:42:21Z

Add toleration of long name to certain degree: #650

NickLavrov · 2019-08-12T21:49:45Z

Closing the issue since it was addressed in the PR above!

karlkfi · 2019-08-13T17:59:55Z

Changes in #650 do not actually solve this issue completely.

It is still possible for GCLB name and backend name (Network Endpoint Group) to collide if the namespace name + ingress name are too long, especially between two clusters with similar names, because the cluster name is at the end and gets truncated first.

LB: https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go#L340
NEG: https://github.com/kubernetes/ingress-gce/blob/master/pkg/utils/namer.go#L430

The UID also commonly gets truncated, which increases the chance of UID collisions.

I like that the cluster/namespace/ingress names are in these GCLB names, because it makes them easier to identify when debugging, but the whole name-generation scheme needs to be redesigned to better guarantee uniqueness.

freehan · 2019-08-13T22:01:28Z

Let me summarize the current naming problem per GCE Loadbalancer resource:

1. LB frontend resources: ForwardingRule, TargetProxy, UrlMap. These resources have the following naming scheme.

k8s-{resource prefix}-{namespace}-{name}-{cluster_uid}

{resource prefix} is the short name of the resource. e.g. "um" is short for UrlMap
{namespace} is the namespace of ingress resource
{name} is the name of ingress resource

The name is truncated to 63 characters (the maximum length of a GCE resource name). When the namespace and name are long (55 chars), the GCE resource name loses uniqueness across multiple ingresses within the same cluster or across multiple clusters. For example, if the namespace is 63 characters long, everything after those 63 characters is truncated; if another ingress in the same cluster or in another cluster has the same namespace prefix, the GCE resource names collide. In addition, the current GC logic leaks resources with long names: because the cluster uid is truncated, GC does not recognize the resource as managed by itself.
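To make the truncation problem concrete, here is a minimal Python sketch. It assumes the name is built exactly as the scheme above is written and then cut at 63 characters; the real logic in namer.go may differ in detail. The helper `legacy_frontend_name`, the cluster uid value, and the ingress names are hypothetical.

```python
GCE_NAME_MAX = 63  # max length of a GCE resource name, as stated above


def legacy_frontend_name(resource_prefix, namespace, name, cluster_uid):
    """Build k8s-{resource prefix}-{namespace}-{name}-{cluster_uid}, then truncate.

    Hypothetical helper: a simplified reading of the scheme, not the controller's code.
    """
    full = f"k8s-{resource_prefix}-{namespace}-{name}-{cluster_uid}"
    return full[:GCE_NAME_MAX]


uid = "0123456789abcdef"  # hypothetical 16-character cluster uid
long_ns = "x" * 32

# Two different ingresses whose names share a long prefix.
frontend = legacy_frontend_name("um", long_ns, "x" * 32 + "-frontend", uid)
backend = legacy_frontend_name("um", long_ns, "x" * 32 + "-backend", uid)

# A short name keeps the whole cluster uid as its suffix.
short = legacy_frontend_name("um", "default", "web", uid)

print(frontend == backend)   # the distinguishing suffixes were cut off
print(uid in frontend)       # the cluster uid was cut off as well
print(short.endswith(uid))
```

With the long names, both ingresses map to the same 63-character string and the cluster uid is gone entirely, which matches both the collision and the GC leak described above.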

2. Cluster UID.
The current cluster uid is a 16-character generated hash. It is currently stored in a configmap under kube-system. Since it is stored in a configmap, it is vulnerable to modification.

3. NEG and BackendService in NEG mode. The naming scheme is as follows:

k8s1-{short cluster uid}-{service namespace}-{service name}-{service port}-{hash}

{short cluster uid} is the 8 char prefix of the cluster uid

A naming collision may happen if the short cluster uid, namespace, name and port are all the same across clusters. This is mostly a consequence of problem #2: a user modifies the cluster uid in a way that makes the short cluster uid (the 8-character prefix of the cluster uid) the same in both clusters.
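A small Python sketch of why the 8-character prefix matters. It only builds the readable part of the NEG name; the trailing {hash} is left out because this thread does not say what goes into it. The helper names and uid values are hypothetical.

```python
def short_cluster_uid(cluster_uid):
    """Return the 8-character prefix of the cluster uid, as described above."""
    return cluster_uid[:8]


def neg_name_prefix(cluster_uid, namespace, service, port):
    """Readable part of k8s1-{short cluster uid}-{namespace}-{name}-{port}.

    Hypothetical helper: the trailing {hash} of the real scheme is omitted.
    """
    return f"k8s1-{short_cluster_uid(cluster_uid)}-{namespace}-{service}-{port}"


# Two hypothetical cluster uids that differ only after the first 8 characters,
# for example after someone edited the configmap by hand.
uid_cluster_a = "1a2b3c4d00000000"
uid_cluster_b = "1a2b3c4dffffffff"

neg_a = neg_name_prefix(uid_cluster_a, "default", "web", 80)
neg_b = neg_name_prefix(uid_cluster_b, "default", "web", 80)

print(neg_a)
print(neg_a == neg_b)  # same readable prefix despite different full uids
```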

4. Firewalls, Instance Group and BackendService in IG mode
These resources do not suffer from the naming collision problem, mostly because their names are short enough to avoid truncation. Their naming schemes are as follows:

Firewall

k8s-fw-l7-{cluster-uid}

Instance Group

k8s-ig-{cluster-uid}

Backend Service:

k8s-be-{node port}-{cluster-uid}
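A quick length check in Python for the three schemes above. It assumes the 16-character cluster uid mentioned earlier and a node port of at most 5 digits (an assumption, since ports go up to 65535); the placeholder characters are arbitrary.

```python
GCE_NAME_MAX = 63          # max length of a GCE resource name
CLUSTER_UID_LEN = 16       # cluster uid length quoted earlier in this comment
MAX_NODE_PORT_DIGITS = 5   # assumption: a port number has at most 5 digits

uid = "u" * CLUSTER_UID_LEN  # placeholder uid, only its length matters

worst_case = {
    "firewall": f"k8s-fw-l7-{uid}",
    "instance group": f"k8s-ig-{uid}",
    "backend service": f"k8s-be-{'9' * MAX_NODE_PORT_DIGITS}-{uid}",
}

for kind, name in worst_case.items():
    # Every worst-case name stays well under the limit, so nothing is truncated.
    print(kind, len(name), len(name) <= GCE_NAME_MAX)
```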

Requirements for fixes

1. Uniqueness
The resource naming scheme should guarantee uniqueness across clusters.
2. Backward compatibility
A. Keep LBs in their current state intact. Since there is no way to perform a hitless migration for LBs, the controller has to keep supporting LBs configured using the old naming scheme.
B. GC needs to handle all naming schemes to avoid resource leaks.
C. Test dimension multiplication. Existing e2e tests will have to run across different naming schemes as well as controller upgrades.

Standing plan

Fixing problem #1:
Introduce a new naming scheme to guarantee uniqueness while keeping existing LBs intact. Fix GC for both cases.

Fixing problem #2 & #3:
Fixing #2 can fix #3 along the way. The current plan is to use the uid of the kube-system namespace to replace the cluster uid configmap. This will avoid user modification and collisions. At the same time, the controller will keep supporting clusters running in the old mode (cluster uid in configmap).

karlkfi · 2019-11-16T18:13:47Z

So @skmatti,
With the new naming pattern, what's the truncation strategy and how short do namespace and ingress names need to be to avoid overlap?

k8s{version-id}-{resource-prefix}-{kube-system-uid}-{namespace}-{name}-{8-char-hash}

skmatti · 2019-11-16T19:02:54Z

There is no restriction on the lengths of namespace and name. In case of overflow, namespace and name are trimmed evenly until their combined length is 36.

karlkfi · 2019-11-16T19:08:10Z

The 8-char-hash is a hash of the full un-trimmed name, right?
Otherwise we still have a possible collision problem.

skmatti · 2019-11-16T19:08:49Z

Yes, that's right.

bowei · 2019-11-20T16:46:21Z

The trimmed parts of the name are for human consumption only. The hash should be always unique (modulo actual hash collisions, which should be a low probability event).
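For readers who want to see how the pieces fit, here is a rough Python sketch of the pattern discussed above. Several things are assumptions and not taken from the controller: SHA-256 as the hash, the exact string that gets hashed, using the first 8 characters of the kube-system uid, the "k8s2" version prefix, and "trimmed evenly" being read as "shorten the longer field one character at a time". All function names and the uid value are hypothetical.

```python
import hashlib

GCE_NAME_MAX = 63
MAX_NS_NAME_LEN = 36  # combined namespace + name budget quoted above


def trim_evenly(namespace, name, budget=MAX_NS_NAME_LEN):
    """Shorten whichever field is longer, one character at a time, until both fit.

    One possible reading of "trimmed evenly"; the real trimming may differ.
    """
    while len(namespace) + len(name) > budget:
        if len(namespace) >= len(name):
            namespace = namespace[:-1]
        else:
            name = name[:-1]
    return namespace, name


def v2_frontend_name(resource_prefix, kube_system_uid, namespace, name):
    """Sketch of k8s{version-id}-{resource-prefix}-{kube-system-uid}-{namespace}-{name}-{8-char-hash}."""
    # Hash the full, untrimmed identifiers so names that look alike after
    # trimming still end in different suffixes (SHA-256 is an assumption).
    digest = hashlib.sha256(
        f"{kube_system_uid};{namespace};{name}".encode()
    ).hexdigest()[:8]
    ns, nm = trim_evenly(namespace, name)
    return f"k8s2-{resource_prefix}-{kube_system_uid[:8]}-{ns}-{nm}-{digest}"


kube_system_uid = "3f1c2a9e-7b64-4c0d-9e1a-5d2f8b6c4a10"  # hypothetical uid
long_ns = "x" * 32

v2_frontend = v2_frontend_name("um", kube_system_uid, long_ns, "x" * 32 + "-frontend")
v2_backend = v2_frontend_name("um", kube_system_uid, long_ns, "x" * 32 + "-backend")

print(len(v2_frontend) <= GCE_NAME_MAX)
print(v2_frontend[:-8] == v2_backend[:-8])  # readable parts look identical
print(v2_frontend != v2_backend)            # the hash suffix tells them apart
```

With a 2-character resource prefix and an 8-character uid, the fixed parts take 27 characters, which leaves exactly 36 for namespace plus name within the 63-character limit.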

karlkfi · 2019-11-20T17:08:15Z

Perfect. That’s what we needed.

freehan added the kind/bug label Nov 6, 2018

freehan closed this as completed Nov 6, 2018

freehan reopened this Nov 6, 2018

k8s-ci-robot added the lifecycle/stale label Feb 7, 2019

k8s-ci-robot added lifecycle/frozen and removed lifecycle/stale labels Feb 7, 2019

freehan mentioned this issue Feb 27, 2019

Modify NameBelongToCluter to tolerate truncated cluster name suffix #650

Merged

NickLavrov closed this as completed Aug 12, 2019

freehan reopened this Aug 13, 2019

skmatti mentioned this issue Oct 15, 2019

Add V2 frontend namer #892

Merged

k8s-ci-robot closed this as completed in #892 Nov 16, 2019
