HNC : logs "http: TLS handshake error from x:x remote error: tls: bad certificate" #1255

ledroide · 2020-11-05T15:53:07Z

Hello,
The manager container of the hnc-controller-manager deployment continuously emits many log lines like these:

2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:25105: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:44592: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:49653: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:36010: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:6684: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:45771: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:37314: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:8601: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:15705: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:50125: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:11056: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:59676: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:17264: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:24978: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:28812: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:15634: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:7157: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:41785: remote error: tls: bad certificate

Does this indicate an actual issue? If not, how can we disable the handshake attempts, or stop logging them?
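
To see whether the errors come from a handful of callers or from many, the log lines can be grouped by source address. This is only a minimal sketch that assumes the log format shown above; the helper name `count_handshake_errors` is made up for illustration and is not part of HNC or kubectl.

```shell
#!/bin/sh
# count_handshake_errors (hypothetical helper): read manager logs on stdin
# and print "<count> <source-ip>" for each source of TLS handshake errors.
# Assumes lines of the form:
#   ... http: TLS handshake error from <ip>:<port>: remote error: ...
count_handshake_errors() {
  grep 'TLS handshake error from' \
    | sed -E 's/.*TLS handshake error from ([0-9.]+):[0-9]+:.*/\1/' \
    | sort | uniq -c | awk '{print $1, $2}'
}

# Usage against a live cluster (same kubectl invocation as in this thread):
#   kubectl logs deploy/hnc-controller-manager -c manager -n hnc-system \
#     | count_handshake_errors
```

Other log lines, such as the JSON entries from the cert-rotation logger, are ignored by the `grep` step.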

adrianludwin · 2020-11-05T18:20:08Z

Are you able to modify any HNC workloads? Did you modify the YAML files in any way? I think we can expect errors like this for a short period of time when HNC is *first* installed, before the certificates have been created and distributed. But they should stop within the first 30s and should never recur.

ledroide · 2020-11-06T14:16:24Z

- I did not modify the hnc-manager.yaml all-in-one file
- The container image is gcr.io/k8s-staging-multitenancy/hnc-manager:v0.6.0
- This log is still displayed continuously after 22h of running

$ kubectl get deploy/hnc-controller-manager -o wide -n hnc-system
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS                IMAGES                                                                                         SELECTOR
hnc-controller-manager   1/1     1            1           22h   manager,kube-rbac-proxy   gcr.io/k8s-staging-multitenancy/hnc-manager:v0.6.0,gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0   control-plane=controller-manager

$ kubectl logs --tail 12 deploy/hnc-controller-manager -c manager -n hnc-system
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:10357: remote error: tls: bad certificate
{"level":"info","ts":1604671335.157892,"logger":"cert-rotation","msg":"CRD subnamespaceanchors.hnc.x-k8s.io is being deleted"}
{"level":"info","ts":1604671335.254194,"logger":"cert-rotation","msg":"CRD hierarchyconfigurations.hnc.x-k8s.io is being deleted"}
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:38357: remote error: tls: bad certificate
{"level":"info","ts":1604671335.2594817,"logger":"cert-rotation","msg":"ensuring CA cert on ValidatingWebhookConfiguration"}
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:40448: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:25077: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:18810: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.98.0:56772: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.92.0:52681: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.98.0:50827: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.92.0:29824: remote error: tls: bad certificate

I cannot set a parent for a namespace:

$ kubectl hns --version
kubectl-hns version v0.6.0
$ kubectl hns tree webs
Error reading hierarchy for webs: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
$ kubectl get ns sbr01
NAME    STATUS   AGE
sbr01   Active   14d
$ kubectl hns set sbr01 --parent webs
Error reading hierarchy for sbr01: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
$ kubectl get hncconfiguration 
Error from server: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HNCConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority

The errors above may not be the same issue, but either way it does not work, and the messages all point to certificates.

Serge

adrianludwin · 2020-11-06T14:19:21Z

Ahh I think you need to upgrade to the new version of kubectl-hns - those logs show a problem with hnc.x-k8s.io/v1alpha1, which was the API version used by 0.5. But 0.6 uses v1alpha2 and requires the latest kubectl-hns. See if that helps?

ledroide · 2020-11-06T14:30:44Z

That's what I suspected at first, but my earlier output already includes that check:

$ kubectl hns --version
kubectl-hns version v0.6.0

I had some difficulties during the upgrade from v0.5.0 to v0.6.0 (messages about remaining CRDs), so I deleted all resources from the v0.5.0 manifests, then deleted the hnc-system namespace, then deployed v0.6.0 from the new all-in-one manifest.

Here are the API references and resources:

$ kubectl api-resources | grep -i hnc
hierarchyconfigurations                                 hnc.x-k8s.io                   true         HierarchyConfiguration
hncconfigurations                                       hnc.x-k8s.io                   false        HNCConfiguration
subnamespaceanchors               subns                 hnc.x-k8s.io                   true         SubnamespaceAnchor

$ kubectl api-versions | grep -i hnc
hnc.x-k8s.io/v1alpha2

$ kubectl get crd -o wide | grep hnc
hierarchyconfigurations.hnc.x-k8s.io             2020-10-01T09:23:18Z
hncconfigurations.hnc.x-k8s.io                   2020-10-01T09:23:18Z
subnamespaceanchors.hnc.x-k8s.io                 2020-10-01T09:23:18Z

adrianludwin · 2020-11-06T18:54:41Z

I'm not sure what you mean by "references from the old API?" All those outputs are valid for the new API as well; nothing there mentions v1alpha1.

Just to confirm: did you delete your validating webhook config?

$ kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io hnc-validating-webhook-configuration

And then try recreating that? Once you do, try restarting the HNC pod as well (kubectl delete pods --all -n hnc-system). Failing that, what version/distribution of K8s are you using?

adrianludwin · 2020-11-06T18:55:51Z

Also, can you give any more insight into the problem you had upgrading? What were the "messages about remaining CRDs"? /cc @yiqigao217

yiqigao217 · 2020-11-09T14:17:25Z

It looks like you were upgrading while the certs were not there, so the conversion webhooks could not work either. Before your upgrade, did the validating webhooks work for you in v0.5?

ledroide · 2020-11-12T16:18:02Z

@yiqigao217: yes, the validating webhooks worked with HNC v0.5.0.

@adrianludwin: I deleted validatingwebhookconfigurations.admissionregistration.k8s.io/hnc-validating-webhook-configuration and re-created hnc-system with v0.6.0.

Here is the situation now:

$ kubectl hns tree webs
Error reading hierarchy for webs: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority

$ kubectl create namespace level01
$ kubectl create namespace level02
$ kubectl hns tree level01
level01

$ kubectl hns set level02 --parent level01
Setting the parent of level02 to level01
Could not update the hierarchical configuration of level02.
Reason: create not allowed while custom resource definition is terminating

$ kubectl get customresourcedefinition,validatingwebhookconfiguration -o wide | grep hnc
customresourcedefinition.apiextensions.k8s.io/hierarchyconfigurations.hnc.x-k8s.io             2020-10-01T09:23:18Z
customresourcedefinition.apiextensions.k8s.io/hncconfigurations.hnc.x-k8s.io                   2020-10-01T09:23:18Z
customresourcedefinition.apiextensions.k8s.io/subnamespaceanchors.hnc.x-k8s.io                 2020-10-01T09:23:18Z
validatingwebhookconfiguration.admissionregistration.k8s.io/hnc-validating-webhook-configuration   5          2d

$ kubectl logs --tail 6 deploy/hnc-controller-manager -c manager -n hnc-system
{"level":"info","ts":1605197748.854099,"logger":"cert-rotation","msg":"CRD hierarchyconfigurations.hnc.x-k8s.io is being deleted"}
{"level":"info","ts":1605197748.8579118,"logger":"cert-rotation","msg":"ensuring CA cert on ValidatingWebhookConfiguration"}
2020/11/12 16:15:48 http: TLS handshake error from 10.233.102.0:62683: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.92.0:22964: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.102.0:11259: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.98.0:37281: remote error: tls: bad certificate

adrianludwin · 2020-11-12T16:34:28Z

Ahh, looks like your CRDs are in a bad state. "create not allowed while custom resource definition is terminating" is a K8s error, not an HNC error.

I'd try fully deleting HNC again, but this time make sure that the CRDs have actually been deleted. If they haven't, it's likely because some CRs still have finalizers on them, in which case you can remove the finalizers manually. I've seen this most often on subnamespaces.

Once you've issued the delete for the CRDs, run "kubectl get subns --all-namespaces" to see which anchors still exist, then "kubectl edit subns <name> -n <parent-name>" to edit each one. Delete the "hnc.x-k8s.io" entry from the metadata.finalizers list and the object will be deleted. The CRD itself can't be deleted until all objects of that type have been deleted first.
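
One way to confirm which CRDs are stuck terminating is to look for a set deletionTimestamp. The following is a rough sketch, not an HNC tool: the helper name `terminating_hnc_crds` is invented here, and it assumes two-column "NAME DELETING" input such as kubectl's custom-columns output, where a missing field is printed as `<none>`.

```shell
#!/bin/sh
# terminating_hnc_crds (hypothetical helper): read "NAME DELETING" rows on
# stdin and print the names of HNC CRDs whose deletionTimestamp is set.
terminating_hnc_crds() {
  awk '$1 ~ /\.hnc\.x-k8s\.io$/ && $2 != "" && $2 != "<none>" {print $1}'
}

# Usage against a live cluster:
#   kubectl get crd --no-headers \
#     -o custom-columns=NAME:.metadata.name,DELETING:.metadata.deletionTimestamp \
#     | terminating_hnc_crds
#
# Non-interactive alternative to "kubectl edit" for one anchor. Note that,
# unlike the manual edit described above, this clears ALL finalizers on the
# object, not only the hnc.x-k8s.io entry:
#   kubectl patch subns <name> -n <parent-name> --type=merge \
#     -p '{"metadata":{"finalizers":null}}'
```

If the conversion webhook is unreachable, commands that read or write the custom resources themselves may fail with the same webhook error, so the check on the CRDs is the more reliable starting point.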

ledroide · 2020-11-13T13:42:15Z

Solved.
TL;DR:

1. Delete the HNC manager using the all-in-one manifest: kubectl delete -f hnc-manager.yaml
2. Delete the webhook configuration: kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io/hnc-validating-webhook-configuration
3. Check what remains stuck: kubectl get customresourcedefinition,validatingwebhookconfiguration -o wide | grep hnc
4. Manually edit (kubectl edit) every remaining CRD, find the finalizers: array and delete all of its entries; this should let the CRD actually be deleted.
5. Re-install the HNC manager: kubectl apply -f hnc-manager.yaml
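
The steps above can be sketched as a small script. This is an illustration under assumptions, not a supported procedure: the names `run` and `hnc_reinstall` are made up, the manual "kubectl edit" of the CRD finalizers is replaced by an equivalent `kubectl patch`, and the three CRD names are the ones listed earlier in this thread. By default it only prints the commands.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN is 1 (the
# default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# hnc_reinstall (hypothetical helper): the TL;DR sequence for HNC v0.6.0.
hnc_reinstall() {
  manifest=${1:-hnc-manager.yaml}

  run kubectl delete -f "$manifest"
  run kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io/hnc-validating-webhook-configuration

  # Show what is still present (filter the output for "hnc" by eye or grep).
  run kubectl get customresourcedefinition,validatingwebhookconfiguration -o wide

  # Stand-in for the manual edit: empty metadata.finalizers on each HNC CRD.
  # A CRD that is already gone makes kubectl report NotFound, which is harmless.
  for crd in hierarchyconfigurations hncconfigurations subnamespaceanchors; do
    run kubectl patch crd "$crd.hnc.x-k8s.io" --type=merge \
      -p '{"metadata":{"finalizers":null}}'
  done

  run kubectl apply -f "$manifest"
}

# Print the plan:      hnc_reinstall
# Actually execute it: DRY_RUN=0 hnc_reinstall
```

Clearing CRD finalizers forces deletion of the CRD together with its remaining custom resources, so reviewing the printed plan first is the safer default.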

Details :

Before manually removing the finalizers from the remaining customresourcedefinitions (after deleting the HNC controller):

$ kubectl get subns --all-namespaces
Error from server: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=SubnamespaceAnchor failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": service "hnc-webhook-service" not found

After deletion, before re-install:

$ kubectl get subns --all-namespaces
Error from server (NotFound): Unable to list "hnc.x-k8s.io/v1alpha2, Resource=subnamespaceanchors": the server could not find the requested resource (get subnamespaceanchors.hnc.x-k8s.io)

After re-install:
There are some warnings you should look at (I guess they are unrelated to this issue, but maybe not). I'm running Kubernetes 1.19.3.

Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
Warning: admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration

Now hnc controller v0.6.0 is re-created.

$ kubectl get subns --all-namespaces
NAMESPACE   NAME    AGE
webs        sbr01   20d

$ kubectl hns tree webs
webs
└── [s] sbr01

$ kubectl hns set level02 --parent level01
Setting the parent of level02 to level01
Succesfully updated 1 property of the hierarchical configuration of level02

$ kubectl hns tree level01
level01
└── level02

The logs look much better:

$ kubectl logs --tail 5 deploy/hnc-controller-manager -c manager -n hnc-system
{"level":"info","ts":1605273919.0605707,"logger":"reconcilers.Hierarchy","msg":"New namespace found","rid":242,"ns":"sbr01"}
{"level":"info","ts":1605273919.2598712,"logger":"reconcilers.RoleBinding","msg":"Propagating object","rid":257,"trigger":"sbr01/gitlab-webs-poc-webs"}
{"level":"info","ts":1605273919.2602034,"logger":"reconcilers.RoleBinding","msg":"Propagating object","rid":258,"trigger":"sbr01/sbrouet-poc-webs"}
{"level":"info","ts":1605274029.4946737,"logger":"validators.Hierarchy","msg":"Checking authz","ns":"level02","user":"gailuron","object":"level01","reason":"proposed parent"}
{"level":"info","ts":1605274029.5124876,"logger":"reconcilers.Hierarchy","msg":"Creating hierarchyconfiguration","rid":270,"ns":"level01","conditions":0}

Problem is solved. Thanks @adrianludwin

adrianludwin · 2020-11-15T21:39:01Z

Ugh, sorry you ran into so much trouble. I've filed #1270 to fix the warnings.

I'm not sure what caused the problems in the first place, but once you delete the deployment, it's not surprising that the CRD conversion webhooks fail. It's usually best to delete the CRs before the deployment because the manager is what typically removes the finalizers - but if we get into a bad enough state, it might stop doing the right thing.

Please let me know if you see anything like this again.

vikas027 · 2021-06-17T05:57:53Z

I can confirm this. The issue went away after upgrading to v0.8.0 (from 0.7.0), but I had to delete all resources and recreate them.

Update: I think I spoke too fast; it has started throwing errors again.

adrianludwin · 2021-06-17T12:50:32Z

@vikas027 what was the prior version of HNC, was it v0.6.0 or v0.7.0? And had HNC been working despite the errors, or was it broken?

Only v0.6.0 had the CRD conversion webhooks in it (they were removed in v0.7.0) so if you saw this problem in v0.7.0, I'm leaning more towards it being a K8s issue than an HNC issue.

ledroide closed this as completed Nov 13, 2020

adrianludwin mentioned this issue Nov 15, 2020: HNC: use v1 for CRDs and webhooks #1270 (Closed)

adrianludwin mentioned this issue Nov 17, 2020: HNC: if (conversion?) webhooks get into a bad state, cert-rotator can no longer update their secrets #1275 (Closed)

vikas027 mentioned this issue Jun 18, 2021: HNC: TLS handshake error from X.X.X.X:YYYY: EOF kubernetes-sigs/hierarchical-namespaces#49 (Closed)

micnncim mentioned this issue Aug 2, 2021: Connection refused by tls: bad certificate kubernetes-sigs/hierarchical-namespaces#65 (Closed)

ledroide mentioned this issue Sep 29, 2021: HNC under Kubernetes 1.22 : "error resolving resource" kubernetes-sigs/hierarchical-namespaces#86 (Closed)
