Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New "app.kubernetes.io/version" label added to collector deployment selector cause reconciliation error #840

Closed
DWonMtl opened this issue Apr 26, 2022 · 10 comments · Fixed by #849
Labels
area:collector Issues for deploying collector bug Something isn't working

Comments

@DWonMtl
Copy link
Contributor

DWonMtl commented Apr 26, 2022

Description

A new label has been introduced recently in the deployment selector

see https://github.com/open-telemetry/opentelemetry-operator/blob/v0.49.0/pkg/collector/labels.go#L51

// Labels return the common labels to all objects that are part of a managed OpenTelemetryCollector.
func Labels(instance v1alpha1.OpenTelemetryCollector, filterLabels []string) map[string]string {
        ...

	if len(version) > 1 {
		base["app.kubernetes.io/version"] = version[len(version)-1]
	} else {
		base["app.kubernetes.io/version"] = "latest"
	}

	return base

This is perfectly legit to add a version in labels, however these labels are added to the deployment "Selector"

see https://github.com/open-telemetry/opentelemetry-operator/blob/v0.49.0/pkg/collector/deployment.go#L46

// Deployment builds the deployment for the given instance.
func Deployment(cfg config.Config, logger logr.Logger, otelcol v1alpha1.OpenTelemetryCollector) appsv1.Deployment {
	labels := Labels(otelcol, cfg.LabelsFilter())
	
	...

	return appsv1.Deployment{		
		Spec: appsv1.DeploymentSpec{
			Replicas: otelcol.Spec.Replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: labels,
			},
		},
	}
}

When the operator tries to reconcile the deployment, it face these error

{
	"level": "error",
	"ts": 1650971729.8847028,
	"logger": "controller.opentelemetrycollector",
	"msg": "Reconciler error",
	"reconciler group": "opentelemetry.io",
	"reconciler kind": "OpenTelemetryCollector",
	"name": "tracingcollectorendpoint-collector-7c46f9f1",
	"namespace": "harbour-tracing-dev",
	"error": "failed to reconcile the expected deployments: failed to apply changes: Deployment.apps \"tracingcollectorendpoint-collector-7c46f9f1-collector\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/component\":\"opentelemetry-collector\", \"app.kubernetes.io/instance\":\"harbour-tracing-dev.tracingcollectorendpoint-collector-7c46f9f1\", \"app.kubernetes.io/managed-by\":\"opentelemetry-operator\", \"app.kubernetes.io/name\":\"tracingcollectorendpoint-collector-7c46f9f1-collector\", \"app.kubernetes.io/part-of\":\"opentelemetry\", \"app.kubernetes.io/version\":\"v0.17.0\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable",
	"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.2/pkg/internal/controller/controller.go:227"
}

The reason is because the Deployment Selector is an "immutable field" and cannot be modified once created.

Since this label refers to the application version contained in OpenTelemetryCollector spec.image. each time we will change the image version, we will face this reconciliation error.

@pavolloffay pavolloffay added bug Something isn't working area:collector Issues for deploying collector labels Apr 29, 2022
@pavolloffay
Copy link
Member

pavolloffay commented Apr 29, 2022

@DWonMtl thanks for reporting. Would you like to submit a fix?

The version label was added in #797 by @yuriolisa (just pinging for FYI :)). It might have been unintentional.

@yuriolisa
Copy link
Contributor

@pavolloffay, thank you for the heads up. I can sign up for this issue.

@DWonMtl
Copy link
Contributor Author

DWonMtl commented Apr 29, 2022

@pavolloffay , indeed I can work on it.

@yuriolisa let me know if you are already working on it, if not I will start.

Also I wasn't sure about the direction your team wanted to take according to this issue. Many solution are possible.

The simplest one would be to remove the version labels, but I think this make sense to have it as a label.

One we often do in my teams for these kind of "label" and "selector" issues is to have a function for selectors that returns the "static labels" and a function for labels that happens the more volatile labels to the list of selector labels.

ex :

func Selectors(instance v1alpha1.OpenTelemetryCollector, filterLabels []string) map[string]string {
	// ...  add the static labels
}

func Labels(instance v1alpha1.OpenTelemetryCollector, filterLabels []string) map[string]string {
	base := Selectors(instance, filterLabels)
	// ... add the volatile labels
	return base
}

@yuriolisa
Copy link
Contributor

@DWonMtl, is all yours. You can start working on it.

The approach was to ensure that we have the required labels in place for our OTEL resources. However, I missed that Deployment Selector part which is causing this bug.

Thank you for reporting that.

@DWonMtl
Copy link
Contributor Author

DWonMtl commented Apr 29, 2022

ok ! on it

@ringerc
Copy link

ringerc commented May 16, 2022

I wonder if this is what's causing the failure I'm seeing too:

failed to reconcile config maps","error":"failed to reconcile the expected configmaps: failed to apply changes: ConfigMap \"otel-collector\" is invalid: metadata.labels: Invalid value: \"8f65b4d94bb5290c8fc1540703c06f7a7a12cfd917d2f141bdc8a18803828615\": must be no more than 63 characters"

See mention on #872 (comment)

My deployments sha256 container hashes to specify the container versions.

Cause looks like 0dce2df from #797

pkg/collector/labels.go

@@ -46,6 +47,12 @@ func Labels(instance v1alpha1.OpenTelemetryCollector, filterLabels []string) map
    base["app.kubernetes.io/instance"] = naming.Truncate("%s.%s", 63, instance.Namespace, instance.Name)
    base["app.kubernetes.io/part-of"] = "opentelemetry"
    base["app.kubernetes.io/component"] = "opentelemetry-collector"
+   version := strings.Split(instance.Spec.Image, ":")
+   if len(version) > 1 {
+       base["app.kubernetes.io/version"] = version[len(version)-1]
+   } else {
+       base["app.kubernetes.io/version"] = "latest"
+   }
 
    return base
 }

@DWonMtl
Copy link
Contributor Author

DWonMtl commented May 16, 2022

Indeed, if you version is a 64 char sha, this will break.

@ringerc
Copy link

ringerc commented May 17, 2022

I'll add a test to the test suite

@ringerc
Copy link

ringerc commented May 17, 2022

Filed as #879 since it's a different issue to the selector change

@ringerc
Copy link

ringerc commented May 17, 2022

You'll get the same bug when you downgrade from v0.50.0 to v0.48.0 to avoid this. You can just

kubectl delete deployment -n opentelemetry otel-collector

or similar - i.e. delete the Deployments created by the operator - to recover.

You can find all instances with

kubectl get --all-namespaces deployment -l app.kubernetes.io/managed-by=opentelemetry-operator 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:collector Issues for deploying collector bug Something isn't working
Projects
None yet
4 participants