Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Agent and Elastic stack in different namespaces. #7382

Merged
merged 15 commits into from
Dec 19, 2023

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Dec 13, 2023

resolves #7352

What does this change?

Both root and non-root, fleet and non-fleet Agent installations were being blocked from Agent running in a different namespace because of a legacy association check. This legacy check checked that the agent was running in the same namespace as Elasticsearch. The actual issue turned out to be when Agent is not running in the same namespace as Fleet server when fleet mode is enabled. This change enables Agent to run in any namespace regardless of where Fleet or Elasticsearch is deployed.

This supersedes #7353 as this turned out to be a much simpler changeset vs:
image

Changes:

  • The association controller for Agent -> Fleet can now retrieve a slice of secrets that should be copied to a target namespace to allow secure communications between Agent => Elasticsearch.

Testing Done/Verification

  • Secret(s) are deleted upon associated object being deleted.
  • Secret(s) are watched for changes.
  • Watches are removed on delete.
  • E2e tests updated to verify both same namespace, and cross-namespace associations work.
  • Verify when source secret changes, the copied secrets changes as well

Unit tests
e2e tests

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Update and add comments to appropriate funcs/sections of code.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash to rotate the agents on changes to the ES CA is missing, which means that the Agents won't pick up the new CA until the ECK operator restarts. We can remedy this by adding the hash to the association configuration annotation like this

diff --git a/pkg/apis/common/v1/association.go b/pkg/apis/common/v1/association.go
index 8bcfaa2f2..5ecefd67b 100644
--- a/pkg/apis/common/v1/association.go
+++ b/pkg/apis/common/v1/association.go
@@ -199,12 +199,13 @@ func FormatNameWithID(template string, id string) string {
 
 // AssociationConf holds the association configuration of a referenced resource in an association.
 type AssociationConf struct {
-	AuthSecretName   string `json:"authSecretName"`
-	AuthSecretKey    string `json:"authSecretKey"`
-	IsServiceAccount bool   `json:"isServiceAccount"`
-	CACertProvided   bool   `json:"caCertProvided"`
-	CASecretName     string `json:"caSecretName"`
-	URL              string `json:"url"`
+	AuthSecretName        string `json:"authSecretName"`
+	AuthSecretKey         string `json:"authSecretKey"`
+	IsServiceAccount      bool   `json:"isServiceAccount"`
+	CACertProvided        bool   `json:"caCertProvided"`
+	CASecretName          string `json:"caSecretName"`
+	AdditionalSecretsHash string `json:"additionalSecretsHash"`
+	URL                   string `json:"url"`
 	// Version of the referenced resource. If a version upgrade is in progress,
 	// matches the lowest running version. May be empty if unknown.
 	Version string `json:"version"`
diff --git a/pkg/controller/association/controller/agent_fleetserver.go b/pkg/controller/association/controller/agent_fleetserver.go
index 91474ecad..ba51acf49 100644
--- a/pkg/controller/association/controller/agent_fleetserver.go
+++ b/pkg/controller/association/controller/agent_fleetserver.go
@@ -17,6 +17,7 @@ import (
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/association"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/operator"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/utils/k8s"
+	ulog "github.com/elastic/cloud-on-k8s/v2/pkg/utils/log"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/utils/rbac"
 )
 
@@ -46,11 +47,12 @@ func AddAgentFleetServer(mgr manager.Manager, accessReviewer rbac.AccessReviewer
 	})
 }
 
-func addtionalSecrets(c k8s.Client, assoc commonv1.Association) ([]types.NamespacedName, error) {
+func addtionalSecrets(ctx context.Context, c k8s.Client, assoc commonv1.Association) ([]types.NamespacedName, error) {
+	log := ulog.FromContext(ctx)
 	associated := assoc.Associated()
 	var agent agentv1alpha1.Agent
 	nsn := types.NamespacedName{Namespace: associated.GetNamespace(), Name: associated.GetName()}
-	if err := c.Get(context.Background(), nsn, &agent); err != nil {
+	if err := c.Get(ctx, nsn, &agent); err != nil {
 		return nil, err
 	}
 	fleetServerRef := assoc.AssociationRef()
@@ -58,7 +60,7 @@ func addtionalSecrets(c k8s.Client, assoc commonv1.Association) ([]types.Namespa
 		return nil, nil
 	}
 	fleetServer := agentv1alpha1.Agent{}
-	if err := c.Get(context.Background(), fleetServerRef.NamespacedName(), &fleetServer); err != nil {
+	if err := c.Get(ctx, fleetServerRef.NamespacedName(), &fleetServer); err != nil {
 		return nil, err
 	}
 
@@ -74,14 +76,17 @@ func addtionalSecrets(c k8s.Client, assoc commonv1.Association) ([]types.Namespa
 
 	// if both agent and ES are same namespace no copying needed
 	if agent.GetNamespace() == esAssociation.GetNamespace() {
+		log.V(1).Info("no additional secrets because same namespace")
 		return nil, nil
 	}
 
 	conf, err := esAssociation.AssociationConf()
 	if err != nil {
+		log.V(1).Info("no additional secrets because no assoc conf")
 		return nil, err
 	}
 	if conf == nil || !conf.CACertProvided {
+		log.V(1).Info("no additional secrets because conf nil or no CA provided")
 		return nil, nil
 	}
 	return []types.NamespacedName{{
diff --git a/pkg/controller/association/dynamic_watches.go b/pkg/controller/association/dynamic_watches.go
index 2245543dd..a7dabb3eb 100644
--- a/pkg/controller/association/dynamic_watches.go
+++ b/pkg/controller/association/dynamic_watches.go
@@ -5,6 +5,7 @@
 package association
 
 import (
+	"context"
 	"fmt"
 
 	"k8s.io/apimachinery/pkg/types"
@@ -49,7 +50,7 @@ func additionalSecretWatchName(associated types.NamespacedName) string {
 // * if there's an ES user to create, watch the user Secret in ES namespace
 // All watches for all given associations are set under the same watch name and replaced with each reconciliation.
 // The given associations are expected to be of the same type (e.g. Kibana -> Elasticsearch, not Kibana -> Enterprise Search).
-func (r *Reconciler) reconcileWatches(associated types.NamespacedName, associations []commonv1.Association) error {
+func (r *Reconciler) reconcileWatches(ctx context.Context, associated types.NamespacedName, associations []commonv1.Association) error {
 	managedElasticRef := filterManagedElasticRef(associations)
 	unmanagedElasticRef := filterUnmanagedElasticRef(associations)
 
@@ -103,7 +104,7 @@ func (r *Reconciler) reconcileWatches(associated types.NamespacedName, associati
 		if err := ReconcileGenericWatch(associated, managedElasticRef, r.watches.Secrets, additionalSecretWatchName(associated), func() ([]types.NamespacedName, error) {
 			var toWatch []types.NamespacedName
 			for _, association := range associations {
-				secs, err := r.AdditionalSecrets(r.Client, association)
+				secs, err := r.AdditionalSecrets(ctx, r.Client, association)
 				if err != nil {
 					return nil, err
 				}
diff --git a/pkg/controller/association/reconciler.go b/pkg/controller/association/reconciler.go
index c72298b49..a552873d2 100644
--- a/pkg/controller/association/reconciler.go
+++ b/pkg/controller/association/reconciler.go
@@ -7,6 +7,7 @@ package association
 import (
 	"context"
 	"fmt"
+	"hash/fnv"
 	"reflect"
 	"time"
 
@@ -64,7 +65,7 @@ type AssociationInfo struct { //nolint:revive
 	// AdditionalSecrets are additional secrets to copy from an association's namespace to the associated resource namespace.
 	// Currently this is only used for copying the CA from an Elasticsearch association to the same namespace as
 	// an Agent referencing a Fleet Server.
-	AdditionalSecrets func(c k8s.Client, assoc commonv1.Association) ([]types.NamespacedName, error)
+	AdditionalSecrets func(context.Context, k8s.Client, commonv1.Association) ([]types.NamespacedName, error)
 	// Labels are labels set on all resources created for association purpose. Note that some resources will be also
 	// labelled with AssociationResourceNameLabelName and AssociationResourceNamespaceLabelName in addition to any
 	// labels provided here.
@@ -191,7 +192,7 @@ func (r *Reconciler) Reconcile(ctx context.Context, request reconcile.Request) (
 	}
 
 	// reconcile watches for all associations of this type
-	if err := r.reconcileWatches(associatedKey, associations); err != nil {
+	if err := r.reconcileWatches(ctx, associatedKey, associations); err != nil {
 		return reconcile.Result{}, tracing.CaptureError(ctx, err)
 	}
 
@@ -263,13 +264,14 @@ func (r *Reconciler) reconcileAssociation(ctx context.Context, association commo
 		return commonv1.AssociationPending, err // maybe not created yet
 	}
 
+	secretsHash := fnv.New32a()
 	if r.AdditionalSecrets != nil {
-		additionalSecrets, err := r.AdditionalSecrets(r.Client, association)
+		additionalSecrets, err := r.AdditionalSecrets(ctx, r.Client, association)
 		if err != nil {
 			return commonv1.AssociationPending, err // maybe not created yet
 		}
 		for _, sec := range additionalSecrets {
-			if err := copySecret(ctx, r.Client, r.AssociationInfo, k8s.ExtractNamespacedName(association.Associated()), association.GetNamespace(), sec); err != nil {
+			if err := copySecret(ctx, r.Client, secretsHash, k8s.ExtractNamespacedName(association.Associated()), association.GetNamespace(), sec); err != nil {
 				return commonv1.AssociationPending, err
 			}
 		}
@@ -294,10 +296,11 @@ func (r *Reconciler) reconcileAssociation(ctx context.Context, association commo
 
 	// construct the expected association configuration
 	expectedAssocConf := &commonv1.AssociationConf{
-		CACertProvided: caSecret.CACertProvided,
-		CASecretName:   caSecret.Name,
-		URL:            url,
-		Version:        ver,
+		CACertProvided:        caSecret.CACertProvided,
+		CASecretName:          caSecret.Name,
+		AdditionalSecretsHash: fmt.Sprint(secretsHash.Sum32()),
+		URL:                   url,
+		Version:               ver,
 	}
 
 	if r.ElasticsearchUserCreation == nil {
diff --git a/pkg/controller/association/reconciler_test.go b/pkg/controller/association/reconciler_test.go
index 678f47977..adcaab23a 100644
--- a/pkg/controller/association/reconciler_test.go
+++ b/pkg/controller/association/reconciler_test.go
@@ -341,7 +341,7 @@ func TestReconciler_Reconcile_NoESRef_Cleanup(t *testing.T) {
 	require.NotEmpty(t, kb.Annotations[kb.EsAssociation().AssociationConfAnnotationName()])
 	r := testReconciler(&kb, &kibanaUserInESNamespace, &kibanaUserInKibanaNamespace, &esCertsInKibanaNamespace)
 	// simulate watches being set
-	require.NoError(t, r.reconcileWatches(k8s.ExtractNamespacedName(&kb), []commonv1.Association{kb.EsAssociation()}))
+	require.NoError(t, r.reconcileWatches(context.Background(), k8s.ExtractNamespacedName(&kb), []commonv1.Association{kb.EsAssociation()}))
 	require.NotEmpty(t, r.watches.Secrets.Registrations())
 	require.NotEmpty(t, r.watches.ReferencedResources.Registrations())
 
diff --git a/pkg/controller/association/secret.go b/pkg/controller/association/secret.go
index 455a6c7e9..92fea27ce 100644
--- a/pkg/controller/association/secret.go
+++ b/pkg/controller/association/secret.go
@@ -11,6 +11,7 @@ import (
 	"crypto/x509"
 	"encoding/json"
 	"fmt"
+	"hash"
 	"net/http"
 
 	corev1 "k8s.io/api/core/v1"
@@ -19,11 +20,11 @@ import (
 
 	commonv1 "github.com/elastic/cloud-on-k8s/v2/pkg/apis/common/v1"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/certificates"
+	commonhash "github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/hash"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/reconciler"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/common/version"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/client"
 	"github.com/elastic/cloud-on-k8s/v2/pkg/utils/k8s"
-	"github.com/elastic/cloud-on-k8s/v2/pkg/utils/maps"
 )
 
 const (
@@ -194,16 +195,15 @@ func filterManagedElasticRef(associations []commonv1.Association) []commonv1.Ass
 }
 
 // copySecret will copy the source secret to the target namespace adding labels from the associated object to ensure garbage collection happens.
-func copySecret(ctx context.Context, client k8s.Client, info AssociationInfo, associated types.NamespacedName, targetNamespace string, source types.NamespacedName) error {
+func copySecret(ctx context.Context, client k8s.Client, secHash hash.Hash, associated types.NamespacedName, targetNamespace string, source types.NamespacedName) error {
 	var original corev1.Secret
 	if err := client.Get(ctx, source, &original); err != nil {
 		return err
 	}
 	var expected corev1.Secret
 	expected.Data = original.Data
-	// We merge the original labels with the associated object's association labels to ensure
-	// that this secret is garbage collected when the associated object is deleted.
-	expected.Labels = maps.Merge(original.Labels, info.Labels(associated))
+	commonhash.WriteHashObject(secHash, original.Data)
+	expected.Labels = original.Labels
 	expected.Annotations = original.Annotations
 	expected.Name = original.Name
 	expected.Namespace = targetNamespace

pkg/controller/association/controller/agent_fleetserver.go Outdated Show resolved Hide resolved
pkg/controller/association/secret.go Outdated Show resolved Hide resolved
pkg/controller/association/controller/agent_fleetserver.go Outdated Show resolved Hide resolved
pkg/controller/association/reconciler.go Outdated Show resolved Hide resolved
Add additional unit test.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@barkbay barkbay added the >enhancement Enhancement of existing functionality label Dec 14, 2023
@botelastic botelastic bot removed the triage label Dec 14, 2023
@botelastic botelastic bot removed the triage label Dec 14, 2023
pkg/apis/common/v1/association.go Outdated Show resolved Hide resolved
pkg/controller/association/reconciler.go Outdated Show resolved Hide resolved
pkg/controller/association/secret.go Outdated Show resolved Hide resolved
expected.Name = original.Name
expected.Namespace = targetNamespace

_, err := reconciler.ReconcileSecret(ctx, client, expected, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we could use a soft owner to garbage collect this Secret.

Suggested change
_, err := reconciler.ReconcileSecret(ctx, client, expected, nil)
_, err := reconciler.ReconcileSecretNoOwnerRef(ctx, client, expected, softOwner)

softOwner being referencedObj IIUC, and maybe use GarbageCollectSoftOwnedSecrets to remove them (not tested, only an idea).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think garbage collection is already tied to the association, in this case the Fleet Server->ES association. This has the odd side effect that the secret stays if the Agent->FleetServer association is severed but the Fleet Server->ES association still exists. I was thinking that this would be sufficient though because typically you would deploy Fleet server and Agent in tandem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the odd side effect that the secret stays if the Agent->FleetServer association is severed but the Fleet Server->ES association still exists.

Yes, this is what I was trying to "fix" with this suggestion. Not a strong opinion though, we can stick with the current behavior 👍

pkg/apis/common/v1/association.go Outdated Show resolved Hide resolved
pkg/controller/association/dynamic_watches.go Outdated Show resolved Hide resolved
return commonv1.AssociationPending, err // maybe not created yet
}
for _, sec := range additionalSecrets {
if err := copySecret(ctx, r.Client, secretsHash, association.GetNamespace(), sec); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to watch this Secret? It's not reconciled if I delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it doesn't return, but the agent-es association controller is triggered on delete. I wonder if this:
https://github.com/naemono/cloud-on-k8s/blob/5dd7bb85550fd0d0b23af1762803025e725bf2d3/pkg/controller/association/dynamic_watches.go#L103-L117
is associating the secret incorrectly? I'll investigate and update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were only watching the source secrets (the secrets we were copying into the target ns) and not the target/created secrets. It seems we would need to watch both:

  1. if source secret changes, update target secret.
  2. if target secret is changed/deleted, reconcile the secret.

I have updated and verified that both the target and source are re-created upon deletion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we have a similar issue with the fleet server es user/service token secret. Probably worth addressing in a separate PR though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised we have captured my observation in an issue already #7170

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment, #7353 (review), still valid?

I think we also need to adjust the documentation, ECK 2.10.0 does not allow Agent and ES to run in different namespaces.

Update pipeline for memory exhaustion during lint.
Review comments.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
…retsHash.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@naemono
Copy link
Contributor Author

naemono commented Dec 14, 2023

buildkite test this -f p=gke,s=8.11.0

@pebrc
Copy link
Collaborator

pebrc commented Dec 18, 2023

#7382 (comment) could still be addressed but otherwise LGTM

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@naemono
Copy link
Contributor Author

naemono commented Dec 18, 2023

#7382 (comment) could still be addressed but otherwise LGTM

Updated to only include the additional secrets Hash if there are additional secrets.

@naemono naemono requested review from barkbay and pebrc December 18, 2023 16:04
@naemono
Copy link
Contributor Author

naemono commented Dec 18, 2023

Fixing unit tests...

…nal secrets exist.

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@naemono
Copy link
Contributor Author

naemono commented Dec 18, 2023

Fixing unit tests...

Unit tests fixed.

Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just the docs needs an update I think to call out Elasticsearch specifically

pkg/controller/association/reconciler_test.go Outdated Show resolved Hide resolved
pkg/controller/association/reconciler_test.go Outdated Show resolved Hide resolved
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
@naemono naemono merged commit c8f6386 into elastic:main Dec 19, 2023
6 checks passed
@naemono naemono deleted the 7352-agent-non-root-across-namespaces-v2 branch December 19, 2023 16:06
robbavey pushed a commit to robbavey/cloud-on-k8s that referenced this pull request Dec 20, 2023
* Allow Agent, Fleet and Elastic stack in different namespaces.
* Documentation updates for feature.
---------

Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality v2.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Running Agent as non root in a dedicated namespace is not possible
3 participants