Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(proposal): manifest hydrator #17755

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

crenshaw-dev
Copy link
Member

@crenshaw-dev crenshaw-dev commented Apr 5, 2024

Split out of #17506 to be standalone.

WIP code: #19066

Project to track work: https://github.com/orgs/argoproj/projects/33

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@crenshaw-dev crenshaw-dev requested review from a team as code owners April 5, 2024 17:24
@crenshaw-dev crenshaw-dev marked this pull request as draft April 5, 2024 17:25
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Copy link
Member

@agaudreault agaudreault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this. Good idea on this proposal.

docs/proposals/manifest-hydrator.md Outdated Show resolved Hide resolved
docs/proposals/manifest-hydrator.md Show resolved Hide resolved
docs/proposals/manifest-hydrator.md Outdated Show resolved Hide resolved
}
```

### Use cases
Copy link
Member

@agaudreault agaudreault Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Describe behavior and configuration for different common usage of ArgoCD
    • ApplicationSet defined in Git
    • Applications generated or defined in Git (plain k8s manifest or synced Helm chart)
    • Applications created/managed dynamically
    • Monorepo of applications vs single repo
    • multiple path in same hydrated branch.

docs/proposals/manifest-hydrator.md Show resolved Hide resolved
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Copy link
Member

@ishitasequeira ishitasequeira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great proposal 🚀

docs/proposals/manifest-hydrator.md Show resolved Hide resolved
docs/proposals/manifest-hydrator.md Outdated Show resolved Hide resolved
docs/proposals/manifest-hydrator.md Show resolved Hide resolved
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@crenshaw-dev crenshaw-dev marked this pull request as ready for review July 15, 2024 19:37
@igaskin
Copy link
Member

igaskin commented Jul 21, 2024

Is there overlap between the hydrator and kargo-render? https://github.com/akuity/kargo-render

@crenshaw-dev
Copy link
Member Author

Thank you so much @igaskin for the thorough review! There's a WIP branch in case you're interested in testing/tinkering: #19066

Is there overlap between the hydrator and kargo-render? https://github.com/akuity/kargo-render

There is! The Argo CD hydrator will provide an opinionated subset of kargo-render functionality. As a first-class component, I think that subset will have a somewhat nicer UX. I've chatted briefly with @alexmt about the possibility of kargo-render taking advantage of the new API features so that kargo-render can continue to provide all its features while requiring less code.

spec:
# The sourceHydrator field is mutually-exclusive with `source` and with `sources`. If this field is configured, we
# should either throw an error or ignore the other two.
sourceHydrator:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting aside the question of whether or not this should be a separate project vs. in-tree for now, is there a strong reason do the hydration fields need to be introduced into the Application spec itself as opposed to a CRD that references the Application?

In today's contributor meeting, there was mention of field duplication as one reason and the problem of keeping the values in-sync, but I feel this would not be an issue if a hydration CRD could simply reference the Application spec so there is still single source of truth (the App spec.source).

Second, in the demo, it was shown that in order to use the feature, a user would have to rename source to sourceHydrator, then add drySource + syncSource to the spec. This means there's no easy migration path for users since they would then need to regenerate their application specs to turn this on. A separate CRD solves this since this feature can be turned on and off more easily. In other words, they would create the CR to turn it on, delete the CR to turn it off, without touching the original Application.

Another concern I have is that this feature will inevitably need more and more status/bookkeeping about hydration, which will then spill into the Application status itself. So in addition to health status, sync status, then there will need to be hydration status and all the details there.

I just feel that the concerns of hydration (writing) to a repo and syncing (reading) from a repo have a natural clean separation, both logically as well as technically. So I feel we can & should preserve that separation as separate CRDs.

Copy link
Member Author

@crenshaw-dev crenshaw-dev Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think embedding sourceHydrator in the Application CR has some crucial advantages:

  1. UX: It provides individual developers with easy/intuitive source hydration via existing and familiar interfaces
  2. Architecture: It's consistent with Argo CD’s existing CR design
  3. Maintenance: It takes advantage of Argo CD’s existing API/UI code and permissions structure

I think there are two intuitive options for implementing this as a separate CRD: A) SourceHydrator that references an Application or B) an Omnibus SourceHydrator that handles multiple apps. I'll answer both options.

A) SourceHydrator references Application

This option would probably look like this:

kind: SourceHydrator
spec:
  application: my-app
  targetRevision: main
  path: envs/dev/west
---
kind: Application
spec:
  source:
    repoURL:
    targetRevision: env/dev
    path: west

1. Impacts on UX

  1. This requires maintaining an additional resource, which could be a pain for end users
  2. The SourceHydrator spec isn't super intuitive, because you don't know at a glance the repoURL, targetRevision, path, or project; you have to inspect two resources
  3. The Application source field becomes confusing, because not all features are actually available.
    • targetRevision: normally you can use a commit tag, hash, or semver range as the target revision. But the hydrator as currently designed cannot push to a commit tag, and it fundamentally cannot push to a hash or semver range (those would be nonsensical destinations, technically allowed by the spec).
    • chart, helm, kustomize, and directory.jsonnet: none of these fields are meaningful when using rendered manifests, and using some will throw runtime errors. None of those restrictions will be clear by looking at the spec.
    • plugin and directory: using either of these fields defeats much of the point of using rendered manifests, because it applies last-mile transformations that aren't recorded to git. This downside won't be immediately apparent to the user.

2. Impacts on Architecture

Having a "flat config" is consistent with how Argo CD has built most of its features. If there's a 1:1 relationship between some configuration and the Application CR, then we've embedded that configuration (multi-source support is a good example).

This design is in contrast with how Flux has done things. By breaking parts of their API into separate CRDs, they've simplified the mental model from an implementation perspective. But I think that users have appreciated Argo's approach of "you write one file, and you're done." Denormalization is a big UX win.

Traditionally in Argo CD if there is some configuration which is "about" an application, it is stored in the Application CR. We believe that individual users do and should care about manifest hydration. Our experience with the rendered manifest pattern at Intuit is that if the hydration process is "hidden" from the user, it results in confusion and frustration.

Having a separate CRD for Application-specific config would be possible. But it would be a departure from the pattern that's been very successful so far.

3. Impacts on Maintenance

Using a separate CRD would require new API/UI/CLI code tailored to handling the new CR. For example, if we want to let users update the SourceHydrator config in the UI, we'll need either a new API to handle those updates or to expand the existing API to handle the additional resource. It's more code and complexity to maintain and reason about. All these are big reasons ApplicationSets still lacks a robust UI/CLI experience.

Security controls would become more difficult to think about. For example, if we want to use the Application's AppProject to restrict destination repositories and allowed app namespaces, we'd need code to pull the Application config, pull it's AppProject config, and then apply the rules. In a single CR, we could just call the existing functions to enforce those rules. To enforce apps-in-any-namespace restrictions, the app controller currently has code to filter out apps in disallowed namespaces. If we used a separate CRD, we'd have to reimplement the same logic in a different controller and either duplicate the restrictions config (two env vars) or write additional code to pull the restrictions from some common config location.

B) Omnibus SourceHydrator

Another option would be to write a CRD that handles hydration completely independently of applications, maybe even hydrating for multiple applications based on patterns. For example:

kind: OmnibusSourceHydrator
spec:
  repoURL: https://git.example.com/org/repo
  targetRevision: main
  targetBranches:
  - name: env/dev
    paths:
    - source: envs/dev/west
      destination: west
    - source: envs/dev/east
      destination: east
# etc.

This has some advantages. For example, you can look in one place to understand how all the hydration works for an app.

But you still have all the same downsides as described above.

You also have the problem that this type of omnibus config is difficult for individual users to deal with. As a new resource without a 1:1 app relationship, we'd need a new API and RBAC to provide a strong UI experience (similar to AppSets). In practice, it will probably be maintained by some ops team and become opaque to the developer. We believe the developer should fully understand and be in full control of hydration, because dry manifests are their interface, and hydrated manifests are what gets applied to the cluster. The user cares about both.

This resource also just obscures the inherent 1:1 relationship. One hydrated manifest always has one hydrated source. By listing them all in one file, you just add one layer of obfuscation between the user and the hydration process.

Answers to specific concerns

there's no easy migration path for users since they would then need to regenerate their application specs to turn this on

The Application source has to be modified either way. If the SourceHydrator is configured separately, the user still has to change the spec.source field to point to the hydrated instead of the DRY source. They'd also need to remove incompatible fields like kustomize, helm, etc. (if they know that they need to do so, since it's not obvious from the spec).

this feature will inevitably need more and more status/bookkeeping about hydration, which will then spill into the Application status itself

It will, but since hydration is "about" the Application, this is a reasonable place to do that bookkeeping. I’m not sure what the ultimate concern is here. Storing information about the app is what the status field is for.

the concerns of hydration (writing) to a repo and syncing (reading) from a repo have a natural clean separation, both logically as well as technically

They have a logical and clean separation in the same way that “pulling from a repo” and “syncing to Kubernetes” do: they’re logically separate, but they’re both config “about” the Application. The fact that they're very separate tasks makes it easy to write code to handle the two parts separately, just like we do for “pulling” and “syncing” today.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the source hydrator should be part of the Application spec rather than another CRD for most of the reasons outlined, plus one more: references create further lifecycle concerns that need to be taken care of. For example, do you delete the hydrator or the application first? Do you put a finalizer on one or the other? Is one owned by the other? While these aren't difficult problems to solve and there's prior art within the project, it's one more thing to consider for both maintainers and users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support including the source hydrator in the Application spec instead of creating a separate CRD, primarily for the reasons outlined above.

Additionally, our current Application spec already contains a similar scenario with our Multi-Source features sources field. In multi-source applications, we disregard the source field when sources is provided in the action spec. This rule is something we follow across UI / CLI / API.

We could apply the same logic with sourceHydrator—for instance, if sourceHydrator is specified, we could either ignore the source and sources fields or trigger an error, if necessary.

Comment on lines +149 to +151
hydrateTo:
targetBranch: environments/e2e-next
# The path is assumed to be the same as that in syncSource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Echo-ing @jessesuen a little bit here but have additional thoughts so I figure I'd start another thread on this.

The more and more I think about this the more I am leaning towards it being a separate CRD. ApplicationSets set the precedent. It's a separate controller with it's own CRD that had the scope of "just an Application generating factory". Similarly, this can be thought of in the same light.

From my PoV, I can think of two things...

  1. Have a Hydrator CRD with the same drySource and hydrateTo spec but the end result is that the Hydrator Controller generates an Argo CD Application using the hydrateTo as it's source
  2. (^ related but also separate thought) If going down this path, maybe just make it another ApplicationSet generator? "Hydrator Generator" with the scope of it bing "a generator that just generates one App".

Just my brainstorming thoughts.

Copy link
Member Author

@crenshaw-dev crenshaw-dev Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will respond to Jesse ASAP!

I don't think ApplicationSet is a suitable comparison. ApplicationSet has a 1:N relationship with Applications, so it fundamentally can't be embedded in a single app. This unavoidable aspect of the tool is why it required a separate CRD. But using a separate CRD has had numerous (unavoidable, in the case of AppSets) downsides: mainly that we've had to introduce new APIs and RBAC to provide basic first-class support and that we still don't have full UI/CLI support because dealing with a completely new CRD comes with a lot of overhead.

Source hydration has a 1:1 relationship with Applications, because 1) it's fundamentally something that an individual developer cares about on an individual app level, and 2) a single hydrated source always has a single dry source. If I'm a developer maintaining my one app in a big organization, I should have the ability to easily use the rendered manifest pattern without going to an Ops team to configure some separate resource for me.

As a feature with a 1:1 relationship to an Application, I think hydration is similar to multi-source apps: it's an opt-in feature, but if you want to use it, it's configured the same place as all your other app config. That's why we introduce multi-source apps as an "unstable," opt-in feature on the Application CR. I think that's a big part of why after just a few release cycles, it has completely first-class support.

Have a Hydrator CRD with the same drySource and hydrateTo spec but the end result is that the Hydrator Controller generates an Argo CD Application using the hydrateTo as it's source

This suffers from all the problems of introducing any new CRD, but it also has its own problems. This is what the spec would probably look like:

kind: HydratedApplication
spec:
  drySource:
    repoURL:
    targetRevision:
    path:
  syncSource:
    targetBranch:
    path:
  template:
    # ... insert every Argo CD field except `source` and `sources`

This weirdly gives the controller two completely separate tasks: 1) do manifest hydration, and 2) don't let the user set the source or sources field. So half of the controller's job becomes this unusual "enforcement" task. And users will encounter all the pain points they encounter with ApplicationSets: they don't know why their UI changes keep getting overwritten suddenly, and they don't have easy access to modify the higher-level (hydrator) config. We'd probably eventually have to implement AppSet-like features such as ignoreApplicationDifferences, sync policies, etc. as well as implement new RBAC/UI/API code.

(^ related but also separate thought) If going down this path, maybe just make it another ApplicationSet generator? "Hydrator Generator" with the scope of it bing "a generator that just generates one App".

I think this suffers from the same problems as the HydratedApplication and also introduces a strange new task for the ApplicationSet controller of hydrating manifests and pushing them to git. It's just not at all what the AppSet controller is designed to do.

tl;dr: I think manifest hydration is fundamentally something that individual developers care about at an individual app level, and it therefore makes sense to provide a really solid first-class experience via the already-robust Application API.

Copy link
Member Author

@crenshaw-dev crenshaw-dev Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick follow-up: when we considered multi-source support, we also considered whether you could solve the problem via ApplicationSet. In that case, it's technically possible, and even fits within AppSet's existing functionality (just use a git files generator and template the values file into sources.helm.values).

We decided that having a first-class UX via the Application spec was a better move than using the AppSet approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used ApplicationSet heavily and I don't think the two feature are comparable for the same reasons explained above.

In all the features of argo, multi-source seems like the closest comparable feature: they both impact what comes in and what goes out, they both have different behavior/spec related to hydration.

On the API/CRD level, they both bring the same concerns:

  • when sources is used, source cannot. => when sourceHydrator is used, sources and source cannot.
  • UI change were required for the UI, and it was done iteratively through the versions without impact for non-multisource users
  • CLI changes were required and it was done iteratively
  • and so on.

I think the main difference here, is the technical complexity to implement the feature. The implementation of source hydration shouldn't have a negative effect on the current controllers. Most of the logic for source hydration would benefit from being executed in other micro-services, and not in the application-controller - which is already complex today - as much as possible.

But from an API design point-of-view, I think it makes sense to have these configuration as part of the Application natively. Now I think we should give more thoughts to these configuration, and I will start another thread for those like @christianh814 did 😄

spec:
# The sourceHydrator field is mutually-exclusive with `source` and with `sources`. If this field is configured, we
# should either throw an error or ignore the other two.
sourceHydrator:
Copy link
Member

@agaudreault agaudreault Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crenshaw-dev @jessesuen @christianh814
Now thinking about it, perhaps the way it is added to the spec could be streamlined. I am thinking of something like the following:

spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps
    targetRevision: main
    path: environments/e2e
    hydration:
      syncSource:
        targetBranch: environments/e2e
        path: .
      hydrateTo:
        targetBranch: environments/e2e-next

I don't know if it would work, but it would

  • be an easy migration for users currently using the dry pattern. They only need to add the hydration config
  • Fit the current "rendering" tools pattern
  • Smaller impact on CRD. No need to check for conflict with source or sources.
  • Explicitly no support for multi-sources this way, but it could be added with minimal changes after that if it makes sense.
  • Benefit of the current manifest-generate-paths annotation out of the box.

Now the hydrator becomes more of a CMP on steroids that offload the job to push to the new commit-server components, but it cannot be implemented as a CMP because it will need to watch for changes on source.hydration.syncSource.targetBranch so I think we get a big benefit from implementing this as a built-in tool.

A naive way to think about it is:

  • application-controller performs reconcile (source.targetRevision is modified)
  • repo-server is called to perform rendering
  • CMP receives manifest at source.targetRevision
  • CMP hydrates them based on the .argocd-source.yaml config
    • the logic to hydrate is in the repo-server, so it would call the repo-server API
  • CMP calls commit-server
  • commit-server commits hydrated manifest to source.hydration.hydrateTo.targetBranch
  • CMP pulls the manifests at source.hydration.syncSource.targetBranch
    • the logic to pull from git is in the repo-server so it would call the repo-server API
  • CMP returns these manifests instead of the hydrated ones 👿 (can still be the same if hydrateTo.targetBranch and syncSource.targetBranch are the same)
  • application-controller syncs these manifests

Because of the back and forth in repo-server and the existing logic of the rendering tools, it makes sense to have the logic described by "CMP" above implemented part of the repo-server IMO.

  • it keeps doing the same thing it is doing today: pulling from git and rendering manifests.
  • the repository-write access is isolated to another component
  • the only performance impact is an additional pull and push from git, which are I/O operations.

It provides a similar pattern that can be done with kargo-render today (also using the repo-server under the hood), but natively within Argo CD, with simpler configuration and a better experience for developer since this patter will now be integrated natively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this and then relooking at the spec, I think your (@agaudreault) draft spec thoughts makes the most sense to me.

spec:
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps
    targetRevision: main
    path: environments/e2e
    hydration:
      syncSource:
        targetBranch: environments/e2e
        path: .

This way, my gut reaction, feels more "natural" to me on first read. It streamlines it and reads "take this source, hydrate it, and have Argo CD sync it from the hydrated branch"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two suggested changes here:

  1. Use repo-server to initiate the git push (delegated to a separate commit-server component)
  2. Nest hydrator config under the source field

To answer the first, I think there are a few downsides to a side-effect based write-back system in the repo-server. But I think the deal-breaker is the lack of a way to aggregate hydration across multiple apps. The current hydrator implementation (in the app controller) finds all apps which hydrate to a specific repo/branch pair, collects their hydrated manifests, and pushes those manifests in a single git commit. The repo-server currently operates only on a per-app basis. To aggregate apps for single-commit hydration, we'd have to make the repo-server multi-app-aware.

There are other downsides. If aggregation doesn't sound like a deal-breaker, we can dive into other issues and see if they're surmountable.

To answer the second suggestion: I like the simplicity, especially of the migration experience. But I think there are some downsides here, too. I'll split my concerns into two categories: UX and backend.

UX Concerns

1. spec.source.chart isn't obviously invalid

The hydrator currently only supports git sources. The nested hydrator API makes it look as if maybe we support Helm sources as well.

source:
  repoURL: https://helm.example.com/my-chart
  chart: my-chart # at a glance, looks like this should work, right?
  hydration:
    syncSource: 

It's possible that we'll support Helm charts in the future (though I kind of doubt it, since deterministic hydration is a key aspect of this feature, and the only possible deterministic hydration of a Helm chart is one in which there's no user input).

2. Lack of parallelism with helm, kustomize, and directory

As the user reads an example application.yaml, they'll encounter the helm, kustomize, and directory fields under spec.source. In the case of each of those fields, the behavior they provide is basically "pass some parameters to the hydration tool." If we add a hydration field at the same level, we break that pattern. Now instead of just configuring inputs for a build tool, we're configuring an entire write-back system. It's just a surprising deviation from the current pattern.

3. Not obviously mutually-exclusive with helm, kustomize, and directory

One nice thing about a new top-level sourceHydrator field is that it's obvious how new and different it is from the source API. If we use the source API, the user might think that this is possible:

source:
  hydration: # all my hydrator config
  kustomize: # my Kustomize overrides

We have a similar problem today where folks want to configure helm and kustomize inputs at the same time. But in that case it's a bit more intuitive why they might be mutually-exclusive: you either run helm or you run kustomize. In the case of the hydrator, a user might reasonably wonder "why can't I run the hydrator with some Kustomize inputs"? There's a good answer having to do with deterministic hydration, but it's not obvious from the API that we're enforcing that opinion (until you try it and get a runtime failure).

Backend Concerns

I'm generally strongly opposed to letting the backend dictate the API. The user experience should be the ultimate concern.

But in this case, I think there are good reasons to believe having a top-level sourceHydrator field is a better UX. So maybe backend concerns can serve as a tie-breaker.

1. spec.source is basically a repo-server API request body

The app-controller and repo-server basically split concerns today like this: spec.source is the repo-server's concern, and everything else is an app-controller concern. So the contents of spec.source are literally an API request body to the repo-server.

If hydration is not performed by the repo-server (reasons why given above), then it doesn't really make sense to send the hydration config to the repo-server. And doing so could entice maintainers to do silly things in the future (for example, to render manifests differently depending on their destination repoURL/branch/path).

We could have the app-controller clear out the hydration field before sending the source field to repo-server, but it would be strange code to write/test/maintain.

2. Multi-source would be weird, or types would be duplicated

Today spec.source is unmarshaled to a struct of type ApplicationSource. spec.sources is type []ApplicationSource. It's unclear what it would mean to have multiple sources in a single app hydrating to different destinations. Conceptually it seems possible (hydrate to all, then coalesce the manifests into a single document for syncing). But the source hydrator as currently proposed intentionally does not work with multi-source apps.

So we'd either have to duplicate the ApplicationSource struct into a new type ApplicationSingleSource to add the hydration field, or we'd have to let the hydration field appear in multiple sources and somehow handle it there (either by implementing multi-source hydration, throwing a runtime error, or ignoring the field).

tl;dr - I think app-controller is better than repo-server to coordinate commit-server API calls, and I think a new top-level field is justified despite small UX downsides.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @crenshaw-dev around Multi-Source applications. If the hydrator field is added to the ApplicationSource spec, the sources field would also take in the same change. There are multiple areas in our codebase where we pass in just a single source and do not have a mechanism to identify if the application is a single source app or a multi-source app.

Also, duplicating the ApplicationSource struct into a new type ApplicationSingleSource type might be a breaking change for folks relying on the current ApplicationSource format.

@gnunn1
Copy link

gnunn1 commented Aug 20, 2024

I'm late to the game on this but one concern I have is with regards to the opinion section:

https://github.com/crenshaw-dev/argo-cd/blob/manifest-hydrator-proposal/docs/proposals/manifest-hydrator.md#opinions

I am a bit concerned about not supporting the ability to dynamically lookup cluster information during the hydration process or at least having some sort of plugin process to enable users who wish to do this to do so. IMHO hydration is the natural place to support this sort of functionality.

The proposal states that one should use a on-cluster webhook for this but IMHO I feel like this is a poor solution for the following reasons:

a. The proposal talks a lot about how non-deterministic is bad but yet a webhook is inherently non-deterministic so it feels counter to one of the original premises of the porposal?
b. With a webhook there is no opportunity to test changes before it hits a cluster. Missed typed something and the webhook fails to substitute a value or process it? You only know when it is live tested rather then discovering the issue during the hydration process where it could be caught by a review or automated linting tools.
c. Webhooks can be problematic from a cluster reliability point of view. While writing a webhook is relatively staright forward it's not uncommon for webhooks to fail causing API server outages until the offending webhook is corrected or removed. Hydration happens off cluster and eliminates this issue.

When users are managing 100s or 1000s of clusters I see a strong desire from them to minimize the amount of per cluster customizations with individual manifests/oberlays/value files in favor of an approach that abstracts out these differences via some templating mechanism.

@crenshaw-dev
Copy link
Member Author

I'll go into detail below, but my top-level answer is this: We have to decide "how much can Argo CD do and still do it well?" In my opinion, for Argo CD to be a robust, reliable, maintainable tool, our focus should be 1) monitoring git state and 2) detecting drift from cluster state (i.e. being a GitOps operator). Monitoring additional state (k8s Secret contents, s3 buckets, etc. etc.) has two downsides: 1) it explodes the complexity of the Argo CD code base, and 2) it undermines the benefits that GitOps is supposed to provide.

not supporting the ability to dynamically lookup cluster information during the hydration process

Just to clarify the scope of this limitation: the only lookups Argo CD currently supports are Kubernetes API version and a list of resource API versions. These are only used for Helm templating. This proposal doesn't withhold support for any other existing Argo CD runtime lookups, because no others exist.

or at least having some sort of plugin process to enable users who wish to do this to do so

CMPs may still inject whatever runtime state they want (although I'd consider this bad practice).

The proposal states that one should use a on-cluster webhook

A webhook is a relatively "cheap" option, and a controller is a more robust option.

a webhook is inherently non-deterministic so it feels counter to one of the original premises of the porposal?

The argument is not that "non-determinism is bad." The argument is that hydration is the wrong place for it. From the perspective of a GitOps operator, the declaration of the user's intent ought to be static and recorded in git. Any dynamic handling of that declaration ought to be done on the cluster (i.e. by a webhook or controller).

Argo CD is the wrong tool for dynamic handling of user declarations because either Argo CD is forced to monitor non-git state and continuously reconcile those additional sources (making Argo CD's behavior complex and difficult for the user to reason about), or the user has to manually re-trigger hydration to process external state changes, and they're back in the world of ClickOps.

With a webhook there is no opportunity to test changes before it hits a cluster. Missed typed something and the webhook fails to substitute a value or process it?

If hydration is nondeterministic, there's already no way to test changes before they hit the cluster, because you don't know what's going to hit the cluster until Argo CD hydrates it. The user might feel a bit better after running helm template that everything looks fine, but they have no reason to believe that Argo CD is going to produce the same manifests when it runs helm template.

You only know when it is live tested rather then discovering the issue during the hydration process where it could be caught by a review or automated linting tools.

The review and/or the linter output are already invalid the moment you run them on the hydrated manifests, because the hydrator output may change the next moment.

The idea that you can't catch typos with a webhook is also not entirely true. If I recall correctly, webhooks are run during dry-runs. So (as long as hydration is deterministic) Argo CD can dry-run your proposed manifests and confidently tell you "these manifests will apply fine if you choose to apply them." That's something it couldn't confidently tell you if hydration were nondeterministic.

Webhooks can be problematic from a cluster reliability point of view. While writing a webhook is relatively staright forward it's not uncommon for webhooks to fail causing API server outages until the offending webhook is corrected or removed. Hydration happens off cluster and eliminates this issue.

Template code can misbehave just like webhook (or controller code) can misbehave. If it's absolutely crucial that certain config be correct, then it should be reviewed and stored in git instead of processed by a potentially-buggy webhook or mutated at hydration-time by a potentially-buggy template.

When users are managing 100s or 1000s of clusters I see a strong desire from them to minimize the amount of per cluster customizations with individual manifests/oberlays/value files in favor of an approach that abstracts out these differences via some templating mechanism.

Non-deterministic templating is a cheap and easy way of reducing duplication. But the cost is "your GitOps tool now doesn't provide a strong GitOps experience." For some that's an acceptable tradeoff, but for many it isn't.

The idea behind deterministic hydration is this:

If you need to inject runtime state, don't use a GitOps operator to do it. Use the Kubernetes tools that were designed to for that purpose: webhooks and controllers. Providing your users a good webhook/controller experience is on you, and providing a solid GitOps experience is on us. No half measures on either side.

@crenshaw-dev
Copy link
Member Author

For the purposes of assessing whether the feature should be added: this proposal wouldn't eliminate any existing Argo CD capabilities. It would just limit the behavior of the new feature.

@crenshaw-dev
Copy link
Member Author

I should also add that I think deterministic hydration is an important simplifying assumption for v1 of this feature.

I do think we could explore re-introducing non-deterministic features later. We'd want to carefully manage code complexity and UX for each feature introduced.

At this early stage, to make the hydrator as reliable and usable as possible, I think it makes sense to require determinism.

@gnunn1
Copy link

gnunn1 commented Aug 20, 2024

Thanks for the detailed reply, really appreciate it.

or at least having some sort of plugin process to enable users who wish to do this to do so

CMPs may still inject whatever runtime state they want (although I'd consider this bad practice).

This is what I and some others I'm familiar with at Red Hat are doing now, basically leveraging RHACM to do the lookups for a relatively small list of static per cluster variables and populate the information in environment variables in a CMP to do an envsubst on the manifests. It works well when Argo CD is installed on the local cluster but doesn't scale well in a centralized model. We are looking into ways to make this work better in a CMP though.

https://github.com/gnunn-gitops/acm-hub-bootstrap/blob/main/components/policies/gitops/base/manifests/gitops-instance/base/argocd.yaml#L54

The proposal states that one should use a on-cluster webhook

A webhook is a relatively "cheap" option, and a controller is a more robust option.

The issue with controllers though is often times you need to to template/substitute on resources that are already managed by controllers and there is potential for race conditions and other issues to appear. For example in OpenShift each cluster has a unique wildcard subdomain that provides a generic ingress for applications on the cluster. This subdomain needs to be templated across many different Kinds such as Certificates (cert-manager) which are already managed by controllers. A webhook provides a more optimal way to handle this IMHO but with some negatives.

a webhook is inherently non-deterministic so it feels counter to one of the original premises of the porposal?

The argument is not that "non-determinism is bad." The argument is that hydration is the wrong place for it. From the perspective of a GitOps operator, the declaration of the user's intent ought to be static and recorded in git. Any dynamic handling of that declaration ought to be done on the cluster (i.e. by a webhook or controller).

I disagree here, the user's intent in the original git repo, not the hydrated repo, is going to be dynamic with lookups whether or not the lookup happens on-cluster or in hydration. My argument is that doing it in hydration means it gets stored in git where the changes can be easily seen and approved as needed before having Argo CD deploying the new hydrated version. If a dev makes a change to the way the lookup works I can see the impact of it across all rendered manifests before deployment since it's baked into hydration.

I don't feel I get that with a controller, webhook or CMP and it's the same argument of "Why Hydration?" when it comes to just rendering Helm and Kustomize without hydration IMHO.

Argo CD is the wrong tool for dynamic handling of user declarations because either Argo CD is forced to monitor non-git state and continuously reconcile those additional sources (making Argo CD's behavior complex and difficult for the user to reason about), or the user has to manually re-trigger hydration to process external state changes, and they're back in the world of ClickOps.

I understand the point here, but this needs to be weighed against users managing 100s or 1000s of kustomize overlays/helm values to handle per cluster customizatons versus being able to use lookups. Additionally I don't feel users wanting this are expecting Argo CD to monitor these dynamic sources of information, they want to know when it was hydrated the lookup returned X and that's what got deployed. If for some reason the lookup would return something else it would be handled on the next hydration.

I think maybe the disconnect I'm having is that for me the use case that I'm thinking of is largely static information per cluster, things that either won't or very rarely change so I'm not thinking of the use case where a user wamts to use a lookup on a secret that is rotated every 7 days for example. Is this scenario what is driving a lot of your concern here as I can understand your PoV on that one?

Also how is Intuit managing this for their large fleet of clusters, maybe it is not as much of a challenge with vanilla Kubernetes as it is with OpenShift?

With a webhook there is no opportunity to test changes before it hits a cluster. Missed typed something and the webhook fails to substitute a value or process it?

If hydration is nondeterministic, there's already no way to test changes before they hit the cluster, because you don't know what's going to hit the cluster until Argo CD hydrates it. The user might feel a bit better after running helm template that everything looks fine, but they have no reason to believe that Argo CD is going to produce the same manifests when it runs helm template.

I'm not sure I'm following, basically if you hydrate the manifests in a separate git repo you can lint/test/review that new version in the rendered git repo before Argo CD deploys it onto the cluster. For me hydration is a separate and de-coupled step from deploying the manifests but I remember from a tweet you did were it looked like it might all be baked into the same flow in Argo CD? (And again maybe this goes back to the other discussion on a separate CRD and Controller for hydration).

You only know when it is live tested rather then discovering the issue during the hydration process where it could be caught by a review or automated linting tools.

The idea that you can't catch typos with a webhook is also not entirely true. If I recall correctly, webhooks are run during dry-runs. So (as long as hydration is deterministic) Argo CD can dry-run your proposed manifests and confidently tell you "these manifests will apply fine if you choose to apply them." That's something it couldn't confidently tell you if hydration were nondeterministic.

That is a fair point, I stand corrected.

Webhooks can be problematic from a cluster reliability point of view. While writing a webhook is relatively staright forward it's not uncommon for webhooks to fail causing API server outages until the offending webhook is corrected or removed. Hydration happens off cluster and eliminates this issue.

Template code can misbehave just like webhook (or controller code) can misbehave. If it's absolutely crucial that certain config be correct, then it should be reviewed and stored in git instead of processed by a potentially-buggy webhook or mutated at hydration-time by a potentially-buggy template.

Err yes which is why it should be part of hydration off-cluster and with the results stored in git not done on-cluster?

When users are managing 100s or 1000s of clusters I see a strong desire from them to minimize the amount of per cluster customizations with individual manifests/oberlays/value files in favor of an approach that abstracts out these differences via some templating mechanism.

Non-deterministic templating is a cheap and easy way of reducing duplication. But the cost is "your GitOps tool now doesn't provide a strong GitOps experience." For some that's an acceptable tradeoff, but for many it isn't.

Somewhat agree that it is a tradeoff but I do see more and more folks who manage larger amounts of clusters want to make this tradeoff as managing large amounts of per cluster customizations can be burdensome. Other tools like Flux (and Red Hat's RHACM) do support this and I do see some users choosing to use those tools because of this.

Again appreciate your response and patience on this topic with someone that is not nearly as deep into it as you are :)

@crenshaw-dev
Copy link
Member Author

a relatively small list of static per cluster variables

Since it's a small list of static values, seems it shouldn't be too burdensome to just push them to git. But on the other hand, most of my concerns about dynamic hydration don't apply to small numbers of static values, so this use case seems pretty alright to me. And the hydrator as proposed will support this use case via CMPs.

The issue with controllers though is often times you need to to template/substitute on resources that are already managed by controllers and there is potential for race conditions and other issues to appear.

The controller solution is complex, for sure. And for static inputs, it's probably excessive. But for highly-dynamic values, I think the effort is worth it. If you want to provide the user with a really robust experience, writing a well-tested controller is probably better than relying on dynamic hydration and some pseudo-GitOps change application mechanism.

doing it in hydration means it gets stored in git where the changes can be easily seen and approved

It sounds like this workflow would work like this:

  1. propose change to DRY manifests, rendering a preview for the reviewer
  2. have Argo CD hydrate the manifests and push them to a staging branch
  3. open a PR from the staging branch to the live hydrated branch
  4. have someone review the new, hydrated PR
  5. merge to the live hydrated branch and sync

I want to avoid requiring the 2nd review. And you can skip the 2nd review if and only if you are certain that Argo CD's hydration will precisely match what you did when previewing the DRY change.

And if you're going to all that effort, you're probably better off just committing the locally-hydrated manifests into git and having them reviewed in one PR (I think Spotify does basically this with their custom hydration CLI).

I don't feel users wanting this are expecting Argo CD to monitor these dynamic sources of information

I'm much, much less concerned about dynamic lookups if 1) Argo CD doesn't have to actively monitor the sources or 2) if the "dynamic" values are basically static. Unfortunately, Argo CD has no way of differentiating a static lookup value from a volatile lookup value. So if we allow the former, we can expect people to do the latter (although to be fair, we can warn them about the downsides of doing so).

If for some reason the lookup would return something else it would be handled on the next hydration.

This is the UX I'd really like to discourage. Intuit has this problem with a new custom hydrator we've implemented. The hydrator's behavior changes without the user being in the loop. So a user might push a simple image tag update, and suddenly a bunch of other changes are shipped with the image tag bump. If something breaks, you have no immediate way of knowing which change caused the problem. And you can't simply revert the dry manifest change, because the hydrator will still produce the unrelated changes. You have to revert the hydrated change, which means the user is interacting with a component that isn't their normal write interface and should just be read-only interface for convenience.

I believe the strongest GitOps experience is one in which every change in the hydrated manifests corresponds to a user change in the dry manifests (even if that's just bumping a version number for the hydrator).

I'm not thinking of the use case where a user wamts to use a lookup on a secret that is rotated every 7 days for example. Is this scenario what is driving a lot of your concern here as I can understand your PoV on that one?

Yep, this is 100% the use case that worries me. And I'm not sure, from Argo's perspective, how to differentiate this volatile input from a more static input.

Also how is Intuit managing this for their large fleet of clusters, maybe it is not as much of a challenge with vanilla Kubernetes as it is with OpenShift?

We have two manifest management systems.

In the old system, we dump Kustomize manifests into a git repo with remote bases pointing to common components that don't change much between services. Many of the static values you mention are vended with the manifests, others are injected by webhooks.

In the new system, we dump a relatively small YAML file which is hydrated by a custom CLI. The static values are mostly all still vended. Other "best practices" changes are made via updates to the CLI (e.g. tweaking Rollout steps). The YAML doesn't include the CLI version number, so unexpected changes ship with user changes. It's a problematic system, and I'm advocating for a lot of changes to mitigate the "dynamic" (surprising and therefore rather bad) UX.

I'm not sure I'm following, basically if you hydrate the manifests in a separate git repo you can lint/test/review that new version in the rendered git repo before Argo CD deploys it onto the cluster.

I think the flow should be "propose DRY change -> review hydrated preview diff (plus lint/dry-run/etc.) -> merge -> Argo CD hydrates and deploys." That's only possible if hydration is deterministic, because otherwise the "hydrated preview" is invalid the moment it's generated. If the flow becomes "propose DRY change -> review -> merge -> generate hydrated PR -> review/lint/dry-run/etc.," I think we've introduced two much friction in the form of the two-step review, and that we'd be better off asking users to locally hydrate and commit hydrated manifests instead of DRY.

Of course 'the "hydrated preview" is invalid the moment it's generated' assumes volatile inputs, which I know is not the use case you're focusing on.

Other tools like Flux (and Red Hat's RHACM) do support this and I do see some users choosing to use those tools because of this.

I think there's a decent chance that the "deterministic hydration + webhooks/controllers" opinion ends up being an unpopular one. I do think that the benefits those opinions enable are worth giving a chance by backing them with more robust tooling than what exists today.

@crenshaw-dev
Copy link
Member Author

One potential "best of both worlds" solution is to enforce that the hydrated "preview" we generate for DRY PRs ends up being precisely the hydrated manifests we deploy. So even if some dynamic input changes while you're reviewing, you can be confident that when you hit "merge," what you see is what you'll get.

Unfortunately, the preview experience isn't part of this proposal, so I've limited this proposal to an experience that doesn't rely on a frozen preview to ensure a good UX.


Today, Argo CD watches one or more git repositories (configured in the `spec.source` or `spec.sources` field). When a new commit appears, Argo CD updates the desired state by rendering the manifests with the configured manifest hydration tool. If auto-sync is enabled, Argo CD applies the new manifests to the cluster.

With the introduction of this change, Argo CD will watch two revisions in the same git repository: the first is the "dry source", i.e. the git repo/revision where the un-rendered manifests reside, and the second is the "hydrated source," where the rendered manifests are places and retrieved for syncing to the cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are everyone's thoughts on having the hydrated manifests optionally, live in a completely different repo than the "dry source"?

At least one benefit is sharding the "hydrated" manifests into dev, preprod, and prod repos. That allows platform teams to add additional quality gates and potentially even different Argo CD instances to watch the "hydrated" manifests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that'll be a highly requested feature very quickly. It complicates the implementation quite a bit, which is why I've avoided it for the MVP. But I think it would be a super helpful feature.

Co-authored-by: joe miller <joeym@joeym.net>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants