Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(proposal): GitOps Promotion/Rollback Tool #17506

Draft
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

crenshaw-dev
Copy link
Member

This is a very rough work in progress, but hopefully enough to get the basic idea.

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
crenshaw-dev and others added 2 commits March 14, 2024 11:37
Co-authored-by: Zach Aller <zachaller@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Zach Aller <zachaller@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Every change is represented as a commit on a "DRY" branch in a git repo. The contents of the branch depend completely on the manifest hydration tool. For example, Kustomize may include kustomization.yaml and a few related manifest files; Helm may include values.yaml and supporting chart files; and so on.

#### Environment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally advise against using branchers for environments, and instead recommend using folders. I would hate to enshrine branches in this way. I think you're probably familiar with the arguments for why that is but here's a refresher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to clarify this section a bit... definitely agree "directories for environments" with respect to dry manifests (e.g. your kustomize bases and overlays). But a separate branch for each environment's hydrated manifests. So the user's "write" interface is directories. The "read" interface is branches.

It also has the nice side-benefit of letting you use SCMs' "compare" interface to compare environments' hydrated manifests. And you get a full history of the hydrated manifests for just one environment by looking at the branch's commits.

Copy link
Member Author

@crenshaw-dev crenshaw-dev Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the blog post:

If you follow a branch-based approach for your environments:

  1. You need to have 13 long living Git branches at all times.
  2. You need 13 pull requests for promoting a single change across all environments.
  3. You have a two dimensional promotion matrix with 5 steps upwards and 2-3 steps outwards.
  4. The possibilities for wrong merges, configuration drift and ad-hoc changes is now non-trivial across all environment combinations.

I think these all apply to a system where the primary expression of the environments is in branches and where the user has to interact directly with those branches. The proposal would keep the primary expression of the manifests in one branch's directories and automate all the interactions with branches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to think of this as using branches as "cheap" artifact storage for hydrated manifests. Theoretically, they could also be pushed to something like an OCI registry, but having them on branches also puts them closer to the developer as they are interacting with the repository anyway. A similar concept is described in this blog post on the rendered manifest pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Cheap and often has an awesome UI "for free."

name:
# These fields roughly correspond to the fields available in the GitHub Checks API and the
# Gitlab Commit "pipeline status" API.
sha: xyz321 # Corresponds to the hydrated SHA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a label as well to help the selection ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but I think it might change quite often. Custom informer index on the user side might be the better option.

I've added a section explaining the mutability of this and the repoRef field.

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…argo-cd into promotion-proposal

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
crenshaw-dev and others added 4 commits March 26, 2024 12:32
Co-authored-by: Oded Ben Ozer <oded.benozer@gmail.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
targetRevision: main
# This assumes the Application's environments are modeled as directories.
path: environments/e2e
#chart: my-chart # if it’s a Helm chart
Copy link
Contributor

@OpenGuidou OpenGuidou Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also put examples of values here, if we want to hydrate only an environment to a dedicated branch

helm:
  valueFiles:
     $gitRepo/environments/e2e/values-myenv.yaml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think multi-source apps will get their own section, since they're a bit of an Achilles heel for this feature working 100% nicely. But I think we can come up with a reasonable solution.

metadata:
name: example
spec:
# The sourceHydrator field is mutually-exclusive with `source` and with `sources`. If this field is configured, we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not necessarily exclude both, as it would be useful in a multi-sources pattern to have anchors to be used in the helm values used for hydration (see comment below) (https://argo-cd.readthedocs.io/en/stable/user-guide/multiple_sources/#helm-value-files-from-external-git-repository)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we definitely want to support this feature. What I mean by "mutually-exclusive" here is that, if you have multiple sources under sources and want to start using the manifest hydrator, you'll need to move the contents of the sources field under sourceHydrator.sources. It doesn't make sense to have both fields populated.

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

## Open Questions

* The `sourceHydrator` field is mutually exclusive with the `source` and the `sources` field. Should we throw an error if they're both configured, or should we just pick one and ignore the others?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an opportunity to using something like CEL here? My opinion, throw an error if they're both configured 😄.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define CEL for me? But yep, I generally feel like an error message is better.

Copy link
Contributor

@zachaller zachaller Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://kubernetes.io/docs/reference/using-api/cel/

This would require ArgoCD to support openapiv3 pretty sure in order to use it with k8s

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using CEL in the CRD spec should allow the k8s API server to reject objects when both fields are set, as an alternative to accepting and persisting the objects and only then throwing the error in ArgoCD controller log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think CEL would be out of scope for this proposal since it would presumably be a fairly significant effort to get Argo CD to OpenAPI v3. But it's definitely much nicer than a runtime error!


This proposal introduces a system of push-by-pr-merge. Instead of representing the change promotion state in some external system, it is represented as PRs against environment branches containing "hydrated" (or "rendered") manifests. The "wait" stages of the promotion process (performance tests, manual approvals, etc.) are represented as commit statuses blocking PR merges. The state of the entire system is visible in the git/SCM interface. This system is generalizable to many organizations, relying on near-ubiquitous SCM concepts, allowing the promotion tool to be open-sourced.

![Diagram showing the promote-by-pr-merge strategy. Changes originate as PRs against the main (DRY) branch. A tool then generates PRs for each environment with the hydrated manifests. The tool merges the PRs in the designated order. Changes are immediately synced out to the cluster on PR merge.](images/promote-by-pr-merge.png)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing PR generation automation at a large enterprise, we've found that the GitHub rate limits for creating PRs are quite low. These rate limits aren't disclosed by GitHub and the documentation states: These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons. We've also found that the rate limits are higher for GitHub apps than GitHub service accounts, but these limits can also be easily exceeded with the rendered manifests pattern and hundreds of users. For that reason, I don't recommend designing a system that relies on automatically raising PRs.

One way you can work around this limit is to have your automation generate a GitHub link that a human can click to generate a PR. Since the rate limit applies to a user, you are highly unlikely to exceed the rate limit that way.

Another approach is to never use PRs for your rendered code, but rather generate feature branches and push them and merge them with your automation. Any reviews happen on PRs raised in the interface repository and only a human can raise a PR. And even in the interface repository, if a change needs to be automatically made and doesn't require approval, no PR is made, only a feature branch. This is the system we have implemented.

Copy link
Contributor

@OpenGuidou OpenGuidou Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's something we've encountered as well with renovate. And it's true that the conclusion was the same: if you need an automatic merge, probably a direct commit from the feature branch (where the checks happen) would be enough.
However, I would not go for a design requiring a human action in an automated flow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early on we discussed a mode which relies on merging without opening PRs (i.e. direct commit from a feature branch). I think for now we for simplicity we should start with PRs, find the rate-limits of that systems and any available workarounds, and then look into building out a direct-merge alternative.

The problem with direct-merge is that you lose the nice, familiar, visible interface. As the Argo CD UI evolves, we might find that it's sufficient and that the PR UI is redundant and unnecessary.

---
```

Each commit to the dry branch will result in a commit to up to three branches. Each commit to an environment branch will contain changes for west, east, or both (depending on which is affected). Changes originating from a single dry commit are always grouped into a single hydrated commit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing the rendered manifest pattern, we have found that the diff of the rendered manifests is quite noisy and our users have found that their intended change is lost in the noise. As a result, they often don't review the rendered manifests at all. In order to solve this problem, we publish a shortened diff in the comment of the PR to the interface repository. That diff intentionally omits certain boilerplate changes, in particular automatically generated metadata (for example, I link back to the PR that the automation adds as an annotation, config map name change, etc.). Users still have access to the full diff, however, available under a link automatically added to the PR. We have found that reviewing the dry change + shortened hydrated change is a better user experience than reviewing the entire noisy change to the rendered manifests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... do you know of a general way to simplify diffs? Or does it have to be org-specific?

I generally believe that writing org-specific CRDs+controllers is the best way to hide unnecessary detail from devs. But in practice I know that won't always be practical.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment our exceptions are organization-specific, but they wouldn't be if Argo added similar annotations automatically. The annotations we add to the rollout and to the pre-sync and post-sync jobs are:

  1. The git hash of the change in the interface repository. This is useful for identifying a given deployment and for logging from pre-sync and post-sync jobs.
  2. The GitHub user name of the person who raised the PR.
  3. A serial number that always increases with each merge to the rendered-manifests. This is useful for a synchronization barrier between rollouts running in different clusters.
  4. A link to the pull request (or the commit if no pull request was used). This is useful to easily navigate to the PR from the Argo UI.
  5. The name of the config map – in our system we automatically rename the config maps whenever they change. We also save a number of old config maps in git, not just the most recent one. This solves the problem that Argo Rollouts doesn't revert changes to the config map when the rollout aborts. Note that our solution with renaming config maps is not the only possible solution to that problem – for instance, flux copies the config map before starting the canary pod and loads the canary from the copy. I believe that at some point it would be useful to solve the config map problem in Argo Rollouts as well and the chosen solution affects this list of noisy changes on each rendering.

Copy link
Contributor

@OpenGuidou OpenGuidou Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the configmap workaround for rollouts. But you could handle that with a

argocd.argoproj.io/sync-options: PruneLast=true

On the configmap no ? That's what I do on my side

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like 1-4 would be great for basically any org. I'm not sure whether those should be part of this proposal... I tend towards saying "probably not."

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

Consider a PromotionStrategy covering dev, test, and prod. If the test environment has an open PR against the hydrated branch (pending change), and someone pushes another commit to the main (DRY) branch, the PR will be updated to include the new changes, and the PR will wait for the dev environment to finish deploying and verifying that new commit.

It's possible that a high-churn low environment could block merges on higher environments. So far, we think that's an acceptable tradeoff. Our recommendation will be that teams move as much of the change validation as possible to the pre-commit stage so that changes can be promoted relatively quickly through environments, and high-churn low environments won't block promotions to higher environments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order for the proposed system to be universally accepted by the community, please consider that many organizations test changes in one environment for arbitrarily long before promoting to the next one.

Let's consider the following scenario. As far as I understand the proposal, the change to the dry branch is reviewed, approved and merged first. The change to the hydrated dev branch is then merged automatically as well, but the PRs to the hydrated test and prod branches remain open. After some time the user merges the PR to the hydrated test branch. After another few days the software is ready to be approved and merged to prod. But how many days have passed since reviewing the dry branch? In this hypothetical scenario it took 7 days. So the hydrated change to prod is being reviewed 7 days after the equivalent change to the dry branch. I find this odd. I believe the dry and hydrated changes should be approved by the same person at the same time. Especially that the dry change to one environment may not be the same as the dry change to another environment (for example, consider a change to an URL that is different in each environment).

This is a fundamental concern I have with this proposal – the reviews, approvals and merges to the dry branch and the hydrated branches occur at completely different times. The person who approved the dry branch may not even be at work at the time the hydrated branches are being reviewed.

I find it confusing for other reasons as well: if I'm browsing the history of the changes to the dry branch, the merge time is divorced from the deployment request. That breaks one of the promises of GitOps: that git log roughly corresponds to the history of deployment requests.

I appreciate this proposal, however, especially the idea to make the PR the primary interface for promotions. So let me step back and consider what kind of reviews/approvals an organization may require between promotions to different environments:

  1. An organization may require that the review and approval happens at the time of deployment to a given environment. For example, there may be a review and approval before deployment to test and another review/approval before deployment to staging. In this workflow, the user can have an arbitrarily long delay between deployments to test and staging.
  2. An organization may require that the review and approval happens at the time of the first deployment in a sequence of environments. For example, deployment to staging may require a review and approval of changes to both staging and prod, because developers should only be allowed to deploy to staging when they are ready to go to prod as well. In this workflow the software gets promoted from staging to prod within a couple of hours from deployment to staging.

Both of the workflows above are valid. So how can reviews and approvals look like in each case? Let me propose a different solution to the one you are proposing, but one that adheres to the same fundamental idea: that git and pull requests are the user interface.

In the first scenario the entire dry configuration for test and staging would be kept in separate directories in the dry branch, so that deploying to each environment involves a separate PR to the dry branch (one PR to the test directory of the dry branch and one PR to the staging directory of the dry branch). There is no need to have a PR to the rendered branch – the automation can just create a simple feature branch to be merged to the rendered branch (this also solves the PR rate limiting problem). Each PR can then be reviewed together with the corresponding rendered change at the same time as the dry change and by the same person. After merging the test PR, the user can wait arbitrarily long to raise the staging PR. The link to raise the staging PR is automatically generated and added as a comment to the test PR, so that the user can easily execute the promotion.

In the second scenario, the dry state of staging and prod should be kept in a single directory, i.e. using shared files (with the exception of some environment-specific overrides). That way a single PR produces rendered changes to both environments, which can be reviewed again at the same time as the dry change and by the same person. However, merging the PR only triggers deployment to staging. You might ask: how do you then approve promotions from staging to prod? By a special comment to the PR (such as /promote). The prod sync would wait until the user adds such a comment (this can be implemented in a number of ways, but I'll leave that discussion for another thread).

The advantages of the approaches above is that they don't separate out the reviews of the dry and hydrated manifests into steps that are done separately and a long time apart.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responses to concerns

I believe the dry and hydrated changes should be approved by the same person at the same time.

100% agree. I think this proposal only makes sense when coupled with an automated diff bot which comments all hydrated manifest changes on the dry PR for simultaneous review. That piece of the puzzle is discussed in the ArgoCon talk but isn't yet sufficiently represented in this proposal.

the reviews, approvals and merges to the dry branch and the hydrated branches occur at completely different times. The person who approved the dry branch may not even be at work at the time the hydrated branches are being reviewed.

I think we could assume that, for many environments, there will be no direct human review applied to the hydrated branch PR. We'd assume that the review was already done as part of the dry PR review. Of course, a user may choose to require a human review on the hydrated PR as an extra safety check. But I don't think that would be the default mode.

if I'm browsing the history of the changes to the dry branch, the merge time is divorced from the deployment request

I believe we would try to make it clear that the dry commit history is a history of the developers' intent and the hydrated branch commit histories are the histories of the environments' states. The hydrated branch commits will be tied back to the dry commit history via README files and metadata files which contain auto-generated references to the associated dry commits.

Responses to alternative proposals

Proposal 1: developer-controlled wait between promotions

I'll answer the first proposal in two parts:

Part 1: two dry PRs

Let me propose a different solution... the entire dry configuration for test and staging would be kept in separate directories in the dry branch, so that deploying to each environment involves a separate PR to the dry branch

I think the proposed system already accommodates this flow. If the developer wants to explicitly push only to stage, wait for a time, and then only push to prod, they can use the proposed system. After merging the first PR, only one promotion (hydrated) PR will be opened: one for the affected stage branch. After some time, the developer can open a second PR affecting only the prod environment, merge it, and a hydrated PR will be automatically opened/merged for the prod environment. Of course, making dry changes that affect all environments and then relying on automation to promote that change will be the preferred flow. But there is no technical limitation preventing them from taking a more manual approach. (The question of whether hydrated changes are represented as PRs or as direct commits is discussed in a separate thread, so I'll leave that discussion there.)

Part 2: auto-generating PR links

The link to raise the staging PR is automatically generated and added as a comment to the test PR, so that the user can easily execute the promotion.

The link would open a PR against the stage directory of the dry branch or the stage hydrated branch? If the former, how would the system know what the source branch of the dry change is? If the latter, I feel like that's effectively the same as requiring a manual review on the auto-generated stage promotion PR against the hydrated branch.

Proposal 2: /promote-based promotion PRs

I'm not sure how requiring a /promote comment on the dry PR is effectively different from requiring a manual approval on the hydrated PR for the target environment (via branch protection rules).

The advantages of the approaches above is that they don't separate out the reviews of the dry and hydrated manifests into steps that are done separately and a long time apart.

I think both of the proposals above assume there is no automated diff presented to the developer on the dry PR. If I'm correct, we might not need to dive into the finer points of the alternative proposals, because your concerns may be resolved by the existence of the diff bot. :-)

Copy link

@akorzynski akorzynski Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough response :-) My comments are inline:

Proposal 1: developer-controlled wait between promotions
Part 2: auto-generating PR links
The link to raise the staging PR is automatically generated and added as a comment to the test PR, so that the user can easily execute the promotion.

The link would open a PR against the stage directory of the dry branch or the stage hydrated branch? If the former, how would the system know what the source branch of the dry change is? If the latter, I feel like that's effectively the same as requiring a manual review on the auto-generated stage promotion PR against the hydrated branch.

It's the former – the link would open a PR against the stage directory of the dry branch. The system operates on the PR to the test environment, so it knows the branch associated with the PR.

There are actually some nuances here. There are two use cases:

  1. The developer deploys a number of changes to test, as they iteratively develop the change. So a number of PRs have already been deployed to test. Then the developer wants to deploy all of those changes to staging, so they effectively want to copy the current state of test to staging.
  2. The developer is testing the newer version of the software in test, while production is running an older version. However, in the meantime, the developer needs to change the configuration in all environments: for example, they need to change a URL in the config map. This URL needs to change independently from the container image version running in each environment. So the developer raises a PR to test (the PR contains only the change to the URL) and effectively wants to apply the same patch to staging without applying all the other changes from test to staging.

We have been discussing use case 1, but use case 2 requires additional discussion.

Back when people used different branches for each environment, they could use the git cherry-pick command to perform the promotion in use case 2. But nowadays we use directories per environment (for good reasons), so we need an equivalent operation.

This is why I'm thinking about producing two links per environment for the user: one that raises a PR with the copy/replace semantics and the other with the cherry-pick semantics. The automation would retrieve the list of available environments, push two feature branches per environment and generate links in the form of:
https://github.com/my_org/gitops-dry/pull/new/$BRANCH

Proposal 2: /promote-based promotion PRs

I'm not sure how requiring a /promote comment on the dry PR is effectively different from requiring a manual approval on the hydrated PR for the target environment (via branch protection rules).

You are correct that the workflow is practically the same, but there are some implementational differences:

  1. I believe it's a little confusing that the action of promoting from staging to prod involves approving a PR, because, as we previously discussed, the review of that PR actually gets done earlier in the process.
  2. At the organization I'm currently contracting to, the review of the dry and rendered manifests is done by a developer, whereas the decision to go from staging to prod is made by a QA engineer. These are different people and I would rather avoid the confusion that the QA engineer supposedly reviews the rendered manifests, as this is how it's going to look like in the GitHub PR record.
  3. There is also the question of access control: the QA engineer should only be granted the right to push a button to approve the promotion from staging to prod. Granting them the rights to approve rendered branches is going to look confusing and this is particularly important for organizations in regulated environments, which have to explain to auditors why certain user groups have certain rights. On the other hand, the access control of who can use the /promote comment can be implemented in the action that processes the comment and can be limited to that action only.
  4. As discussed previously, using /promote avoids the PR creation rate limit problem. I'm not concerned about exceeding the rate limit when creating the PRs to the dry branch, because they are typically raised by humans and the limit applies to a user. It's creating PRs in the rendered branch that has the potential to exceed the limit, because these PRs would get raised by a bot.

Finally, I'd like to note that there are potentially other ideas than /promote. @Oded-B suggested to me sometime ago that it might be possible to use Markdown checklists:

  • promote to prod

Checking the box is an event that can trigger an action. However, I haven't tried this method myself and I haven't tested if the event contains the username of the user who checked/unchecked the box and if the action of checking and unchecking the box is properly recorded for auditing purposes. This would require some more investigation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so they effectively want to copy the current state of test to staging.

How does the promotion system know what to copy in the dry branch? Consider the example change "convert Deployment to Argo Rollouts." For dev, you might want a very simple Rollout with just a couple steps. For test, you might want something slightly more cautious. And then for prod, you might have a very cautious Rollout configuration with many steps. The promotion system has no concept of a "change" that it can map to the dry manifests, so it cannot function by copying yaml from one environment directory to the other. It's up to the developer to express their "change" as yaml in all environment directories (or, if they choose, one at a time for extra control), and then it's up to the promotion system to deploy those changes according to the defined promotion rules. I'm not sure how we could implement a generalizable promotion system based on copying dry YAML from one environment directory to the other. (Codefresh's promotion system uses what amount to YAML "masks," implemented as json path expressions; I think that system is fraught with edge cases.)

I believe it's a little confusing that the action of promoting from staging to prod involves approving a PR

It doesn't have to involve approving a PR. You could implement a custom commit status check requiring that all prod promotions must be approved in some internal QA system. The status of that check would appear on the PR in the list of checks, and you could could add a "details" link out to your QA approval system for discoverability.

These are different people and I would rather avoid the confusion that the QA engineer supposedly reviews the rendered manifests, as this is how it's going to look like in the GitHub PR record.

Understandable. I think the commit status checks-based solution resolves that concern.

using /promote avoids the PR creation rate limit problem

Personally I think direct-commit is a better way to avoid the rate limit issue. But one could still use a /promote mechanism on rendered manifests instead of dry manifests.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Sorry for delay in my response. I sometimes require a few days before I can respond due to other responsibilities, but rest assured I'm not abandoning this important discussion :-)

so they effectively want to copy the current state of test to staging.

How does the promotion system know what to copy in the dry branch?

That's a good question. The system I'm describing requires some assumptions about the underlying directory structure of the GitOps repository. We have a certain structure in our system that allows the promotion logic to know how to copy data to promote from one environment to another. I would have to go into more detail to describe the directory structure and such detail doesn't really belong into this thread, so let me just make a general comment: I believe it's OK for the promotion system to be opinionated about the directory structure and require that the user adheres to it. I believe the advantages of having a promotion system outweigh the disadvantages associated with requiring a particular directory structure, provided that the structure is sufficiently general.

For dev, you might want a very simple Rollout with just a couple steps. For test, you might want something slightly more cautious. And then for prod, you might have a very cautious Rollout configuration with many steps.

In our system this is implemented through environment-level overrides or if/else statements in the templates.

I believe it's a little confusing that the action of promoting from staging to prod involves approving a PR

It doesn't have to involve approving a PR. You could implement a custom commit status check requiring that all prod promotions must be approved in some internal QA system. The status of that check would appear on the PR in the list of checks, and you could could add a "details" link out to your QA approval system for discoverability.

Gotcha. However, I believe it would be a better UX if users interacted with the system on the PR, rather than having to log into a different internal QA system to approve the promotion.

But one could still use a /promote mechanism on rendered manifests instead of dry manifests.

I believe it would be a better UX if users didn't have to open another webpage (with the rendered PR), but rather could interact with the system on one webpage, i.e. on the PR to the dry manifests.

Let me make another proposal based on some experimentation I've done this week. What if users saw a comment like the one below on a PR?

-----------------

Promotion Control Panel

Use this to automatically raise Pull Requests.

Overwrite other environments

This will copy the configuration and image version from the test environment on this PR's akorzy-pl-patch-24 feature branch and overwrite the staging environment with that data.

flowchart TD;
    test("`test
_(on#160;feature#160;branch)_`")
    test -->|🔗 <a href='https://google.com'>PROMOTE</a>| staging("`staging`")
    staging --> prod("`prod`")
Loading

-----------------

How does it work? GitHub supports displaying diagrams in comments to the PR and also in other places. This is done through the mermaid extension. Importantly, you can add links in a mermaid diagram. The link could take you to an external service, which would then display a loading page, start a task, wait until the URL for tracking the progress of the task is available and then use JavaScript to redirect the user to that URL.

You can also have a control panel like above in the README.md file in a directory in your repository – GitHub will render the chart when the user browses the directory.

What do you think?

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

Today, Argo CD watches one or more git repositories (configured in the `spec.source` or `spec.sources` field). When a new commit appears, Argo CD updates the desired state by rendering the manifests with the configured manifest hydration tool. If auto-sync is enabled, Argo CD applies the new manifests to the cluster.

With the introduction of this change, Argo CD will watch two revisions in the same git repository: the first is the "dry source", i.e. the git repo/revision where the un-rendered manifests reside, and the second is the "hydrated source," where the rendered manifests are places and retrieved for syncing to the cluster.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
With the introduction of this change, Argo CD will watch two revisions in the same git repository: the first is the "dry source", i.e. the git repo/revision where the un-rendered manifests reside, and the second is the "hydrated source," where the rendered manifests are places and retrieved for syncing to the cluster.
With the introduction of this change, Argo CD will watch two revisions in the same git repository: the first is the "dry source", i.e. the git repo/revision where the un-rendered manifests reside, and the second is the "hydrated source," where the rendered manifests are placed and retrieved for syncing to the cluster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants