Cached git cloner #3655
Welcome @maknihamdi! |
Hi @maknihamdi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
77eaaea to d6e63b4
74d717a to 308379d
Thanks for trying this.
This PR completely replaces how kustomize works with git, and introduces a new set of cache management issues.
- The code being replaced isn't large, but it has undergone many changes by many users (see history); replacing it will likely hold surprises in some use cases.
- There's no flag allowing a user to try the new method and fall back to the old method. This has to be behind some flag, e.g. --enable_alpha_git_cache.
- Caches always start as a performance enhancement, then become a source of bugs around cache expiration, cleaning, etc. If kustomize is going to maintain a cache, it should be able to list, refresh and purge that cache (new commands).
- This PR adds an undocumented .kustomize directory, and it doesn't use or honor $XDG_CONFIG_HOME/kustomize. Any artifacts created by kustomize should live there (so far it's only plugins). See https://kubectl.docs.kubernetes.io/guides/extending_kustomize/#placement.
- Finally, it's likely that the popular go-getter from hashicorp is going to come back into use in some injectable form that doesn't increase the direct or transitive dependencies of kustomize/api. That's not enough to stop a caching proposal, but it's all going to have to work together somehow.
Finally, a meta question:
Does it make sense to do this?
Do we want kustomize to be maintaining repository caches, as opposed to, say, some wrapper CI/CD system handling this?
A flag would at least allow people to experiment. This is a big change.
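To make the flag request concrete, here is a minimal sketch of gating a cached cloner behind --enable_alpha_git_cache while keeping the old path as the default. It uses Go's standard flag package purely for illustration (kustomize's own command wiring is different), and the Cloner type, plainClone and cachedClone are hypothetical stand-ins, not code from this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// Cloner fetches a repository at a ref and returns the local directory.
// Hypothetical signature, not kustomize's actual cloner type.
type Cloner func(repo, ref string) (string, error)

// plainClone stands in for the existing, uncached behavior.
func plainClone(repo, ref string) (string, error) {
	return "/tmp/fresh/" + ref, nil
}

// cachedClone stands in for the new caching behavior.
func cachedClone(repo, ref string) (string, error) {
	return "/cache/" + ref, nil
}

// selectCloner parses args and returns the cached cloner only when the
// alpha flag is set; otherwise the old code path is used unchanged.
func selectCloner(args []string) (Cloner, error) {
	fs := flag.NewFlagSet("build", flag.ContinueOnError)
	enable := fs.Bool("enable_alpha_git_cache", false, "use the experimental git cache")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if *enable {
		return cachedClone, nil
	}
	return plainClone, nil
}

func main() {
	clone, err := selectCloner([]string{"--enable_alpha_git_cache"})
	if err != nil {
		panic(err)
	}
	dir, err := clone("https://example.com/org/repo.git", "v1.0.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(dir)
}
```

With the flag absent, selectCloner returns the plain cloner, so existing users would see no behavior change.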
Thank you @monopole for your review! It's a good idea to introduce an --enable_alpha_git_cache flag to let people experiment. That is what we do in my team: we started with a very basic implementation, and we are improving the process to get something usable across many use cases.
I think we can introduce an extra flag to accept or reject a mutable dependency (branch refs). For example, in our CI/CD we prohibit branch references in production environments, but we need them in development environments.
We can also introduce expiration management, with automatic or manually assisted cleanup; it could be a new kustomize command (or flag) to clean the cache directory.
I was inspired by the kubectl cache directory when choosing $home/.kustomize (I should document it, by the way). We can change it to use XDG_CONFIG_HOME, but the cache is neither a plugin nor a configuration, so I don't know if XDG_CONFIG_HOME is the right place. What do you think?
"Does it make sense to do this?": we have really saved a lot of time in development and CI/CD jobs since my team started using this functionality. I'll explain how and why we do it in issue #2460.
What about XDG_CACHE_HOME to store the cache? It may be more compliant than XDG_CONFIG_HOME.
We should schedule time to discuss this at a sig-cli meeting. |
Agree XDG_CACHE_HOME should be honored. It should default to Could put that definition near https://github.com/kubernetes-sigs/kustomize/blob/master/api/konfig/general.go#L29 Then kustomize's git cache would be in, say, |
We would really like to see this get in as well. We are struggling with this same problem, where we version components against a remote base repo and get very long build times.
308379d to 44fea5c
Please rebase and squash so we can see where this is. |
The idea to add an allow-branch-refs flag is good and consistent with flags having no impact on kustomize output.
44fea5c to a801599
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
@maknihamdi are you still planning on working on this? If so, I think it could be worth a KEP, or at least a detailed issue, so that a suitable strategy can be worked out. @george-angel - Are you reopening because you would like to see this work merged?
We would also like to see this work merged, it would go a long way to speeding up our kustomize builds. |
/remove-lifecycle stale @cailynse Correct - very keen to see this implemented. |
@sarab97 @natasha41575 I don't think As @sarab97 alluded to in his Slack message, the The cached git cloner, on the other hand, operates at the remote root aka repository level. This means that Other small differences between this cached git cloner and
|
I've been thinking about this PR a lot, and I want to start off by saying that I agree that we should try to improve the build times for remote builds. However, this approach violates the fundamental kustomize design philosophy of "no build time side effects", which you can read about here. I will do my best to explain why, and offer some other approaches that might resolve the issue.
Kustomize is designed in such a way that a kustomization directory is supposed to have all the information it needs for the build. This means that it shouldn't read from the environment, and it shouldn't read from files outside the kustomization directory. This keeps a kustomization directory simple, consistent, and portable, e.g. the output of
Beyond violating the build time side effects, I think this also complicates the workflow, as it leaves it up to the user to refresh the cache, and it is not immediately obvious what kustomize is using as the source of truth in its build, making the tool less intuitive.
I believe there are better solutions here that can equally resolve the issue or at least come close. Some potential ideas might be:
a) To improve the build time within a single build, we can consider caching the remote directory (i.e. what this PR does), except that we need to clean up the cached directory before finishing the build. That means that the cached directory cannot be shared across builds. I would also vote for it to be in a hardcoded location, rather than using an environment variable.
I'm happy to keep discussing and brainstorming other solutions, but I want to emphasize that we have to keep the original kustomize design principles in mind. I know this can be frustrating as it seems to block important improvements, but I think it is really important that we stick to these principles. One of the old maintainers wrote up a wonderful writeup which I will paste here that really explains well why we need to stick to them:
|
Thanks @natasha41575
Am I wrong in thinking that
I do like option (a) - that goes a long way to solving our issues with monorepo bases that we reference many times during a single build. My very naive view was - to keep git cloned repos in
Option (b) is fine, but it would need our input to make use of it. We are currently running our own CD - https://github.com/utilitywarehouse/kube-applier/ - and looking to migrate to ArgoCD, where we have to do Kustomize builds via a plugin (https://github.com/utilitywarehouse/argocd-voodoobox-plugin), so the logic described has to be implemented there. Other orgs would rely on ArgoCD, Flux, etc. to make use of the new localize flow. This is not a problem, but certainly not as easy as "if your kustomize build has 100 references of the same branch of a remote repo, it will only be cloned once".
Thank you!
To improve kustomize build time, I added a new git cloner function: CachedGitCloner.
This function uses a directory under the user's home directory to cache "visited" repositories.
Within the same build cycle, an already cloned repository with the same ref can be reused.
The same directory can also be reused between builds.
Users can clean their cache manually.
CachedGitCloner checks whether the git ref is a tag or a branch; the command fails if the ref is not a tag.
We can discuss and improve this check. It could be made optional via a build flag that decides whether to be strict or flexible with git refs. (This is not implemented; only the strict mode is implemented.)
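One possible way to implement such a check is to classify the ref from the output of `git ls-remote <repo>`, which prints one "<sha>, tab, <refname>" pair per line. The sketch below works on that text only; refKind, checkRef and the allowBranches switch are hypothetical names for illustration, and the PR's own check may be implemented differently. Refs that are plain commit SHAs are reported as neither tag nor branch here.

```go
package main

import (
	"fmt"
	"strings"
)

// refKind classifies ref using text in the format printed by
// `git ls-remote <repo>`. A tag match wins if both a tag and a
// branch share the same name (a choice made for this sketch).
func refKind(lsRemoteOutput, ref string) string {
	kind := "unknown"
	for _, line := range strings.Split(lsRemoteOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		switch fields[1] {
		case "refs/tags/" + ref, "refs/tags/" + ref + "^{}":
			return "tag"
		case "refs/heads/" + ref:
			kind = "branch"
		}
	}
	return kind
}

// checkRef enforces the strict mode described above: only tags pass.
// allowBranches models the optional, not yet implemented, flexible mode.
func checkRef(lsRemoteOutput, ref string, allowBranches bool) error {
	switch refKind(lsRemoteOutput, ref) {
	case "tag":
		return nil
	case "branch":
		if allowBranches {
			return nil
		}
		return fmt.Errorf("ref %q is a branch; strict mode accepts only tags", ref)
	default:
		return fmt.Errorf("ref %q is neither a tag nor a branch", ref)
	}
}

func main() {
	out := "1111111111111111111111111111111111111111\trefs/heads/main\n" +
		"2222222222222222222222222222222222222222\trefs/tags/v1.0.0\n"
	fmt.Println(checkRef(out, "v1.0.0", false))
	fmt.Println(checkRef(out, "main", false))
}
```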
this PR can fix #2460