Argocd out of memory after upgrade #4298
Comments
What difference in resource usage do you see? |
@martinbeentjes memory mostly, it reaches 6000Mi |
@jessesuen I'll send an image on Sunday, I don't have it with me right now |
@jessesuen I also see that the controller uses 14GiB at some point. Is that considered normal behavior? |
Is the controller restarting because it's being OOM killed? If so, I would suggest trying to bump memory limits for v1.7 to see if v1.7 inherently needs more memory, or if there is really a leak in the controller. |
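One rough way to tell the two cases apart after raising the limit is to look at whether memory usage levels off or keeps climbing. The sketch below is only an illustration of that idea, not anything from Argo CD: the function names, the sample values and the threshold are all assumptions, and the samples are expected to come from whatever monitoring is already in place.

```python
from statistics import mean


def tail_slope(samples_mib, tail_fraction=0.5):
    """Least-squares slope (MiB per sample) over the last part of the series."""
    if len(samples_mib) < 2:
        raise ValueError("need at least two samples")
    n = max(2, int(len(samples_mib) * tail_fraction))
    tail = samples_mib[-n:]
    xs = list(range(len(tail)))
    x_bar, y_bar = mean(xs), mean(tail)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, tail))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def classify(samples_mib, threshold_mib_per_sample=1.0):
    """Hypothetical heuristic: 'still growing' hints at a leak, 'plateaued' at a higher baseline."""
    if tail_slope(samples_mib) > threshold_mib_per_sample:
        return "still growing"
    return "plateaued"


# Made-up samples (MiB), taken at a fixed interval after raising the limit.
plateau = [1000, 2000, 3000, 3500, 3600, 3600, 3610, 3600, 3605, 3600]
growing = [1000 + 100 * i for i in range(10)]
print(classify(plateau), "/", classify(growing))
```

A plateau below the new limit suggests the version simply needs more memory; steady growth until the next OOM kill points more toward a leak. The threshold depends on the sampling interval and would need tuning.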
@jessesuen well, I don't know if there was a change in looping in non-synced apps |
Me and @erezo9 are in the same team.
|
Reopening issue until the fix is released and tested. |
@alexmt |
@alexmt Thanks a lot for the quick fix, we installed it, it seems that the memory usage is slightly less, but there are still OOM restarts. It looks like the deletion of apps behaves better now. Regarding the overall performance and the sync waves issue, I can't say enough yet, have to test it further and will update |
@jessesuen just an update: we upgraded to 1.7.6 and see the same issue |
We have installed 1.7.6 and the umbrella app is still stuck on sync, even
after a manual terminate
On Wed, Sep 16, 2020, 20:34 Alexander Matyushentsev wrote:
Sorry for the trouble this upgrade caused, @reggie-k, @erezo9. The
fix delivered in 1.7.5 should solve the memory spike during controller
initialization. I will keep trying to find the reason for the OOM restarts
and keep testing the sync issues with app-of-apps. Will update as soon as I find something
|
Regarding the umbrella app sync mentioned above: the controller entered a
CrashLoopBackOff state for an unknown reason; after the pod was deleted and
started up successfully, the app managed to sync.
|
Hello @reggie-k, there are probably two separate issues. Can you please share more details? Does your umbrella app manage itself? There is a known chicken-and-egg problem: during syncing, Argo CD waits for the umbrella app to become healthy, which cannot happen until all resources are synced (#3781). If this is the case you should see Regarding the crash loop state: can you please attach logs? |
@alexmt On the node the controller runs on we see that: This container is the argocd controller. |
@reggie-k, the controller creates a resource manifest file and a kubeconfig to execute In this case the controller should back off and notify about the issue, but this is not implemented yet. |
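For readers unfamiliar with the "back off and notify" idea mentioned above, here is a minimal generic sketch of retrying with exponential backoff. It is not Argo CD code (which, per the comment, does not implement this yet): `run_with_backoff`, `operation` and `notify` are hypothetical names, and since the exact failure is not shown in the thread, catching `OSError` is an assumption made only for illustration.

```python
import time


def run_with_backoff(operation, notify, max_attempts=5,
                     base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Retry `operation`, doubling the wait after each failure and reporting it."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError as err:  # assumed failure type, e.g. a file could not be written
            notify(f"attempt {attempt}/{max_attempts} failed: {err}")
            if attempt == max_attempts:
                raise  # give up and surface the error instead of looping forever
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the wait between attempts


# Usage with stand-ins: the operation fails twice, then succeeds.
calls = {"n": 0}
messages, delays = [], []


def flaky():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise OSError("simulated failure")
    return "ok"


result = run_with_backoff(flaky, messages.append, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; the doubling delay is what keeps a failing operation from being retried in a tight loop.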
Hmmm, we suspected something of that kind and resolved the vast majority of out-of-sync apps after the upgrade to 1.6. But I will double check on Tuesday whether some remained that way. It's now holiday time here. Yes, all of our apps are auto sync. |
The potential fix got merged into master: #4434 I'm going to run it internally for a few days, just in case, and will release it in 1.7.7 |
Thanks a lot, Alex!
|
I am making all the out-of-sync apps sync. |
All the apps are now synced (some are degraded or progressing but as I understand this is not a problem with regard to the controller restarts and slow sync). |
Closing, but please file a new issue if you see further problems |
Hello, do you think it should have been fixed, or could it be the same issue? |
@eddycharly, we are running into the same issue, using the latest helm chart with argo v2.0.5. So far, our fix has been to terminate the pods and let new argo pods spawn (which should be automatic due to the RS). It is a hacky workaround, but it may help anyone who urgently needs to get it up and running. |
Describe the bug
After upgrading from 1.6.2 to 1.7.4 we experience a lot of OOMs on the application controller and it restarts a lot, even after increasing
To Reproduce
We have more than 200 applications and one controller
Expected behavior
Behave the same as version 1.6.2, where we didn't experience this issue
Version
Unfortunately I cannot share this since it is confidential