Healthchecks resynced every ~2mins, causes consul index to grow #3259
A quick update: I don't know what the root cause was, but the problem was resolved after we downgraded to 0.7.5 |
@shays10 thanks for the update. This kind of thing is usually related to some changing output, but the downgrade fixing it doesn't really fit in with that explanation. Do you have ACLs enabled? |
@slackpad Thanks for your reply! No, ACLs are currently disabled in our cluster |
Hi @shays10
Did you downgrade the clients, servers, or both? |
Hey again! Both.
|
I've just faced the same issue with the latest consul. client
server
|
No. https://github.com/hashicorp/consul/blob/master/agent/check.go#L400-L415 Have you got a timestamp in the body? |
Nope. |
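For context, the timestamp question matters because changing output is the usual trigger slackpad mentions above: if the response body changes on every probe, the agent eventually has to re-sync the check even though its status never changes. A minimal illustration, assuming a plain HTTP check endpoint (the paths and port here are made up, not anyone's actual check target):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Stable body: the output only changes when health actually changes.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "OK")
	})
	// Changing body: a fresh timestamp on every probe means new check
	// output on every run, which can force repeated syncs.
	http.HandleFunc("/health-with-timestamp", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "OK at %s\n", time.Now().Format(time.RFC3339))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```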
@ponimas Can you provide the check definition and a full check request/response for testing? |
As far as I understand, the request is a simple GET request
|
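For readers who want to reproduce this kind of setup, here is a hypothetical registration of a service with a simple HTTP GET check using the official Go client; the service name, port, and URL are placeholders, not ponimas's actual definition:

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder service and check; the agent will GET the URL every 10s.
	reg := &api.AgentServiceRegistration{
		Name: "web",
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://localhost:8080/health",
			Interval: "10s",
			Timeout:  "2s",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}
}
```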
@magiconair is there any news on this issue? |
consul 0.9.2 - same behaviour. Is it possible that the Date header may trigger the index to grow?
|
I'm currently a bit sick and also preparing for HashiConf next week. Should be able to catch my breath after that. |
@magiconair is that gonna be fixed in 1.0? |
@ponimas If I can find the root cause then probably. However, usually this is triggered by the |
@ponimas does that |
@slackpad yup it changes. But it's a header. |
@ponimas ah ok - I thought you were actually running that |
btw our indexes are still growing insanely fast even without health check output. However, consul watchers and consul-template do not trigger on them and in general they trigger only on changes - there should be some undocumented magic here, as I always thought the trigger event for such blocking queries was an X-Consul-Index change. |
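For reference, a minimal sketch of the blocking-query pattern being discussed, using the official Go client against /v1/health/state/any; the only point it illustrates is that the loop is gated on the index the server returns (surfaced to HTTP clients as the X-Consul-Index header):

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	var lastIndex uint64
	for {
		// Blocks until the index advances past lastIndex (or the wait times out).
		checks, meta, err := client.Health().State("any", &api.QueryOptions{
			WaitIndex: lastIndex,
		})
		if err != nil {
			log.Fatal(err)
		}
		if meta.LastIndex != lastIndex {
			log.Printf("index moved %d -> %d (%d checks)", lastIndex, meta.LastIndex, len(checks))
			lastIndex = meta.LastIndex
		}
	}
}
```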
Might be related as well: #3712 |
Since commit 9685bdc, service tags are added to the health checks. Otherwise, when adding a service, tags are not added to its check. In updateSyncState, we compare the checks of the local agent with the checks of the catalog. It appears that the service tags are different (missing in one case), and so the check is synchronized. That increases the ModifyIndex periodically when nothing changes. Fixed it by adding serviceTags to the check. Note that the issue appeared in version 0.8.2. Looks related to hashicorp#3259.
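A simplified, non-Consul sketch of the comparison failure that commit message describes; the struct and field names are illustrative only. When one side carries ServiceTags and the other does not, a field-by-field comparison never matches, so the agent keeps re-registering the check and its ModifyIndex keeps climbing even though nothing meaningful changed:

```go
package main

import (
	"fmt"
	"reflect"
)

// Trimmed-down, hypothetical check representation for illustration.
type healthCheck struct {
	CheckID     string
	Status      string
	Output      string
	ServiceTags []string
}

func main() {
	local := healthCheck{CheckID: "service:web", Status: "passing", ServiceTags: []string{"v1"}}
	// The other copy is missing ServiceTags, so the two never compare equal.
	catalog := healthCheck{CheckID: "service:web", Status: "passing"}

	if !reflect.DeepEqual(local, catalog) {
		// In the real agent this mismatch triggers a sync, bumping
		// the check's ModifyIndex on every anti-entropy pass.
		fmt.Println("checks differ -> re-sync")
	}
}
```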
Closing this as fixed by #3642 which looks to be the root cause of this. |
We've upgraded to consul 1.0.2 but we're seeing this issue with the nomad health check. Almost every 2 minutes it does a synced check of the
|
After a lot of debugging I found the cause (of my issue) and submitted a PR
Hey,
This is very much related to #1244
I'm using blocking HTTP queries and I encountered a strange behavior where the consul index is incremented even when healthchecks didn't change their status (or their output).
So I looked at the value of X-Consul-Index, searched for an offending healthcheck that had that exact same number (via /health/checks/any), and presumed I would find that the check state had changed.
But all I could find in the log is:
https://gist.github.com/shays10/d0bdb43e0ae3664af1686aa139f18965
All checks are synced again, and this happens every 90-120 seconds. I'm guessing that it happens when antiEntropy kicks in and it finds that the service is synced but the checks are not.
I managed to find a correlation between the timestamp of the sync (according to the log) and the time where the ModifyIndex was bumped in all the checks.
Any idea how to find the root cause? I have lots of checks on a lot of nodes, and this causes the index to go wild.
I tried to make sure, as much as I could, that there isn't a flaky health check that fails randomly or returns different output.
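For anyone hunting a similar problem, a sketch of the kind of polling that helps spot which check's ModifyIndex keeps moving; "web" is a placeholder service name and the struct below is only a minimal view of the /v1/health/checks/<service> response:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Minimal view of the fields we care about in the health-checks response.
type check struct {
	CheckID     string
	Status      string
	ModifyIndex uint64
}

func main() {
	last := map[string]uint64{}
	for {
		resp, err := http.Get("http://127.0.0.1:8500/v1/health/checks/web")
		if err != nil {
			log.Fatal(err)
		}
		var checks []check
		if err := json.NewDecoder(resp.Body).Decode(&checks); err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
		for _, c := range checks {
			if prev, ok := last[c.CheckID]; ok && prev != c.ModifyIndex {
				log.Printf("%s: ModifyIndex %d -> %d", c.CheckID, prev, c.ModifyIndex)
			}
			last[c.CheckID] = c.ModifyIndex
		}
		time.Sleep(30 * time.Second)
	}
}
```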
Client: 0.8.3
Server: 0.8.3
Client:
Server: