X-Consul-Knownleader: false in dev mode #10945

Closed
dzharikhin opened this issue Aug 27, 2021 · 3 comments

dzharikhin commented Aug 27, 2021

Overview of the Issue

In dev mode, a request to /agent/check/fail/:check_id returns the header X-Consul-Knownleader: false in the response.

In practice, a unit test in the Java consul-client started to fail on the latest consul Docker image.

It passes on the consul:1.9 tag, where the header seen in the test has the value true.
I found nothing suspicious in the release notes since v1.9,
so this may be a bug.

Reproduction Steps

Clone the repo, install Java, and run the test.
The line with the failing request: https://github.com/rickfast/consul-client/blob/master/src/itest/java/com/orbitz/consul/cache/ServiceHealthCacheITest.java#L43

The Consul options can be changed in https://github.com/rickfast/consul-client/blob/cb738bff54032ed1303219bcd1e7aeca89a56f49/src/itest/java/com/orbitz/consul/BaseIntegrationTest.java#L22-L25

Log Fragments

The server logs seem OK to me:

2021-08-27T15:24:38.621Z [DEBUG] agent.http: Request finished: method=PUT url=/v1/agent/check/fail/service:30a3d660-0303-44dd-98fc-64812f250a07 from=172.17.0.1:64562 latency=858.4µs
2021-08-27T15:24:38.623Z [TRACE] agent.grpc-api.subscription: new subscription: dc=dc1 key=ab870c53-b62d-4e9b-919a-53152bf2d6d1 namespace= request_index=0 stream_id=ed2e15c0-a434-b206-b8fc-78eea4781ed1 topic=ServiceHealth
2021-08-27T15:24:38.623Z [TRACE] agent.grpc-api.subscription: sending events: dc=dc1 key=ab870c53-b62d-4e9b-919a-53152bf2d6d1 namespace= request_index=0 stream_id=ed2e15c0-a434-b206-b8fc-78eea4781ed1 topic=ServiceHealth index=17 sent=0 batch_size=1
2021-08-27T15:24:38.624Z [TRACE] agent.grpc-api.subscription: snapshot complete: dc=dc1 key=ab870c53-b62d-4e9b-919a-53152bf2d6d1 namespace= request_index=0 stream_id=ed2e15c0-a434-b206-b8fc-78eea4781ed1 topic=ServiceHealth index=17 sent=1
2021-08-27T15:24:38.625Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ab870c53-b62d-4e9b-919a-53152bf2d6d1?index=15&wait=1s&passing=true from=172.17.0.1:64566 latency=3.1301ms
2021-08-27T15:24:58.588Z [WARN]  agent: Check missed TTL, is now critical: check=service:30a3d660-0303-44dd-98fc-64812f250a07
dzharikhin changed the title from "Consul cluster has no elected leader" to "X-Consul-Knownleader: false in dev mode" on Aug 27, 2021
dzharikhin added a commit to hhru/consul-java-client that referenced this issue on Aug 27, 2021
kisunji self-assigned this on Aug 30, 2021
kisunji added the type/bug (Feature does not function as expected) label on Aug 30, 2021
Contributor

kisunji commented Sep 14, 2021

Hi @dzharikhin, could you help us by answering a few questions:

  • What is the linked integration test asserting that is causing it to fail?
  • Does the test fail consistently?
  • We recently released 1.10 - does that fail as well?

My initial guess is that the test may be flaky: it occasionally takes some time to establish a leader, and internally we wrap leader checks in retries for our integration tests.
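
The retry idea mentioned above can be sketched as follows. This is a minimal sketch, not Consul's internal test helper: `LeaderWait` and `waitForLeader` are hypothetical names, and the `leaderLookup` supplier is assumed to wrap a call such as `GET /v1/status/leader` and to return an empty string while no leader is known.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Consul or consul-client.
class LeaderWait {

    /**
     * Calls leaderLookup up to maxAttempts times, sleeping sleepMillis
     * between attempts, and returns the first non-empty leader address.
     * Returns Optional.empty() if no leader was reported in time or the
     * waiting thread was interrupted.
     */
    static Optional<String> waitForLeader(Supplier<String> leaderLookup,
                                          int maxAttempts,
                                          long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String leader = leaderLookup.get();
            if (leader != null && !leader.isEmpty()) {
                return Optional.of(leader);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In a test setup this would run once before the first request that depends on a leader, for example `LeaderWait.waitForLeader(lookup, 10, 200)`, where `lookup` is whatever function the client uses to query the leader.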

kisunji added the waiting-reply (Waiting on response from Original Poster or another individual in the thread) label on Sep 14, 2021
Author

dzharikhin commented Sep 21, 2021

Hi,
The test fails not on an explicit assertion but because of internal cache logic:
https://github.com/rickfast/consul-client/blob/master/src/itest/java/com/orbitz/consul/cache/ServiceHealthCacheITest.java#L43 wipes out the test service entry, and that update should then be propagated to the cache, so the last assertion line should get null.
However, there is an internal check on cache update, https://github.com/rickfast/consul-client/blob/master/src/main/java/com/orbitz/consul/cache/ConsulCache.java#L121, which fails. As a result the cache is not updated and the assertion gets a non-null result.
ConsulResponse is initially constructed here: https://github.com/rickfast/consul-client/blob/master/src/main/java/com/orbitz/consul/util/Http.java#L120. It sets the knownLeader flag to false according to the header value in the response, which is what I reported as the issue.
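
The chain described above can be modeled in a few lines. This is a simplified sketch under assumptions, not the consul-client source: `LeaderGatedCache`, `parseKnownLeader`, and `onResponse` are hypothetical names, and response headers are represented as a plain `Map` with an exact-case key.

```java
import java.util.Map;

// Hypothetical, simplified model for illustration; not the consul-client source.
class LeaderGatedCache {
    private String cachedValue;

    LeaderGatedCache(String initialValue) {
        this.cachedValue = initialValue;
    }

    /** Reads the known-leader flag from the response headers. */
    static boolean parseKnownLeader(Map<String, String> headers) {
        // Boolean.parseBoolean returns false for "false" and also for a
        // missing (null) value, so both cases look the same to the caller.
        return Boolean.parseBoolean(headers.get("X-Consul-Knownleader"));
    }

    /**
     * Applies newValue only when the response reports a known leader.
     * Returns true if the cache was updated.
     */
    boolean onResponse(Map<String, String> headers, String newValue) {
        if (!parseKnownLeader(headers)) {
            return false; // update dropped; the cache keeps its old entry
        }
        cachedValue = newValue;
        return true;
    }

    String get() {
        return cachedValue;
    }
}
```

With a header value of `false`, `onResponse` returns `false` and `get()` still returns the old entry, which matches the non-null result seen in the test.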

Yes, the test fails consistently.

And yes, it fails on the 1.10 image as well.

I'm not sure that real leader election takes place at all in single-node development mode, but to check your guess I increased the pause at https://github.com/rickfast/consul-client/blob/master/src/itest/java/com/orbitz/consul/cache/ServiceHealthCacheITest.java#L44 to one second and still got the same result, so timing is probably not the cause.

github-actions bot removed the waiting-reply (Waiting on response from Original Poster or another individual in the thread) label on Sep 21, 2021
Contributor

kisunji commented Sep 21, 2021

Thank you for the explanation!

I found a related issue, #9776, which discusses the problem with the orbitz library.
Consul 1.10 introduced streaming for blocking queries, with this note:

While streaming is a significant optimization over long polling, it will not populate the X-Consul-LastContact or X-Consul-KnownLeader response headers, because the required data is not available to the client.

This is not a bug but rather a change in behaviour, which is unfortunate. I'm not sure whether the latest release of orbitz addresses this change.

I encourage you to continue the discussion in #9776 so our team can better track common issues.

kisunji closed this as completed on Sep 21, 2021