
[Streaming] Set KnownLeader by default when streaming is activated #9778

Conversation

pierresouchay
Contributor

This is not a real fix but will avoid breaking clients relying on this header

Fix #9776

I know the fix is not semantically correct (we should carry the known-leader state with streaming), but I don't see how to do that, as the data is not carried in any way with gRPC on updates. It would probably require a refactoring.

@hashicorp-ci
Contributor

🤔 Double check that this PR does not require a changelog entry in the .changelog directory. Reference

Contributor

@dnephin dnephin left a comment


Thank you for the PR!

I was not familiar with this header, so I went looking in our docs. I found it referenced at the end of Consistency Modes, but all it says is

can be used by clients to gauge the staleness of a result and take appropriate action

As you say, reporting true in the case of streaming seems like it is a lie. I guess that it's also been a lie for any results from the cache. If the response came from a cache, the client doesn't actually know if there is an active leader.

I spoke with @kyhavlov and we think it might be better to omit this header instead of continuing to lie. From what I can tell, considering X-Consul-KnownLeader=false to be an error is incorrect. Maybe we should report this as a bug in Orbitz?

@pierresouchay
Contributor Author

To me, interpreting KnownLeader=false as "the data I am returning is not accurate and can't be trusted" looks correct from an SDK implementor's POV. It sounds like a reasonable choice.

I thought about removing the header, which also sounds more correct to me, but it required more lines and is more disruptive: we don't know whether other implementors count on that header and treat its absence as meaning false.

Maybe @yfouquet (an Orbitz contributor) has an opinion about this?

@banks
Member

banks commented Mar 22, 2021

We'd still love to hear from users who'd be impacted by this change (especially Orbitz client contributors).

Currently though, the KnownLeader header doesn't really make a huge amount of sense for streaming. In order to make it behave in a way that is meaningful we'd need to add significant complexity to how streaming works (with regular updates about the server's state of whether it knows of a leader or not being sent as well as just actual result updates). And for now we're not sure there is much value in the feature - it's hard to see how a client will actually make use of that information since it can't force the agent to try a different server while using streaming.

As Daniel mentioned before, it's not really correct or advisable to assume KnownLeader = false is an error state. So the only thing a client can do is decide whether or not to retry, but in the context of streaming it can't even decide to use a different server to see if it gets a less stale result.

Even if we made some way for that to happen, experience has shown that while setting a limit on how stale results can be sounds desirable, it almost always causes more problems than it solves. In theory it could limit the staleness of a read when a single server is partitioned from the rest of the servers but the clients can still reach all of them: if the client retries with a way to steer directly to the leader, it can "recover" from the stale server.

The problem is that that condition is extremely rare; I've never heard of it happening on a real Consul cluster. What does happen very often is that servers become "stale" or report "no leader" because they are overloaded and unstable. In that case, having all the clients suddenly switch after some timeout to making more expensive requests just increases the load on all servers and the leader, but doesn't actually get a less stale result. In other words, it just makes the overload issues even worse.

So our inclination is to remove these headers from streaming (and probably cached) responses and note that in the upgrade guide.

We'd encourage any client code that uses it in logic to avoid assuming a missing header (or false value) means that there is no cluster leader - this is already incorrect behaviour as Daniel pointed out above.

@dnephin dnephin added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Mar 25, 2021
@dnephin
Contributor

dnephin commented May 5, 2021

Thank you again for making this change.

Unfortunately, the logic changed in this PR has since moved to agent/rpcclient/health, so this commit can't be merged anymore.

Given the discussion above, and that we are already reporting KnownLeader: false in a bunch of places, and that it does not mean "no leader", I'm going to close this PR. Let's continue the discussion in #9776.

@dnephin dnephin closed this May 5, 2021
@ghost ghost removed waiting-reply Waiting on response from Original Poster or another individual in the thread labels May 5, 2021
Linked issue: #9776 [Streaming] Results from nodes with streaming set header X-Consul-Knownleader: false