Add common blocking implementation details to docs #5358

Merged
merged 7 commits into from
Feb 21, 2019
Changes from 2 commits
41 changes: 41 additions & 0 deletions website/source/api/index.html.md
@@ -77,6 +77,47 @@ to the supplied maximum `wait` time to spread out the wake up time of any
concurrent requests. This adds up to `wait / 16` additional time to the maximum
duration.

### Implementation Details

While the mechanism is relatively simple to work with, there are a few subtleties
that a robust client must observe in order to not behave badly in edge cases.

* **Reset the index if it goes backwards**. While indexes in general are
  monotonically increasing, there are several real-world scenarios in
  which they can go backwards for a given query. Implementations must check
  to see if a returned index is lower than the previous value,
  and if it is, should reset the index to `0`, effectively restarting their blocking loop.
  Failure to do so may cause the client to miss future updates for an unbounded
  time, or to use an invalid index value that causes no blocking and increases
  load on the servers. Cases where this can occur include:
  * A Raft snapshot is restored on the servers with an older version of the data.
  * KV list operations where an item with the highest index is removed.
  * A Consul upgrade changes the way watches work to optimise them with more
    granular indexes.
* **Sanity check that the index is greater than zero**. After the initial request (or a
  reset as above) the `X-Consul-Index` returned _should_ always be greater than zero. It
  is a bug in Consul if it is not; however, this has happened a few times and can
  still be triggered on some older Consul versions. It's especially bad because it
  causes blocking clients that are not aware of it to enter a busy loop, using excessive
  client CPU and causing high load on servers. It is _always_ safe to use an
  index of `1` to wait for updates when the data being requested doesn't exist
  yet, so clients _should_ sanity check that their index is at least 1 after
  each blocking response is handled to be sure they actually block on the next
  request.
* **Rate limit**. The blocking query mechanism is reasonably efficient when updates
  are relatively rare (order of tens of seconds to minutes between updates). In cases
  where a result gets updated very fast, however (possibly during an outage or incident
  with a badly behaved client), blocking query loops degrade into busy loops that
  consume excessive client CPU and cause high server load. While it's possible to just add a sleep
  to every iteration of the loop, this is **not** recommended since it causes update
  delivery to be delayed in the happy case, and it can exacerbate the problem since
  it increases the chance that the index has changed on the next request. Clients
  _should_ instead rate limit the loop so that in the happy case they proceed without
  waiting, but when values start to churn quickly they degrade into polling at a
  reasonable rate (say every 15 seconds). Ideally this is done with an algorithm that
  allows a couple of quick successive deliveries before it starts to limit rate. A
  [token bucket](https://en.wikipedia.org/wiki/Token_bucket) with burst of 2 is a simple
  way to achieve this.
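
The two index checks described above can be combined into one small helper. The
following is a minimal sketch in Python, not code from Consul or any client
library: `next_wait_index`, `run_blocking_loop`, and the `fetch` callable are
hypothetical names, and `fetch` is assumed to perform one blocking request and
return the `X-Consul-Index` value together with the response body.

```python
from typing import Callable, Tuple


def next_wait_index(previous: int, returned: int) -> int:
    """Pick the index to send with the next blocking request (hypothetical helper)."""
    if returned < previous:
        # The index went backwards: reset to 0 to restart the blocking loop.
        return 0
    # Never block on an index below 1, otherwise the next request would
    # return immediately and the loop would spin.
    return max(returned, 1)


def run_blocking_loop(fetch: Callable[[int], Tuple[int, object]], rounds: int) -> int:
    """Drive `rounds` requests through `fetch` and return the final index.

    `fetch(index)` is an assumed stand-in for one blocking HTTP request; it
    returns the `X-Consul-Index` header value and the response body.
    """
    index = 0  # 0 means "do not block", used for the initial request
    for _ in range(rounds):
        returned, _body = fetch(index)
        index = next_wait_index(index, returned)
    return index
```

With this helper, a response carrying index 10 after a previous index of 50
yields `0` (a reset), and a response carrying index 0 on the initial request
yields `1`, so the next request still blocks.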

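The rate-limiting advice above can be sketched as a token bucket with a burst of
2 that refills one token every 15 seconds. This is an illustrative sketch only:
the `TokenBucket` class and its parameters are hypothetical, and the `clock` and
`sleep` arguments are injectable purely so the behaviour can be exercised
without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter for a blocking query loop."""

    def __init__(
        self,
        interval: float = 15.0,  # seconds needed to refill one token
        burst: int = 2,          # quick successive deliveries allowed
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self.burst = burst
        self.clock = clock
        self.sleep = sleep
        self.tokens = float(burst)  # start full so the happy case never waits
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at burst.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) / self.interval)
        self.last = now

    def wait(self) -> float:
        """Consume one token, sleeping first if none is available.

        Returns the number of seconds slept.
        """
        self._refill()
        delay = 0.0
        if self.tokens < 1.0:
            delay = (1.0 - self.tokens) * self.interval
            self.sleep(delay)
            self._refill()
        self.tokens -= 1.0
        return delay
```

Calling `wait()` once per loop iteration, before issuing the next blocking
request, lets the first two iterations proceed immediately and then spaces
further iterations roughly `interval` seconds apart while the value keeps
churning. Once updates become rare again the bucket refills and the loop goes
back to proceeding without delay.
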
### Hash-based Blocking Queries

A limited number of agent endpoints also support blocking however because the