[feature request] Recalculating flush jitter #1296
I would prefer to change the behavior of Telegraf to sleep for a random (newly assigned) amount of time before flushing each output. This is currently the behavior of collection_jitter.
My only consideration is about users in the field currently using flush_jitter for its original purpose: avoiding having multiple agents send data at exactly the same time, while flushing happens at fixed intervals. Adding the new behavior as a separate feature & config parameter would enable us to support both use cases.
@kostasb currently flush_jitter does jitter the interval by a random amount, but for each instance it is a fixed length that gets set at startup. This change would just re-draw the jitter each interval.
@sparrc That is my consideration: users who currently use jitter to introduce a fixed delay that is consistent for each flush will be getting a random delay for each flush once this behavior is changed. For example, if there are data points emitted to Telegraf without a timestamp, this will result in random timestamps inside the jitter window, whereas with the current fixed jitter the interval would be consistent for every write. This might affect some users, which is why I suggested keeping both behaviors.
Telegraf doesn't emit metrics without a timestamp, and I don't think users would be relying on a "fixed" flushing interval that changes any time the service is restarted or reloaded. It seems to me like it makes more sense to make the two options behave consistently.
@sparrc +1. Anybody who wants this will really want true random (random on each flush).
@sparrc As long as we add a note in the docs about the change in jitter behavior and the use cases affected by data points arriving at Telegraf without timestamps (e.g. the udp and tcp input plugins), that sounds good to me.
@kostasb yes, will do :). Telegraf adds timestamps when it ingests points, so any udp or tcp points get timestamps as soon as they hit the telegraf listener.
@sparrc Thanks for clarifying that; then the change shouldn't really affect any real-world cases.
The current implementation isn't really achieving a terribly random effect. Here is real-world data from a random sample of 1,000 servers running telegraf, plotting the interval from the log file on startup: [chart omitted]. Looked at another way: [chart omitted]. The hope is that a random draw per node per 10 seconds will be far better distributed, achieving the goal of an overall uniform (or close to uniform) distribution. Equally important, it will also eliminate long-term biases for some nodes. Thanks guys!
@sparrc Will it be possible for jitter to go down to ms precision, or will we stick with seconds?
Use a common function between collection_jitter and flush_jitter, which creates the same behavior between the two options. Going forward, both jitters will be random sleeps that get re-evaluated at runtime for every interval (previously only collection_jitter did this). Also fixes behavior so that both jitters will exit in the event of a process exit. Closes #1296
It will be nanoseconds.
Thanks, @sparrc. We will get this into our environment ASAP and let you know!
Scope:
Change the behavior of Telegraf's flush_jitter to make it sleep for a random (newly assigned) amount of time before flushing each output.
Original topic:
Add a flush_randomize_jitter configuration parameter for the agent that forces recalculation of the flush interval (FlushInterval.Duration) for the next buffer flush cycle.
This would enable a varying flush-interval jitter within the same Telegraf agent run. The current flush_jitter implementation calculates the flush interval only once, at daemon startup, using the flush_jitter parameter, and then keeps that calculated interval for all subsequent cycles.
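For reference, the settings involved live in the agent section of the Telegraf config. This is an illustrative fragment with assumed values, showing where flush_jitter sits relative to the intervals it randomizes:

```toml
[agent]
  ## How often inputs are gathered; collection_jitter is added on top.
  interval = "10s"
  collection_jitter = "2s"

  ## How often outputs are flushed. With the change from this issue, a new
  ## random sleep in [0, flush_jitter) is drawn before every flush, rather
  ## than a single fixed offset chosen once at startup.
  flush_interval = "10s"
  flush_jitter = "5s"
```

The proposed flush_randomize_jitter parameter was ultimately not added; instead, flush_jitter itself was changed to re-draw the jitter every interval, matching collection_jitter.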