-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added an option to skip failed outputs in agent startup #10111
Conversation
Thanks so much for the pull request! |
!signed-cla |
ff6dc62
to
c68533e
Compare
👍 This pull request doesn't change the Telegraf binary size 📦 Looks like new artifacts were built from this PR. Expand this list to get them here ! 🐯Artifact URLs |
hey could you please approve other workflows so this feature could be merged? :) |
@srebhan who is allowed to review my feature? anyone? or are there certain specifications for this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General idea is good, but I would add a command line option for that as well.
It seems to me that it might be better if outputs that are not able to connect at startup are retried after some time instead of just being completely ignored for the duration of telegraf run.
@@ -199,6 +199,9 @@ type AgentConfig struct { | |||
|
|||
Hostname string | |||
OmitHostname bool | |||
|
|||
// Allows running the agents even if some outputs fail | |||
SkipFailedOutputs bool `toml:"skip_failed_outputs"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update related agentConfig
with example configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As well as the etc/telegraf.conf
and etc/telegraf_windows.conf
This PR also doesn't solve #3723 since it is requested to have x times or retry. This PR doesn't add this behaviour. Comments in a related PR (#9051) mentioned that plugins should be implementing the |
@eran-gil2 I'd like to discuss this with the core-devs as this is a very mighty feature IMO. To me it seems too easy to loose data when using this feature without caution as you will get an "everything alright" impression but you easily have no working output. I think this might be some fundamental thing to retry to connect to the outputs until we reach a buffer overflow or anything. We even might want to persist the metrics locally in this case... |
@eran-gil2 after discussing this matter with the team, we came to the conclusion that a global option to ignore all errors is not the way we want to go. There have been similar proposals (#11313) and requests (#10694, #11289) for the input side. However, we think it's better to fix the individual plugins to avoid connecting to services in For now, I'm closing this PR as a correct implementation requires a rework of Telegraf internals which is out-of-scope for now. |
Required for all PRs:
resolves #3723, resolves #9778