Improve SNMP plugin error logging #1814

Closed
kostasb opened this issue Sep 26, 2016 · 8 comments
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin help wanted Request for community participation, code, contribution
Comments

@kostasb

kostasb commented Sep 26, 2016

It would help with troubleshooting if errors in the SNMP plugin mentioned the particular OIDs that failed.

E.g. appending the OID that failed to the error log would help identify problematic ones: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/snmp/snmp.go#L448

@phemmer
Contributor

phemmer commented Sep 26, 2016

Agreed. It's a trivial change. I'll try to get to it tonight, and see if there are any other errors that could benefit from more context.

@Ebrink

Ebrink commented Sep 27, 2016

It would also be helpful if the error log contained the specific SNMP instance and/or device that is failing. Every once in a while I get a "snmp collection took longer than collection interval" error, but it is hard to find out which instance or device produces it with 53 SNMP instances polling 400+ devices.

@kostasb
Author

kostasb commented Sep 27, 2016

@Ebrink The "took longer than collection interval" message is generated by the Telegraf agent itself, which is a separate component from the snmp plugin: https://github.com/influxdata/telegraf/blob/master/agent/agent.go#L174 . It is logged when the agent times out a plugin's execution because the collection interval has been reached.

I will check how more verbose output can be provided there.

@sparrc sparrc added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Sep 27, 2016
@sparrc sparrc added this to the 1.1.0 milestone Sep 27, 2016
@kostasb
Author

kostasb commented Oct 7, 2016

@phemmer For the OID, I think it would suffice to append the string to the message as "performing get on %s",oid. It would also be useful to specify the actual agent that encountered the polling error.

@sparrc sparrc modified the milestones: Future Milestone, 1.1.0 Oct 12, 2016
@sparrc sparrc added the help wanted Request for community participation, code, contribution label Oct 12, 2016
@kostasb
Author

kostasb commented Dec 13, 2016

@phemmer @sparrc We are still logging some messages, particularly full timeouts, without an agent or OID to identify what failed:

2016/12/13 12:31:15 E! ERROR: input [inputs.snmp] took longer to collect than collection interval (5m0s)

@phemmer
Contributor

phemmer commented Dec 13, 2016

I have a local branch where I've done most of the work on this, just need to finish it up (mostly just make sure I didn't miss anything). I'll see if I can get to it this weekend.

@phemmer
Contributor

phemmer commented Jan 2, 2017

Sorry for the huge delay on this. For as trivial a change as it was, I should have finished it up long ago :-(

Anyway, the PR is up (#2220). However, the PR does not address the most recent comment (the "took longer to collect than ..." message). This error comes from telegraf core, not the snmp plugin. Addressing it could be tricky, as inputs don't really have a standard "name" identifier. The only way I can think of to address it is to add the line number of the config where the input begins.

@kostasb
Author

kostasb commented Jan 3, 2017

Thanks @phemmer for the PR.

As for the "took longer to collect than ..." error messages, I believe @sparrc plans on adding a naming scheme for plugins: #1815

@sparrc sparrc modified the milestones: 1.2.0, Future Milestone Jan 9, 2017
4 participants