Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input.dns_query does not log return code on error #5276

Closed
MarkusTeufelberger opened this issue Jan 10, 2019 · 10 comments
Closed

input.dns_query does not log return code on error #5276

MarkusTeufelberger opened this issue Jan 10, 2019 · 10 comments
Labels
feature request Requests for new plugin and for new features to existing plugins
Milestone

Comments

@MarkusTeufelberger
Copy link

Relevant telegraf.conf:

[[inputs.dns_query]]
domains = ["google.com"]
servers = ["1.1.1.1"]
record_type = "A"

System info:

telegraf 1.9.2

Linux amd64

Steps to reproduce:

  1. Have a more unreliable server than 1.1.1.1 in your servers list
  2. This server doesn't successfully return the record

Expected behavior:

I see the response code of the failing DNS query in the logs (ideally, it would even get exported as metric).

Actual behavior:

E! [inputs.dns_query]: Error in plugin: Invalid answer name google.com after A query for google.com

Additional info:

List of response codes in numeric format:
https://github.com/miekg/dns/blob/db3d0ce13bdc1a6b8c7f6bdf75448e9a4306951b/types.go#L119-L139

List of response codes as strings:
https://github.com/miekg/dns/blob/513c1ff2211924517b6015ebdbb995e8a10c028c/msg.go#L169-L191

Logging in telegraf happens here:

if r.Rcode != dns.RcodeSuccess {
return dnsQueryTime, errors.New(fmt.Sprintf("Invalid answer name %s after %s query for %s\n", domain, d.RecordType, domain))
}

@MarkusTeufelberger
Copy link
Author

It would also be great to see which server in the list of servers caused that error in the logs.

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Jan 10, 2019
@danielnelson
Copy link
Contributor

I agree this would be better as a metric using the result tag and result_code field, is this something you would have time to work on?

@MarkusTeufelberger
Copy link
Author

I would be quickly able to do the change to the log message.
Transforming it into a metric (including tests etc.) would take quite a bit more work from my side (also since I'm not that fluent in go).

If you want me to do this, I anyways first have to discuss the CLA stuff with my employer which is quite a lot of overhead for a one line change... I'd be happy if someone else (who already has signed the CLA) could do the log message instead.

@danielnelson
Copy link
Contributor

No problem, I'll put it on the next release milestone for us to complete but these targets often slip.

@danielnelson danielnelson added this to the 1.10.0 milestone Jan 10, 2019
@glinton
Copy link
Contributor

glinton commented Feb 12, 2019

From what I see it's always been creating a metric with the result and result_code, but I've enhanced the logging to give more information:

2019-02-12T15:41:12Z E! [inputs.dns_query]: Error in plugin: Invalid answer (NXDOMAIN) from 127.0.0.1 after A query for definitelynotreal.com
dns_query,domain=definitelynotreal.com,record_type=A,result=error,server=127.0.0.1 result_code=2i 1549986072000000000
dns_query,domain=google.com,record_type=A,result=success,server=127.0.0.1 result_code=0i,query_time_ms=0.440704 1549986072000000000

@danielnelson
Copy link
Contributor

The logging is good but I think to close this issue we need to add the rcode to the metrics. Since we can't include the dns rcode in the result_code, as it would conflict with the existing codes, I propose adding a tag rcode with the rcode name, along with a new integer field rcode_value with the decimal value of the rcode.

@glinton
Copy link
Contributor

glinton commented Feb 12, 2019

Ya that makes sense, probably what was meant above lol. I'll get that in too.

@danielnelson danielnelson added feature request Requests for new plugin and for new features to existing plugins and removed bug unexpected problem or unintended behavior labels Feb 13, 2019
@MarkusTeufelberger
Copy link
Author

Yeah, the logged data is already good enough to react on a generic telegraf error, but since in some cases these return codes might even be expected, the nicer solution would be to expose them as metric (per DNS server) instead of a general "DNS is healthy/not healthy" metric.

@danielnelson
Copy link
Contributor

We added this as a new tag+field pair in #5417, can you take a look?

@MarkusTeufelberger
Copy link
Author

Looks great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requests for new plugin and for new features to existing plugins
Projects
None yet
Development

No branches or pull requests

3 participants