Add a reverse dns processor #7927
Changes from 3 commits
The first hunk updates the go-metrics histogram adapter so that a histogram is reported as a registry of derived statistics instead of a single integer:

```diff
@@ -99,7 +99,31 @@ func (w goMetricsFuncGaugeFloat) Visit(_ monitoring.Mode, vs monitoring.Visitor)
 func (w goMetricsHistogram) wrapped() interface{} { return w.h }
 func (w goMetricsHistogram) Get() int64 { return w.h.Sum() }
 func (w goMetricsHistogram) Visit(_ monitoring.Mode, vs monitoring.Visitor) {
-	vs.OnInt(w.Get())
+	vs.OnRegistryStart()
+	defer vs.OnRegistryFinished()
+
+	h := w.h.Snapshot()
+	ps := h.Percentiles([]float64{0.5, 0.75, 0.95, 0.99, 0.999})
+	vs.OnKey("count")
+	vs.OnInt(w.h.Count())
+	vs.OnKey("min")
+	vs.OnInt(w.h.Min())
+	vs.OnKey("max")
+	vs.OnInt(w.h.Max())
+	vs.OnKey("mean")
+	vs.OnFloat(w.h.Mean())
+	vs.OnKey("stddev")
+	vs.OnFloat(w.h.StdDev())
+	vs.OnKey("median")
+	vs.OnFloat(ps[0])
+	vs.OnKey("p75")
+	vs.OnFloat(ps[1])
+	vs.OnKey("p95")
+	vs.OnFloat(ps[2])
+	vs.OnKey("p99")
+	vs.OnFloat(ps[3])
+	vs.OnKey("p999")
+	vs.OnFloat(ps[4])
 }

 func (w goMetricsMeter) wrapped() interface{} { return w.m }
```

Review thread (on the percentile values):

- Why are we returning these constant values? I assume this was intended to be the derived values at those percentiles from the actual histogram data?
- The percentiles are derived from samples stored in the histogram.
- Ah, I misread that code. Thanks for the clarification.
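As a standalone illustration of that thread, assuming the wrapped w.h is a github.com/rcrowley/go-metrics histogram (which the goMetrics naming suggests), the percentile values are computed from the recorded samples rather than being constants:

```go
package main

import (
	"fmt"

	metrics "github.com/rcrowley/go-metrics"
)

func main() {
	// Histogram backed by a sample reservoir; Percentiles derives its
	// results from the samples recorded via Update.
	h := metrics.NewHistogram(metrics.NewUniformSample(1028))
	for i := int64(1); i <= 1000; i++ {
		h.Update(i)
	}

	snap := h.Snapshot()
	ps := snap.Percentiles([]float64{0.5, 0.75, 0.95, 0.99, 0.999})
	fmt.Printf("count=%d median=%.1f p75=%.1f p99=%.1f\n",
		snap.Count(), ps[0], ps[1], ps[3])
}
```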
The second file is new (`@@ -0,0 +1,210 @@`) and implements the reverse DNS lookup cache:

```go
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package dns

import (
	"sync"
	"time"

	"github.com/elastic/beats/libbeat/monitoring"
)

type ptrRecord struct {
	host    string
	expires time.Time
}

func (r ptrRecord) IsExpired(now time.Time) bool {
	return now.After(r.expires)
}

type ptrCache struct {
	sync.RWMutex
	data    map[string]ptrRecord
	maxSize int
}

func (c *ptrCache) set(now time.Time, key string, ptr *PTR) {
	c.Lock()
	defer c.Unlock()

	if len(c.data) >= c.maxSize {
		c.evict()
	}

	c.data[key] = ptrRecord{
		host:    ptr.Host,
		expires: now.Add(time.Duration(ptr.TTL) * time.Second),
	}
}

// evict removes a single random key from the cache.
func (c *ptrCache) evict() {
	var key string
	for k := range c.data {
		key = k
		break
	}
	delete(c.data, key)
}
```

Review thread (on evict):

- This could use a comment; maybe it's because I'm a Go noob, but I'd appreciate a comment on this function.
- Also, not something that belongs in an initial impl, but it might be nice to add a real LRU at some point.
- I agree that a better cache implementation would be nice. I tested a few different cache libs and none of them seemed all that great or efficient, so I opted to keep it simple (this was also the approach I found in the coredns cache). In addition to improving on the algorithm, optimizing for the Go garbage collector would be nice too: when you get above 100k objects in a map, the GC times start to grow. I looked at bigcache, which addresses this issue, and it worked very well, but it doesn't implement LRU. But let's see how this gets used before optimizing. I plan to mark this as beta in the docs for the first release.
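Not part of this PR, but to make the LRU suggestion concrete, here is a minimal sketch of a size-bounded cache with least-recently-used eviction built on the standard library's container/list. The names and the string value type are illustrative; a real replacement would still need the TTL handling and locking used above:

```go
package lrusketch

import "container/list"

// lruEntry is what each list element stores: the key (needed for eviction)
// plus the cached value.
type lruEntry struct {
	key   string
	value string
}

// lruCache evicts the least recently used entry once maxSize is reached,
// instead of a random one.
type lruCache struct {
	maxSize int
	order   *list.List               // front = most recently used
	items   map[string]*list.Element // key -> element in order
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{
		maxSize: maxSize,
		order:   list.New(),
		items:   make(map[string]*list.Element, maxSize),
	}
}

func (c *lruCache) get(key string) (string, bool) {
	if elem, ok := c.items[key]; ok {
		c.order.MoveToFront(elem) // mark as recently used
		return elem.Value.(*lruEntry).value, true
	}
	return "", false
}

func (c *lruCache) set(key, value string) {
	if elem, ok := c.items[key]; ok {
		elem.Value.(*lruEntry).value = value
		c.order.MoveToFront(elem)
		return
	}
	if len(c.items) >= c.maxSize {
		// Evict the least recently used entry (back of the list).
		if oldest := c.order.Back(); oldest != nil {
			c.order.Remove(oldest)
			delete(c.items, oldest.Value.(*lruEntry).key)
		}
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
}
```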
The cache file continues with the read path and a parallel cache for failed lookups:

```go
func (c *ptrCache) get(now time.Time, key string) *PTR {
	c.RLock()
	defer c.RUnlock()

	r, found := c.data[key]
	if found && !r.IsExpired(now) {
		return &PTR{r.host, uint32(r.expires.Sub(now) / time.Second)}
	}
	return nil
}

type failureRecord struct {
	error
	expires time.Time
}

func (r failureRecord) IsExpired(now time.Time) bool {
	return now.After(r.expires)
}

type failureCache struct {
	sync.RWMutex
	data       map[string]failureRecord
	maxSize    int
	failureTTL time.Duration
}

func (c *failureCache) set(now time.Time, key string, err error) {
	c.Lock()
	defer c.Unlock()
	if len(c.data) >= c.maxSize {
		c.evict()
	}

	c.data[key] = failureRecord{
		error:   err,
		expires: now.Add(c.failureTTL),
	}
}

// evict removes a single random key from the cache.
func (c *failureCache) evict() {
	var key string
	for k := range c.data {
		key = k
		break
	}
	delete(c.data, key)
}

func (c *failureCache) get(now time.Time, key string) error {
	c.RLock()
	defer c.RUnlock()

	r, found := c.data[key]
	if found && !r.IsExpired(now) {
		return r.error
	}
	return nil
}
```

Review thread (on the duplicated cache implementations):

- My read here is that the tradeoff is between (1) a separate cache implementation for each stored type, as done here, and (2) a single shared implementation that stores a generic value, where the first option is more memory efficient at the cost of duplicated code, while the second is a little less memory efficient but a more DRY approach (though there is some extra complexity in the methods since they have to determine the type they're working with). I'm neutral on the choice. It's a tradeoff, and the difference on both counts isn't very large, since the code is simple and finding the hash code of a short string is fast, but I thought I'd bring this up because I'm curious whether there's general guidance on this issue. In previous languages I've used professionally the answer is generics. Not having them forces this choice, so I'd like to hear from more experienced Go programmers how this is handled.
- I chose what I thought was the more memory efficient implementation (1) at the cost of some duplication. This was mainly because there could be many of these items in memory, and secondly because the code was simple. I don't have any general advice 🤷‍♂️ other than to weigh the pros and cons of the choice like you have done here.
- Makes sense, thanks for the feedback.
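On the generics point: this PR predates type parameters in Go, but on Go 1.18 or later the duplication could in principle be avoided with a single generic cache. A rough illustrative sketch, not the approach taken in the PR:

```go
package cachesketch

import (
	"sync"
	"time"
)

// record pairs a cached value with its expiry time.
type record[V any] struct {
	value   V
	expires time.Time
}

// expiringCache is one implementation that could back both the success (PTR)
// and failure (error) caches by instantiating V accordingly.
type expiringCache[V any] struct {
	mu      sync.RWMutex
	data    map[string]record[V]
	maxSize int
}

func (c *expiringCache[V]) set(now time.Time, key string, value V, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if len(c.data) >= c.maxSize {
		// Evict a single random key, mirroring the PR's approach.
		for k := range c.data {
			delete(c.data, k)
			break
		}
	}
	c.data[key] = record[V]{value: value, expires: now.Add(ttl)}
}

func (c *expiringCache[V]) get(now time.Time, key string) (V, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()

	r, found := c.data[key]
	if found && now.Before(r.expires) {
		return r.value, true
	}
	var zero V
	return zero, false
}
```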
The remainder of the file wraps the two caches around a resolver and exposes hit/miss counters:

```go
type cachedError struct {
	err error
}

func (ce *cachedError) Error() string { return ce.err.Error() + " (from failure cache)" }
func (ce *cachedError) Cause() error  { return ce.err }

// PTRLookupCache is a cache for storing and retrieving the results of
// reverse DNS queries. It caches the results of queries regardless of their
// outcome (success or failure).
type PTRLookupCache struct {
	success    *ptrCache
	failure    *failureCache
	failureTTL time.Duration
	resolver   PTRResolver
	stats      cacheStats
}

type cacheStats struct {
	Hit  *monitoring.Int
	Miss *monitoring.Int
}

// NewPTRLookupCache returns a new cache.
func NewPTRLookupCache(reg *monitoring.Registry, conf CacheConfig, resolver PTRResolver) (*PTRLookupCache, error) {
	if err := conf.Validate(); err != nil {
		return nil, err
	}

	c := &PTRLookupCache{
		success: &ptrCache{
			data:    make(map[string]ptrRecord, conf.SuccessCache.InitialCapacity),
			maxSize: conf.SuccessCache.MaxCapacity,
		},
		failure: &failureCache{
			data:       make(map[string]failureRecord, conf.FailureCache.InitialCapacity),
			maxSize:    conf.FailureCache.MaxCapacity,
			failureTTL: conf.FailureCache.TTL,
		},
		resolver: resolver,
		stats: cacheStats{
			Hit:  monitoring.NewInt(reg, "hits"),
			Miss: monitoring.NewInt(reg, "misses"),
		},
	}

	return c, nil
}

// LookupPTR performs a reverse lookup on the given IP address. A cached result
// will be returned if it is contained in the cache, otherwise a lookup is
// performed.
func (c PTRLookupCache) LookupPTR(ip string) (*PTR, error) {
	now := time.Now()

	ptr := c.success.get(now, ip)
	if ptr != nil {
		c.stats.Hit.Inc()
		return ptr, nil
	}

	err := c.failure.get(now, ip)
	if err != nil {
		c.stats.Hit.Inc()
		return nil, err
	}
	c.stats.Miss.Inc()

	ptr, err = c.resolver.LookupPTR(ip)
	if err != nil {
		c.failure.set(now, ip, &cachedError{err})
		return nil, err
	}

	c.success.set(now, ip, ptr)
	return ptr, nil
}

func max(a, b int) int {
	if a >= b {
		return a
	}
	return b
}
```

Review thread (on the resolver call in LookupPTR):

- Might be nice to have a stat for actual DNS queries sent, to help users determine whether their cache size is appropriate. They could check if increasing the cache size diminishes the number of queries sent.
- Wouldn't the number of cache misses be the metric to look at when doing this testing? You'd see if the miss number decreased, right? Additionally, there are metrics reported by the resolver that could indicate the total number of lookups if you summed successes and failures across each DNS server.
- Yes, cache misses is an even better statistic. I think I missed that we report those stats already, but it does make sense that cache misses would be a good stat.
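PTRResolver is defined elsewhere in the PR; from its use above it only needs a LookupPTR(ip string) (*PTR, error) method, and PTR carries a host plus a TTL in seconds. As a rough illustration of what could sit behind that interface, here is a standalone standard-library sketch; the type names are stand-ins, and net.Resolver.LookupAddr does not expose the DNS record's TTL, so a fixed default is used in its place:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// ptr is an illustrative stand-in for the PR's PTR type: a host name plus a
// TTL in seconds.
type ptr struct {
	Host string
	TTL  uint32
}

// stdlibResolver performs reverse lookups with net.Resolver.
type stdlibResolver struct {
	resolver   *net.Resolver
	timeout    time.Duration
	defaultTTL uint32 // stand-in TTL, since net.Resolver does not return one
}

func (r stdlibResolver) LookupPTR(ip string) (*ptr, error) {
	ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
	defer cancel()

	names, err := r.resolver.LookupAddr(ctx, ip)
	if err != nil {
		return nil, err
	}
	if len(names) == 0 {
		return nil, fmt.Errorf("no PTR record found for %v", ip)
	}
	// LookupAddr returns fully qualified names with a trailing dot.
	return &ptr{Host: strings.TrimSuffix(names[0], "."), TTL: r.defaultTTL}, nil
}

func main() {
	r := stdlibResolver{resolver: net.DefaultResolver, timeout: 5 * time.Second, defaultTTL: 60}
	p, err := r.LookupPTR("8.8.8.8")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("host=%s ttl=%d\n", p.Host, p.TTL)
}
```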
Review thread (PR-level, on forward lookups):

- Thoughts on forward lookups? Wouldn't have to be in this PR, but those are really useful for log lines where a hostname is specified but an IP is not.
- I can see this being useful for certain use cases where the forward lookup (A or AAAA) has a deterministic response. I could see this being used to resolve hostnames from syslog events against your internal DNS. I don't plan to add this now, but the config takes this into account with the lookup `type` field. Someone might also want to look up TXT records, where the query is derived from a field value, for example a lookup like `%{[source][ip]}.origin.asn.cymru.com` (for IP-to-ASN lookup), but this is probably best suited to a centralized enrichment process.
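To make the forward and TXT lookup ideas from this thread concrete, here is a small standard-library sketch (illustrative only, not part of the processor):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Forward lookup (A/AAAA): hostname -> IP addresses.
	addrs, err := net.DefaultResolver.LookupHost(ctx, "www.elastic.co")
	if err != nil {
		fmt.Println("forward lookup failed:", err)
	} else {
		fmt.Println("addresses:", addrs)
	}

	// TXT lookup where the query name is derived from a field value, as in
	// the IP-to-ASN example mentioned in the thread.
	txts, err := net.DefaultResolver.LookupTXT(ctx, "8.8.8.8.origin.asn.cymru.com")
	if err != nil {
		fmt.Println("txt lookup failed:", err)
	} else {
		fmt.Println("txt records:", txts)
	}
}
```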