tracing: cap registry size at 5k #61459
This caps the size of the span registry at 5k, evicting "random" (map order) existing entries when at the limit.

The purpose of this is to serve as a guardrail against leaked spans, which would otherwise lead to unbounded memory growth.

Touches cockroachdb#59188.

Release justification: low risk, high benefit changes to existing functionality
Release note: None
With this in, we could conceivably do #59315 early (before the SQL team has fixed the leak). The bypass in that issue unfortunately hides pretty much all of the SQL gateway's traces from the registry, which is very unhelpful. This should be fine if the remark in the linked issue, that these leaks occur only at server shutdown, holds.
// is a leak. When the registry reaches max size, each new span added
// kicks out some old span. We rely on map iteration order here to
// make this cheap.
if toDelete := len(t.activeSpans.m) - maxSpanRegistrySize + 1; toDelete > 0 {
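To make the eviction rule concrete, here is a minimal, self-contained sketch of a size-capped registry that, when full, evicts whatever entry Go's map iteration happens to yield first. The names spanRegistry, addSpan, removeSpan, size and the uint64 span IDs are hypothetical stand-ins, not the actual tracer code; only the toDelete computation mirrors the diff excerpt. Go leaves map iteration order unspecified, so the victim is effectively arbitrary, but picking it costs a single iteration step and needs no ordering structure.

```go
package main

import (
	"fmt"
	"sync"
)

// maxSpanRegistrySize caps the number of tracked spans (5k, as in this PR).
const maxSpanRegistrySize = 5000

// span is a hypothetical stand-in for a tracing span.
type span struct{ op string }

// spanRegistry is a hypothetical, simplified active-span registry.
type spanRegistry struct {
	mu sync.Mutex
	m  map[uint64]*span
}

func newSpanRegistry() *spanRegistry {
	return &spanRegistry{m: make(map[uint64]*span)}
}

// addSpan registers s under id. At capacity, it first evicts enough
// existing entries to make room; which ones go is decided by map
// iteration order, so eviction is arbitrary but cheap.
func (r *spanRegistry) addSpan(id uint64, s *span) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if toDelete := len(r.m) - maxSpanRegistrySize + 1; toDelete > 0 {
		for k := range r.m {
			delete(r.m, k) // deleting during range is allowed in Go
			toDelete--
			if toDelete <= 0 {
				break
			}
		}
	}
	r.m[id] = s
}

// removeSpan unregisters a span when it finishes normally.
func (r *spanRegistry) removeSpan(id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.m, id)
}

// size reports how many spans are currently registered.
func (r *spanRegistry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.m)
}

func main() {
	r := newSpanRegistry()
	// Simulate leaked spans: 12k spans are added and none are removed.
	for i := uint64(0); i < 12000; i++ {
		r.addSpan(i, &span{op: "leaked"})
	}
	fmt.Println(r.size()) // 5000: the registry never exceeds its cap
}
```

Without the cap, the same loop would leave all 12k spans in the map; with it, memory stays bounded no matter how many spans leak.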
Do you reckon dropping just the one (oldest) span when we've capped out could cause contention when we've gone overboard? I assume we'll want to do more profiling again, but that seems possible to me.
What do you mean? Deleting the "first" element from the map (especially since it's bounded-size) should be approximately free. Hmm... The iteration maybe isn't. Let me check.
Seems cheap:
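package mapbench // illustrative package name

import (
	"math/rand"
	"testing"
)

// BenchmarkMapDeleteAdd1 measures picking the "first" element of a
// 5k-entry map via range iteration. Uncomment the delete and the
// insertion to also measure evicting that element and adding a new one.
func BenchmarkMapDeleteAdd1(b *testing.B) {
	const size = 5000
	m := make(map[int]int, size)
	for i := 0; i < size; i++ {
		m[rand.Intn(10000*size)] = 123
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for idx := range m {
			// delete(m, idx)
			_ = idx // keeps idx used while the delete is commented out
			break
		}
		// m[i] = 456
	}
}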
With the delete and the assignment uncommented, it runs at ~200ns/op; as quoted above, ~45ns/op. So the deletion and insertion dominate the cost of the one-element iteration.
Drive-by comment since I had to rebase on top of this code: the other option is to use a cache.UnorderedCache, which is an LRU cache. Iterations don't count as accesses, and neither do gets if you use StealthyGet, so only Add calls will, which means the cache will evict the spans that are most likely to be leaks.
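As a rough illustration of that suggestion, the sketch below builds a tiny capacity-bounded LRU on Go's container/list in which only add moves an entry to the front, while stealthyGet and iteration leave the order untouched. It is not the cache.UnorderedCache API; the type addOnlyLRU and its methods are hypothetical. Since a span is registered once and removed when it finishes, "least recently added" amounts to "oldest still-registered span", which is the one most likely to be a leak.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what the list stores: the key plus its value.
type entry struct {
	key uint64
	val string
}

// addOnlyLRU is a hypothetical LRU cache in which only add counts as an
// access. Front of the list = most recently added; back = eviction victim.
type addOnlyLRU struct {
	maxSize int
	order   *list.List
	items   map[uint64]*list.Element
}

func newAddOnlyLRU(maxSize int) *addOnlyLRU {
	return &addOnlyLRU{maxSize: maxSize, order: list.New(), items: make(map[uint64]*list.Element)}
}

// add inserts or refreshes key, evicting the least recently added entry
// if the cache would exceed maxSize.
func (c *addOnlyLRU) add(key uint64, val string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, val: val})
	if c.order.Len() > c.maxSize {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// stealthyGet looks up key without changing its recency.
func (c *addOnlyLRU) stealthyGet(key uint64) (string, bool) {
	if el, ok := c.items[key]; ok {
		return el.Value.(*entry).val, true
	}
	return "", false
}

// del removes key, e.g. when a span finishes normally.
func (c *addOnlyLRU) del(key uint64) {
	if el, ok := c.items[key]; ok {
		c.order.Remove(el)
		delete(c.items, key)
	}
}

// do visits entries from most to least recently added without
// affecting recency.
func (c *addOnlyLRU) do(f func(key uint64, val string)) {
	for el := c.order.Front(); el != nil; el = el.Next() {
		e := el.Value.(*entry)
		f(e.key, e.val)
	}
}

func main() {
	c := newAddOnlyLRU(2)
	c.add(1, "leaked span")
	c.add(2, "live span")
	c.stealthyGet(1) // a stealthy read does not protect key 1
	c.add(3, "new span")
	_, ok := c.stealthyGet(1)
	fmt.Println(ok) // false: key 1 was the least recently added
}
```

Compared with map-order eviction, this costs a linked-list node per entry and a pointer update per add, in exchange for choosing the victim deliberately rather than arbitrarily.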
bors r=irfansharif TFTR!
Build succeeded.