tracing: introduce a per-span limit for structured events #60678
Can we construct a way to also detect when the ring buffer has been full and cycled? We'll want to know when things have been lost.
Can we also use a ring buffer for the unstructured (log) events too?
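One way to make loss detectable is for the ring buffer to count how many entries it has overwritten. The following is a minimal sketch under stated assumptions, not the ring buffer used in the PR: the `eventRing` type, its methods, and the use of plain strings as events are all hypothetical, and the type is not safe for concurrent use without external locking.

```go
package main

// eventRing is a hypothetical fixed-capacity ring buffer that remembers how
// many events it has overwritten, so a reader can tell that data was lost.
type eventRing struct {
	buf     []string // backing storage; len(buf) is the capacity
	head    int      // index of the oldest retained event
	size    int      // number of retained events
	dropped int      // number of events overwritten so far
}

// newEventRing allocates a ring with the given (positive) capacity.
func newEventRing(capacity int) *eventRing {
	if capacity <= 0 {
		panic("capacity must be positive")
	}
	return &eventRing{buf: make([]string, capacity)}
}

// Add appends an event. When the ring is full, the oldest event is
// overwritten and the dropped counter is incremented.
func (r *eventRing) Add(ev string) {
	if r.size == len(r.buf) {
		r.buf[r.head] = ev
		r.head = (r.head + 1) % len(r.buf)
		r.dropped++
		return
	}
	r.buf[(r.head+r.size)%len(r.buf)] = ev
	r.size++
}

// Events returns the retained events, oldest first.
func (r *eventRing) Events() []string {
	out := make([]string, 0, r.size)
	for i := 0; i < r.size; i++ {
		out = append(out, r.buf[(r.head+i)%len(r.buf)])
	}
	return out
}

// Dropped reports how many events have been lost to overwriting.
func (r *eventRing) Dropped() int { return r.dropped }
```

A renderer could check `Dropped() > 0` and annotate the recording accordingly; the same structure would work for unstructured log events.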
Yes to both of those; we'll want to tightly bound how large a verbose span can get, and a ring buffer could be one way to do it. I'm hoping to take a stab at the code that renders trace events, but not yet. This could be captured then. For now, we don't have a great way to view structured events anyway (part of what Angela is working on), so I'll wait until we make headway there.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen and @tbg)
Touches #59188
I think we should merge this because it's 1000x better than master now, but ideally the limiting would be based on a count of bytes, not just a count of items. We are already asking Structured to implement interface{ Size() int }, so this is easy. All you'd have to change here to get this is to do
type crdbSpan struct {
	bytesStructured int64 // atomic
}

func (s *crdbSpan) RecordStructured(item Structured) {
	n := int64(item.Size())
	if atomic.AddInt64(&s.bytesStructured, n) > maxStructuredBytesPerSpan {
		// NB: this would also be a convenient place to mark the span as having
		// filled up once we want that.
		atomic.AddInt64(&s.bytesStructured, -n)
		return
	}
	// do the usual stuff
}
and given how easy it is, I would do it - the count-based mechanism will also translate poorly to verbose messages, which can get huge.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen and @irfansharif)
pkg/util/tracing/tracer.go, line 402 at r1 (raw file):
	},
}
helper.crdbSpan.mu.structured.Reserve(maxStructuredEventsPerSpan)

This will allocate a slice of 50 elements for every Span, which seems inefficient. Why don't we let the ringbuf grow as needed, up to maxStructuredEventsPerSpan?
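A grow-on-demand ring buffer along these lines might look as follows. This is only a sketch of the idea, not the ring buffer package used in the PR: the `lazyRing` type, the doubling growth policy, the initial capacity of 4, and the use of `int` events are assumptions, and `maxEvents` merely stands in for `maxStructuredEventsPerSpan` (the value 50 is taken from the comment above).

```go
package main

// maxEvents stands in for maxStructuredEventsPerSpan.
const maxEvents = 50

// lazyRing is a hypothetical ring buffer whose zero value allocates nothing.
// Storage grows geometrically until it reaches maxEvents; after that, the
// oldest event is overwritten.
type lazyRing struct {
	buf  []int // backing storage; len(buf) is the current capacity
	head int   // index of the oldest retained event
	size int   // number of retained events
}

// Add appends an event, growing the storage or evicting the oldest event.
func (r *lazyRing) Add(ev int) {
	if r.size == len(r.buf) {
		if len(r.buf) >= maxEvents {
			// At the cap: overwrite the oldest event.
			r.buf[r.head] = ev
			r.head = (r.head + 1) % len(r.buf)
			return
		}
		r.grow()
	}
	r.buf[(r.head+r.size)%len(r.buf)] = ev
	r.size++
}

// grow doubles the capacity (starting at 4, capped at maxEvents) and copies
// the retained events so that the oldest one lands at index 0.
func (r *lazyRing) grow() {
	newCap := 2 * len(r.buf)
	if newCap < 4 {
		newCap = 4
	}
	if newCap > maxEvents {
		newCap = maxEvents
	}
	newBuf := make([]int, newCap)
	for i := 0; i < r.size; i++ {
		newBuf[i] = r.buf[(r.head+i)%len(r.buf)]
	}
	r.buf, r.head = newBuf, 0
}

// At returns the i-th retained event, where 0 is the oldest.
func (r *lazyRing) At(i int) int { return r.buf[(r.head+i)%len(r.buf)] }
```

With this shape, a span that never records a structured event pays only for the three-word slice header and two integers, while a busy span converges on the same 50-element bound.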
Release note: None
b1b39cc to fd25615
I have another PR brewing doing both those things, probably could do with another round of review, so merging this one as is.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen and @irfansharif)
Build failed (retrying...)
Build failed
bors r+
Build succeeded
Touches cockroachdb#59188. Follow-on work from cockroachdb#60678. Release justification: low risk, high benefit changes to existing functionality Release note: None
61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif

Touches #59188. Follow-on work from #60678. We can introduce byte-limits for verbose logging and structured events, instead of limiting things based on count. This PR also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary

Release justification: low risk, high benefit changes to existing functionality
Release note: None

---

+cc @knz / @erikgrinaker / @angelapwen for pod-visibility.

61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi

Fixes: #59711

Previously, there were only over all counters tracking how many jobs were completed, cancelled, or failed. This was inadequate because it didn't make it easy to tell in aggregate what job types they were. To address this, this patch will add counters for different job types for tracking success, failure, and cancellation.

Release justification: Low risk change only adding a metric inside the crdb_internal.feature_usage table
Release note: None

61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde

This changes the random table generator to also create computed columns (either STORED or VIRTUAL). Some example of definitions:
- `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL`
- `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED`
- `col1_13 INT4 AS (col1_0 + col1_10) STORED`

Release justification: non-production code change.
Release note: None

Informs #57608.

61509: sql: add a regression test r=RaduBerinde a=RaduBerinde

This commit adds a regression test for #58104 (the problem was already fixed).

Resolves #58104.

Release justification: non-production code change.
Release note: None

61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner

Previously, the fetch scope incorrectly included columns in the FROM clause of an UPDATE..FROM statement. As a result, column names shared by the FROM clause and the mutating table lead to ambiguity when resolving partial index DEL column expressions. This commit ensures that the fetch scope does not include columns in the FROM clause.

Fixes #61284

Release justification: This is a low-risk bug fix to existing functionality.
Release note (bug fix): An UPDATE..FROM statement where the FROM clause contained column names that match table column names erred if the table had a partial index predicate referencing those columns. This bug, present since partial indexes were released in version 20.2.0, has been fixed.

61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna

These errors.Wrap calls are wrapping errors that are nil and thus will always return a nil error.

Release justification: Minor error handling fixes
Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Release note: None