tracing: memory budgeting #59188

tbg · 2021-01-20T08:51:17Z

Is your feature request related to a problem? Please describe.
Verbose trace spans collect possibly large amounts of data in memory, which can present a stability risk. With always-on tracing, the same holds for the structured payloads (though with a small factor, since a lot less data is collected).

Describe the solution you'd like
Heuristically count the number of bytes held in a Span's own recording (i.e. not traversing into the children). New data enters via these methods:

cockroach/pkg/util/tracing/span.go

Lines 324 to 366 in a01689b

    
           // LogStructured adds a Structured payload to the Span. It will be added to the 
        
           // recording even if the Span is not verbose; however it will be discarded if 
        
           // the underlying Span has been optimized out (i.e. is a noop span). 
        
           // 
        
           // The caller must not mutate the item once LogStructured has been called. 
        
           func (s *Span) LogStructured(item Structured) { 
        
           	if s.isNoop() { 
        
           		return 
        
           	} 
        
           	s.crdb.logStructured(item) 
        
           } 
        
           // Record provides a way to record free-form text into verbose spans. 
        
           // 
        
           // TODO(irfansharif): We don't currently have redactability with trace 
        
           // recordings (both here, and using LogStructured above). We'll want to do this 
        
           // soon. 
        
           func (s *Span) Record(msg string) { 
        
           	if !s.hasVerboseSink() { 
        
           		return 
        
           	} 
        
           	if s.ot.shadowSpan != nil { 
        
           		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, msg)) 
        
           	} 
        
           	if s.netTr != nil { 
        
           		s.netTr.LazyPrintf("%s", msg) 
        
           	} 
        
           	s.crdb.record(msg) 
        
           } 
        
           // Recordf is like Record, but accepts a format specifier. 
        
           func (s *Span) Recordf(format string, args ...interface{}) { 
        
           	if !s.hasVerboseSink() { 
        
           		return 
        
           	} 
        
           	str := fmt.Sprintf(format, args...) 
        
           	if s.ot.shadowSpan != nil { 
        
           		s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, str)) 
        
           	} 
        
           	if s.netTr != nil { 
        
           		s.netTr.LazyPrintf(format, args) 
        
           	} 
        
           	s.crdb.record(str)

As a set max size (per span) is exceeded or the root account runs out of quota, drop old messages every time a new message is added. Also set a flag on the span that indicates that the recording is incomplete, perhaps by retaining the most recent timestamp for which events were dropped. We should drop verbose messages first, then structured payloads.

Note that the opentracing and net/trace spans use a cardinality based limit:

cockroach/pkg/util/tracing/tracer.go

Line 343 in 513b01b

netTr.SetMaxEvents(maxLogsPerSpan)

cockroach/pkg/util/tracing/shadow.go

Lines 107 to 112 in e9e1f2b

    
           return lightStepManager{}, lightstep.NewTracer(lightstep.Options{ 
        
           	AccessToken:      token, 
        
           	MaxLogsPerSpan:   maxLogsPerSpan, 
        
           	MaxBufferedSpans: 10000, 
        
           	UseGRPC:          true, 
        
           })

I think for net/trace we may want to remove the limit and instead, upon first exceeding the memory budget for any given span, print a message to net/trace explaining that no more logs will be relayed there.

Describe alternatives you've considered

We could also try to attach a memory budget and use all of the infrastructure we already have in SQL, by attaching

cockroach/pkg/util/mon/bytes_usage.go

Lines 30 to 35 in 94a429b

    
           // BoundAccount and BytesMonitor together form the mechanism by which 
        
           // allocations are tracked and constrained (e.g. memory allocations by the 
        
           // server on behalf of db clients). The primary motivation is to avoid common 
        
           // cases of memory or disk blow-ups due to user error or unoptimized queries; a 
        
           // secondary motivation in the longer term is to offer more detailed metrics to 
        
           // users to track and explain memory/disk usage.

to the Tracer

cockroach/pkg/util/tracing/tracer.go

Lines 114 to 132 in 513b01b

    
           // Tracer is our own custom implementation of opentracing.Tracer. It supports: 
        
           // 
        
           //  - forwarding events to x/net/trace instances 
        
           // 
        
           //  - recording traces. Recording is started automatically for spans that have 
        
           //    the verboseTracingBaggageKey baggage and can be started explicitly as well. Recorded 
        
           //    events can be retrieved at any time. 
        
           // 
        
           //  - lightstep traces. This is implemented by maintaining a "shadow" lightstep 
        
           //    Span inside each of our spans. 
        
           // 
        
           // Even when tracing is disabled, we still use this Tracer (with x/net/trace and 
        
           // lightstep disabled) because of its recording capability (verbose tracing needs 
        
           // to work in all cases). 
        
           // 
        
           // Tracer is currently stateless so we could have a single instance; however, 
        
           // this won't be the case if the cluster settings move away from using global 
        
           // state. 
        
           type Tracer struct {

and use that to guide the decisions. This might actually be my favorite, assuming that the mon package is in good standing (I'm not familiar with it). Some notes:

this will likely require refactoring of the mon package, since the tracing package can not depend on util/log due to import cycles, which is fine.
we need to clear the recording in Span.Finish() and we need to prevent new recording data from entering a Finish'ed span (for otherwise it will allocate from the monitor, but never release back).
this is more brittle in the sense that a failure to return bytes to the monitor correctly will cause problems. However, we can add assertions that are active through large swathes of our test suite which should give us good confidence that this is not happening.
we need to determine where to take the bytes from. We could hang it off the sql memory monitor, or use a separate, fixed-size monitor.

Additional context

We will want some kind of basic safeguard in place for 21.1, even if it just extends the maxLogsPerSpan mechanism to also apply to payloads.

Jira issue: CRDB-3323

The text was updated successfully, but these errors were encountered:

tbg · 2021-03-02T11:59:50Z

@irfansharif I just saw that we merged #60678 without fixing the large allocation for the payload ring buffer. Is your follow-up coming out soon? We're in the stability period and performance work is happening (see #61328). I don't think that particular allocation is the root cause (if there even is a single root cause, sigh) but it's certainly in the hot path and shouldn't stay there for long. This is next on your list, right?

irfansharif · 2021-03-02T14:14:09Z

yup, just got a bit side-tracked with other stability issues on my plate.

jordanlewis · 2021-03-02T15:52:39Z

I'm a little scared of attaching this work to SQL monitors. In particular, lifespan bugs in monitors are often hard to track down and debug. I think it could mean that tracing lifespan bugs would have a larger blast radius than they would otherwise.

I think the monitor package is probably suitable, it's just we'd need to be quite careful about how we use it.

Touches cockroachdb#59188. Follow-on work from cockroachdb#60678. Release justification: low risk, high benefit changes to existing functionality Release note: None

Touches cockroachdb#59188. We can introduce byte-limits for verbose logging messages in a similar manner to what we've done for structured events. This commit also: - adds a _dropped tag to recordings with dropped logs/structured events. - squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events - moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary Release justification: low risk, high benefit changes to existing functionality Release note: None

tbg · 2021-03-04T12:06:02Z

As of #61359, each Span individually limits the bytes in payloads and the bytes in verbose messages (both use a FIFO ring buffer).

We should apply something similar to these two slices:

cockroach/pkg/util/tracing/crdbspan.go

Lines 68 to 71 in 6c3b89d

    
           children []*crdbSpan 
        
           // remoteSpan contains the list of remote child span recordings that 
        
           // were manually imported. 
        
           remoteSpans []tracingpb.RecordedSpan

As we can easily see unbounded growth in them, see for example #59560.

This caps the size of the span registry at 5k, evicting "random" (map order) existing entries when at the limit. The purpose of this is to serve as a guardrail against leaked spans, which would otherwise lead to unbounded memory growth. Touches cockroachdb#59188. Release justification: low risk, high benefit changes to existing functionality Release note: None

59288: rfc: declarative, state-based schema changer r=lucy-zhang a=lucy-zhang https://github.com/lucy-zhang/cockroach/blob/schema-changer-rfc/docs/RFCS/20210115_new_schema_changer.md Release note: none 61459: tracing: cap registry size at 5k r=irfansharif a=tbg This caps the size of the span registry at 5k, evicting "random" (map order) existing entries when at the limit. The purpose of this is to serve as a guardrail against leaked spans, which would otherwise lead to unbounded memory growth. Touches #59188. Release justification: low risk, high benefit changes to existing functionality Release note: None 61471: server: disable flaky check in TestDecommissionedNodeCannotConnect r=tbg a=erikgrinaker In #61356, `TestDecommissionedNodeCannotConnect` was extended to check error propagation for a `Scan` operation after a node was decommissioned. However, because of #61470 not all failure modes propagate errors properly, causing the test to sometimes hang. This patch disables this check until these problems are addressed. Resolves #61452. Release justification: non-production code changes Release note: None Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>

Touches cockroachdb#59188. Follow-on work from cockroachdb#60678. Release justification: low risk, high benefit changes to existing functionality Release note: None

Touches cockroachdb#59188. We can introduce byte-limits for verbose logging messages in a similar manner to what we've done for structured events. This commit also: - adds a _dropped tag to recordings with dropped logs/structured events. - squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events - moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary Release justification: low risk, high benefit changes to existing functionality Release note: None

@knz

61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif Touches #59188. Follow-on work from #60678. We can introduce byte-limits for verbose logging and structured events, instead of limiting things based on count. This PR also: - adds a _dropped tag to recordings with dropped logs/structured events. - squashes a bug where reset spans (as used in SessionTracing) still held onto earlier structured events - moves away from the internal usage of the opentracing.LogRecord type, it's unnecessary Release justification: low risk, high benefit changes to existing functionality Release note: None --- +cc @knz / @erikgrinaker / @angelapwen for pod-visibility. 61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi Fixes: #59711 Previously, there were only over all counters tracking how many jobs were completed, cancelled, or failed. This was inadequate because it didn't make it easy to tell in aggregate what job types they were. To address this, this patch will add counters for different job types for tracking success, failure, and cancellation. Release justification: Low risk change only adding a metric inside the crdb_internal.feature_usage table Release note: None 61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde This changes the random table generator to also create computed columns (either STORED or VIRTUAL). Some example of definitions: - `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL` - `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED` - `col1_13 INT4 AS (col1_0 + col1_10) STORED` Release justification: non-production code change. Release note: None Informs #57608. 61509: sql: add a regression test r=RaduBerinde a=RaduBerinde This commit adds a regression test for #58104 (the problem was already fixed). Resolves #58104. Release justification: non-production code change. Release note: None 61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner Previously, the fetch scope incorrectly included columns in the FROM clause of an UPDATE..FROM statement. As a result, column names shared by the FROM clause and the mutating table lead to ambiguity when resolving partial index DEL column expressions. This commit ensures that the fetch scope does not include columns in the FROM clause. Fixes #61284 Release justification: This is a low-risk bug fix to existing functionality. Release note (bug fix): An UPDATE..FROM statement where the FROM clause contained column names that match table column names erred if the table had a partial index predicate referencing those columns. This bug, present since partial indexes were released in version 20.2.0, has been fixed. 61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna These errors.Wrap calls are wrapping errors that are nil and thus will always return a nil error. Release justification: Minor error handling fixes Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com>

In cockroachdb#59560 we've seen that this can grow unboundedly. Set a limit out of an abundance of caution. Fixes cockroachdb#59188. Release note: None

irfansharif · 2021-07-12T17:37:27Z

@abarganier the last PR I had for memory budgeting was #61714, that I'm not finding the time to get to. Mind if I throw the ball over to you? I don't think this is very pressing, that PR was mostly out of an abundance of caution. But still, would be nice to have as a safeguard.

abarganier · 2021-07-12T18:12:07Z

@irfansharif SGTM, I think this would fit nicely into the tracing library handoff. Just to clarify, you closed the PR because you don't have the time, not because the solution is invalid, correct? I'd be happy to work on driving it home.

irfansharif · 2021-07-12T18:15:05Z

Yup, just don't have the bandwidth for the small bit of polish+shepherding needed. That PR is pretty tiny though, and I'm happy to rubber stamp another attempt at it.

abarganier · 2021-07-12T18:21:35Z

Cool sounds good, I'll get this scheduled on our end and tag you as a reviewer. I've updated the owning team accordingly. It might have to wait until a bit later in the month given other work on our plates, but I will keep you updated.

github-actions · 2023-09-05T11:09:44Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

tbg added A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jan 20, 2021

irfansharif self-assigned this Feb 24, 2021

irfansharif mentioned this issue Mar 2, 2021

tracing: use byte-limits for logs/structured events per span #61359

Merged

tbg mentioned this issue Mar 4, 2021

tracing: cap registry size at 5k #61459

Merged

irfansharif mentioned this issue Mar 9, 2021

tracing: limit number of imported remote spans #61714

Closed

adityamaru mentioned this issue May 11, 2021

jobs: Jobs Observability tracking issue #64992

Closed

jlinder added the T-kv KV Team label Jun 16, 2021

exalate-issue-sync bot assigned abarganier and unassigned irfansharif Jul 12, 2021

exalate-issue-sync bot added T-observability-inf and removed T-kv KV Team labels Jul 12, 2021

github-actions bot added the no-issue-activity label Sep 5, 2023

github-actions bot added the X-stale label Sep 19, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2023

exalate-issue-sync bot closed this as completed Sep 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tracing: memory budgeting #59188

tracing: memory budgeting #59188

tbg commented Jan 20, 2021 •

edited by cockroach-jira-scripts

Loading

tbg commented Mar 2, 2021

irfansharif commented Mar 2, 2021

jordanlewis commented Mar 2, 2021

tbg commented Mar 4, 2021

irfansharif commented Jul 12, 2021

abarganier commented Jul 12, 2021

irfansharif commented Jul 12, 2021

abarganier commented Jul 12, 2021

github-actions bot commented Sep 5, 2023

tracing: memory budgeting #59188

tracing: memory budgeting #59188

Comments

tbg commented Jan 20, 2021 • edited by cockroach-jira-scripts Loading

tbg commented Mar 2, 2021

irfansharif commented Mar 2, 2021

jordanlewis commented Mar 2, 2021

tbg commented Mar 4, 2021

irfansharif commented Jul 12, 2021

abarganier commented Jul 12, 2021

irfansharif commented Jul 12, 2021

abarganier commented Jul 12, 2021

github-actions bot commented Sep 5, 2023

tbg commented Jan 20, 2021 •

edited by cockroach-jira-scripts

Loading