tracing: possible regression in the amount of things dropped from the trace #87536
Comments
Before this patch, when the recording of a child span was being added to the parent, if the number of spans in the child recording + the number of spans in the parent's recording were greater than the span limit (1000), then the child's recording was completely dropped (apart from the structured events, which were still retained). So, for example, if the parent had a recording of 1 span, and the child has a recording of 1000 spans, the whole 1000 spans were dropped.

This patch improves things by always combining the parent trace and the child trace, and then trimming the result according to the following arbitrary algorithm:
- start at the root of the trace and sort its children by size, desc
- drop the fattest children (including their descendents) until the remaining number of spans to drop becomes smaller than the size of the fattest non-dropped child
- recurse into that child, with an adjusted number of spans to drop

So, the idea is that, recursively, we drop parts of the largest child - including dropping the whole child if needed.

Fixes cockroachdb#87536

Release note: None
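For readers who want to see the trimming steps above concretely, here is a minimal sketch in Go. It is not the code from the patch: the `node` type and the `size`, `trim` and `fan` helpers are hypothetical names made up for illustration, and the real tracing package represents recordings differently. It assumes a span can be modeled as a tree node and that the root itself is never dropped.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical stand-in for one span in a combined recording.
type node struct {
	name     string
	children []*node
}

// size returns the number of spans in the subtree rooted at n, including n.
// It is recomputed on every call, which is fine for a sketch but not efficient.
func (n *node) size() int {
	s := 1
	for _, c := range n.children {
		s += c.size()
	}
	return s
}

// trim drops up to toDrop spans from the descendants of n and returns how
// many were dropped. It never drops n itself.
func trim(n *node, toDrop int) int {
	if toDrop <= 0 || len(n.children) == 0 {
		return 0
	}
	// Sort the children by subtree size, largest first.
	sort.SliceStable(n.children, func(i, j int) bool {
		return n.children[i].size() > n.children[j].size()
	})
	// Drop whole children, fattest first, while the remaining budget is at
	// least as large as the fattest child that has not been dropped yet.
	dropped, i := 0, 0
	for i < len(n.children) && toDrop-dropped >= n.children[i].size() {
		dropped += n.children[i].size()
		i++
	}
	n.children = n.children[i:]
	// Recurse into the fattest surviving child with the adjusted budget.
	if rem := toDrop - dropped; rem > 0 && len(n.children) > 0 {
		dropped += trim(n.children[0], rem)
	}
	return dropped
}

// fan builds a subtree of exactly n spans: one span with n-1 leaf children.
func fan(name string, n int) *node {
	root := &node{name: name}
	for i := 1; i < n; i++ {
		root.children = append(root.children, &node{name: fmt.Sprintf("%s-%d", name, i)})
	}
	return root
}

func main() {
	const limit = 1000 // span limit quoted in the commit message
	// The example from the text: a 1-span parent plus a 1000-span child.
	root := &node{name: "parent", children: []*node{fan("child", 1000)}}
	dropped := trim(root, root.size()-limit)
	fmt.Println(dropped, root.size()) // prints "1 1000"
}
```

In this sketch the 1-span parent with a 1000-span child loses a single span instead of the whole child recording, which matches the improvement the commit message describes.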
We think what's going on here is a combination of the shape of the trace changing in 22.1 with the introduction of the Streamer, which acts as a parent for a large number of spans, and a change that made trace recordings imported through DistSQL be subject to the 1000 span limit.
I'm removing …
Another data point where we drop more stuff: consider the bundles attached in #88891. On 22.1.7 the distsql diagram has execution stats for all processors
Hmmm I think those execution stats come from structured events, which are not supposed to be dropped by the span limits. This is a query we can reproduce, since it operates on …
Ran into a possibly related problem (probably not a regression though). When running …
I have statement bundles for the same statement that ran on 22.1.6 and 22.2.0-alpha.2. The statement reads a lot of data, so it is expected that some things are dropped due to constant limits defined in tracing/tracer.go. However, the bundle from 22.2.0 has too much stuff that is dropped. It appears as if we started dropping more stuff on 22.2 (I did check that things are propagated correctly when nothing is dropped on a smaller dataset). This can significantly complicate debugging some of the queries, so I'm tagging this as a GA-blocker until this issue is better understood.
22.1.6.zip
22.2.0.zip
Jira issue: CRDB-19403
Epic CRDB-20796