
[dnm] tracing: investigate perf regression under sql.txn_stats.sample_rate=1 #61328

Closed
wants to merge 17 commits

Commits on Mar 4, 2021

  1. kv: permit merge transactions to refresh after subsumption

    Fixes cockroachdb#59308.
    
    This commit adds support for range merge transactions to refresh. This
    is necessary to allow the merge transaction to have its write timestamp
    be bumped and still commit without retrying. In such cases, the
    transaction must refresh its reads, including its original read on the
    RHS's local range descriptor. If we were to block this refresh on the
    frozen RHS range, the merge would deadlock.
    
    On the surface, it seems unsafe to permit Refresh requests on an already
    subsumed RHS range, because the refresh's effect on the timestamp cache
    will never make it to the LHS leaseholder. This risks the future joint
    range serving a write that invalidates the Refresh. However, in this
    specific situation, we can be sure that such a serializability violation
    will not occur because the Range merge also writes to (deletes) this
    key. This means that if the Range merge transaction commits, its intent
    on the key will be resolved to the timestamp of the refresh and no
    future write will ever be able to violate the refresh. Conversely, if the
    Range merge transaction does not commit, then the merge will fail and
    the update to the RHS's timestamp cache will not be lost (not that this
    particularly matters in cases of aborted transactions).
    
    The same line of reasoning as the one above has motivated us to explore
    removing keys from a transaction's refresh spans when they are written
    to by the transaction, as the intents written by the transaction act as
    a form of pessimistic lock that obviates the need for the optimistic
    refresh. Such an improvement would eliminate the need for this special
    case, but until we generalize the mechanism to prune refresh spans based
    on intent spans, we're forced to live with this.
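    
    To make the pruning idea concrete, here is a minimal sketch with simplified types (hypothetical; not the KV client's actual refresh-span tracking) of dropping read keys from the refresh set when the transaction has also written an intent on them:
    
    ```go
    // Hypothetical sketch: prune refresh keys that the transaction has also
    // written, since the intent already acts as a pessimistic lock on them.
    package main
    
    import "fmt"
    
    func pruneRefreshKeys(refreshKeys, intentKeys []string) []string {
        intents := make(map[string]struct{}, len(intentKeys))
        for _, k := range intentKeys {
            intents[k] = struct{}{}
        }
        var kept []string
        for _, k := range refreshKeys {
            if _, written := intents[k]; !written {
                kept = append(kept, k)
            }
        }
        return kept
    }
    
    func main() {
        // The merge txn both reads and deletes the RHS's local range
        // descriptor key (illustrative key names), so that key would not
        // need to be refreshed.
        refresh := []string{"/Local/RangeDescriptor/rhs", "/Meta2/rhs-end"}
        intents := []string{"/Local/RangeDescriptor/rhs"}
        fmt.Println(pruneRefreshKeys(refresh, intents)) // [/Meta2/rhs-end]
    }
    ```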
    
    See cockroachdb#59308 (comment)
    for why the original fix, which attempted to manually refresh the range
    merge transaction before it entered its critical phase, was not sufficient.
    
    Release justification: needed for new functionality.
    nvanbenschoten authored and tbg committed Mar 4, 2021 (7d22d00)
  2. streamclient: add client for sinkless CREATE REPLICATION STREAM

    This commit adds support for a stream client that can connect to a
    cluster and start replicating changes that happen on that stream.
    
    Release note: None
    
    Release justification: low-risk, high-benefit change
    pbardea authored and tbg committed Mar 4, 2021 (42cbdee)
  3. sql: disallow creation of interleaved partitioned indexes

    Previously, attempting to create an interleaved partitioned index caused
    a panic. This commit disallows their creation to prevent the panic.
    
    Fixes cockroachdb#60699
    
    Release justification: This is a low-risk change that prevents panics
    when attempting to create interleaved partitioned indexes.
    
    Release note (bug fix): Creating interleaved partitioned indexes is now
    disallowed. Previously, the database would crash when trying to create
    one.
    mgartner authored and tbg committed Mar 4, 2021 (1118904)
  4. execinfra: clean up ProcessorBase

    This commit removes the redundant context argument from
    `trailingMetaCallback` in favor of using `ProcessorBase.Ctx` when
    needed. This was effectively already the case, since the callback was
    always called with `ProcessorBase.Ctx`. We also had the wrong sequence
    of actions when collecting trailing metadata in the inverted joiner,
    join reader, and zigzag joiner: we first called `InternalClose` and
    then generated the metadata. That is wrong because `InternalClose`
    replaces the context used throughout the operation of a processor with
    the "original" context (the one passed into `StartInternal`). This is
    now fixed.
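    
    As a schematic illustration (hypothetical types, not the execinfra code), the ordering matters because the metadata must be generated while the processor's context still carries its span:
    
    ```go
    // Hypothetical sketch of the ordering bug: InternalClose restores the
    // original context, so metadata generated afterwards no longer sees the
    // processor's span.
    package main
    
    import (
        "context"
        "fmt"
    )
    
    type ctxKey struct{}
    
    type processor struct {
        origCtx context.Context // the context passed into StartInternal
        Ctx     context.Context // the context used while the processor runs
    }
    
    func (p *processor) StartInternal(ctx context.Context) {
        p.origCtx = ctx
        // Stand-in for deriving a context that carries a tracing span.
        p.Ctx = context.WithValue(ctx, ctxKey{}, "processor span")
    }
    
    func (p *processor) InternalClose() { p.Ctx = p.origCtx }
    
    func (p *processor) generateMeta() string {
        if v := p.Ctx.Value(ctxKey{}); v != nil {
            return fmt.Sprintf("metadata recorded in %q", v)
        }
        return "metadata recorded without a span"
    }
    
    func main() {
        p := &processor{}
        p.StartInternal(context.Background())
        p.InternalClose()             // wrong: close first ...
        fmt.Println(p.generateMeta()) // ... metadata loses the span
    
        p = &processor{}
        p.StartInternal(context.Background())
        fmt.Println(p.generateMeta()) // right: metadata first ...
        p.InternalClose()             // ... then close
    }
    ```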
    
    Additionally, a few unnecessary method overrides are removed, and
    `ProcessorBase` now provides the default implementation of
    `ConsumerClosed` (which simply calls `InternalClose`).
    
    This commit also makes a slight adjustment to context management of the
    change aggregator.
    
    Release justification: low-risk cleanup.
    
    Release note: None
    yuzefovich authored and tbg committed Mar 4, 2021 (ba35f1b)
  5. server: remove unnecessary % escape

    As of commit 1982047, this log line is formatted with Go's
    templating system, which does not require escaping `%`.
    
    Before:
    
    ```
    I210303 09:56:17.459802 311 2@server/status/runtime.go:553 ⋮ [n1] 76
    runtime stats: 106 MiB RSS, 261 goroutines (stacks: 5.0 MiB), 29
    MiB/63 MiB Go alloc/total (heap fragmentation: 6.8 MiB, heap reserved:
    11 MiB, heap released: 12 MiB), 7.6 MiB/13 MiB CGO alloc/total (0.6
    CGO/sec), 1.8/1.2 %%(u/s)time, 0.0 %%gc (0x ), 17 KiB/22 KiB (r/w)net
    ```
    
    After:
    
    ```
    I210303 09:59:17.607021 59 2@server/status/runtime.go:553 ⋮ [n1] 71
    runtime stats: 92 MiB RSS, 255 goroutines (stacks: 3.7 MiB), 27 MiB/48
    MiB Go alloc/total (heap fragmentation: 4.8 MiB, heap reserved: 1.3
    MiB, heap released: 27 MiB), 10 MiB/15 MiB CGO alloc/total (0.0
    CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (0x), 30 KiB/34 KiB (r/w)net
    ```
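    
    As a stand-alone illustration of why the escape is no longer needed (not the actual logging code): Printf-style verbs require a literal percent to be written as `%%`, while template-based output treats `%` as an ordinary character.
    
    ```go
    package main
    
    import (
        "fmt"
        "os"
        "text/template"
    )
    
    func main() {
        // Printf-style formatting: "%" must be escaped as "%%".
        fmt.Printf("1.8/1.2 %%(u/s)time, 0.0 %%gc\n")
    
        // Template-based formatting: "%" needs no escaping.
        t := template.Must(template.New("rt").Parse(
            "{{.U}}/{{.S}} %(u/s)time, {{.GC}} %gc\n"))
        _ = t.Execute(os.Stdout, struct{ U, S, GC float64 }{1.8, 1.2, 0.0})
    }
    ```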
    
    Release justification: Negligible risk.
    Release note: None
    stevendanna authored and tbg committed Mar 4, 2021 (f85ff2a)
  6. closedts/sidetransport: fix goroutine leak

    Fix a goroutine leak caused by gRPC streams. It turns out that
    receiving an error from an RPC's stream is not enough to clean up the
    stream; an additional step is required, as described at
    https://pkg.go.dev/google.golang.org/grpc#ClientConn.NewStream
    
    This patch starts canceling the stream's context whenever the stream is
    discarded.
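    
    A minimal sketch of the pattern (hypothetical service/method names, not the sidetransport code): give the stream its own cancelable context and cancel it whenever the stream is discarded, since an error from the stream alone does not release its resources.
    
    ```go
    package main
    
    import (
        "context"
        "log"
    
        "google.golang.org/grpc"
    )
    
    func runStream(conn *grpc.ClientConn) error {
        // Derive a cancelable context for the stream and cancel it when we
        // walk away from the stream, per the ClientConn.NewStream docs.
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
    
        desc := &grpc.StreamDesc{StreamName: "PushUpdates", ClientStreams: true}
        // The method path below is hypothetical.
        stream, err := conn.NewStream(ctx, desc, "/example.SideTransport/PushUpdates")
        if err != nil {
            return err
        }
        if err := stream.SendMsg(&struct{}{}); err != nil {
            // Seeing this error is not enough to clean up the stream; the
            // deferred cancel() is what releases its resources.
            return err
        }
        return stream.CloseSend()
    }
    
    func main() {
        conn, err := grpc.Dial("localhost:26257", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        _ = runStream(conn)
    }
    ```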
    
    Release note: None
    Release justification: Bug fix for new functionality.
    andreimatei authored and tbg committed Mar 4, 2021 (050241d)
  7. colflow: clean up vectorized stats propagation

    Previously, in order to propagate execution statistics we were creating
    temporary tracing spans, setting the stats on them, and finishing the
    spans right away. This let us use (or, more precisely, abuse) the
    existing infrastructure. The root of the problem is that the vectorized
    engine does not start a per-operator span even when stats collection is
    enabled, so we had to work around that limitation.
    
    However, this is not how tracing spans are intended to be used, and it
    introduces a performance slowdown in the hot path, so this commit
    refactors the approach. We now ensure that a tracing span is always
    available at the "root" components (either the root materializer or an
    outbox), so that when a root component finishes the vectorized stats
    collectors for its subtree of operators, there is a span to record the
    stats into.
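    
    A toy illustration of that idea (hypothetical types, not the colflow code): the root component guarantees a span exists and records the stats of its operator subtree into it when draining, instead of each operator creating and immediately finishing a short-lived span just to carry its stats.
    
    ```go
    package main
    
    import "fmt"
    
    type span struct{ recorded []string }
    
    func (s *span) record(stat string) { s.recorded = append(s.recorded, stat) }
    
    type statsCollector struct{ name string }
    
    func (c statsCollector) stats() string { return c.name + ": rows=1" }
    
    type rootComponent struct {
        sp         *span // always non-nil: set up when the root starts
        collectors []statsCollector
    }
    
    // drain flushes every collector's stats into the root's span, e.g. on
    // ConsumerDone (materializer) or at the end of Run (outbox).
    func (r *rootComponent) drain() {
        for _, c := range r.collectors {
            r.sp.record(c.stats())
        }
    }
    
    func main() {
        root := &rootComponent{
            sp:         &span{},
            collectors: []statsCollector{{"colBatchScan"}, {"hashJoiner"}},
        }
        root.drain()
        fmt.Println(root.sp.recorded)
    }
    ```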
    
    This required the following minor adjustments:
    - in the materializer, we now delegate attachment of the stats to the
    tracing span to the drainHelper (which does so on `ConsumerDone`). Note
    that the drainHelper doesn't get the recording from the span and leaves
    that to the materializer (this is needed in order to avoid collecting
    duplicate trace data).
    - in the outbox, we now start a "remote child span" (if there is a span
    in the parent context) at the beginning of the `Run` method, and we
    attach the stats in `sendMetadata`.
    
    Release justification: low-risk update to existing functionality.
    
    Release note: None
    yuzefovich authored and tbg committed Mar 4, 2021 (ec7d1d6)
  8. [DNM] adjust KV benchmark to introduce sample rate

    One could run the relevant benchmark with
    `make bench PKG=./pkg/sql/tests BENCHES=BenchmarkKV//SQL`.
    
    Release note: None
    yuzefovich authored and tbg committed Mar 4, 2021 (d94e463)
  9. WIP remove NodeLevelStats

    yuzefovich authored and tbg committed Mar 4, 2021 (7d843f3)
  10. WIP pool ComponentStats objects

    yuzefovich authored and tbg committed Mar 4, 2021 (f273b30)
  11. e31d115
  12. sql: use stmt's span for exec stats propagation

    Release justification: low-risk update to new functionality.
    
    Release note: None
    yuzefovich authored and tbg committed Mar 4, 2021 (95970ec)
  13. metric: break dependency on 'log' package

    Release justification: low-risk change and prerequisite for metrics
    improvement.
    Release note: None
    tbg committed Mar 4, 2021 (bf5d4a4)
  14. tracing: add metrics

    Release justification: low-risk, high benefit change to existing
    functionality
    Release note (ops change): Added the `trace.spans_created.count` and
    `trace.registry.size` metrics that account for the number of created
    trace spans and tracked local root spans, respectively.
    tbg committed Mar 4, 2021 (8b7fd22)
  15. Run benchmark without GC

    ```
    $ ./cmp.sh
    name                                 old time/op  new time/op  delta
    KV/Scan/SQL/rows=1/sample_rate=X-16   267µs ± 9%   313µs ± 8%  +17.32%  (p=0.000 n=22+22)
    ```
    
    Release justification:
    Release note (<category, see below>): <what> <show> <why>
    tbg committed Mar 4, 2021 (4953761)
  16. tracing: add synthetic SpanStats benchmark

    ```
    $ go test -run - -bench SpanStats -count 1 ./pkg/util/tracing
    goos: linux
    goarch: amd64
    pkg: github.com/cockroachdb/cockroach/pkg/util/tracing
    BenchmarkSpanStats/autoCollection=false,stats=0-16         	 280952	     4326 ns/op
    BenchmarkSpanStats/autoCollection=false,stats=1-16         	 158040	     7901 ns/op
    
    BenchmarkSpanStats/autoCollection=true,stats=0-16          	 471117	     2631 ns/op
    BenchmarkSpanStats/autoCollection=true,stats=1-16          	 242348	     5919 ns/op
    PASS
    ok  	github.com/cockroachdb/cockroach/pkg/util/tracing	8.414s
    ```
    
    The case `autoCollection=false,stats=1` should roughly reflect the SQL
    usage in which stats are created, transported via a recording, ingested
    into a parent, and then retrieved and unmarshaled. The case `false,stats=0`
    should correspond to the non-sampled case. We are thus expecting roughly
    an overhead of 4.3µs/op when not sampled, 8µs/op when sampled, i.e.
    changing the sampling rate from 0 to 1 should add ~4µs/op.
    However, the macro-level benchmark shows much more:
    
    ```
    KV/Scan/SQL/rows=1/sample_rate=X-16   268µs ± 9%   314µs ± 7%  +17.03%  (p=0.000 n=21+21)
    ```
    
    That's an increase of roughly 46µs incurred by changing the sampling
    rate from zero to one, far more than this microbenchmark can explain.
    
    The autoCollection=true numbers avoid an extra unmarshal-remarshal cycle
    of the recording. This is a possible optimization for SQL to investigate
    in the case in which the flow lives wholly on the gateway.
    Unfortunately, it is finicky to get right and causes lots of special
    casing, as SQL currently relies on all trace data arriving at the
    DistSQL RowReceiver via TrailingMeta (i.e. marshalled recordings).
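    
    For orientation, a self-contained sketch of the sub-benchmark matrix shape used above; the inner workload (a gob encode/decode round trip) is a hypothetical stand-in for marshaling and unmarshaling the stats, not the real SpanStats benchmark.
    
    ```go
    package sketch
    
    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "testing"
    )
    
    type fakeStats struct{ RowsRead, BytesRead int64 }
    
    func BenchmarkSpanStatsSketch(b *testing.B) {
        for _, autoCollection := range []bool{false, true} {
            for _, stats := range []int{0, 1} {
                b.Run(fmt.Sprintf("autoCollection=%t,stats=%d", autoCollection, stats), func(b *testing.B) {
                    for i := 0; i < b.N; i++ {
                        if stats == 0 {
                            continue
                        }
                        // Marshal the stats once; without auto-collection,
                        // pay for an extra unmarshal-remarshal cycle of the
                        // recording.
                        rounds := 1
                        if !autoCollection {
                            rounds = 2
                        }
                        for r := 0; r < rounds; r++ {
                            var buf bytes.Buffer
                            _ = gob.NewEncoder(&buf).Encode(fakeStats{RowsRead: 1, BytesRead: 64})
                            var out fakeStats
                            _ = gob.NewDecoder(&buf).Decode(&out)
                        }
                    }
                })
            }
        }
    }
    ```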
    
    Release justification:
    Release note (<category, see below>): <what> <show> <why>
    tbg committed Mar 4, 2021 (68eceb0)
  17. add spans/op measurement to BenchmarkKV

    This shows that in this benchmark, we're creating one additional span: 7
    without sampling, 8 with sampling.
    
    This one additional trace span is unlikely to cost ~40µs, but avoiding
    it would establish more parity between the tests. Can we?
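    
    A minimal sketch, using the standard `testing` API, of how a spans/op figure can be attached to a benchmark; `countSpans` and `runWorkload` are hypothetical stand-ins, not the actual BenchmarkKV code.
    
    ```go
    package bench
    
    import "testing"
    
    // Hypothetical stand-ins for observing span creation (e.g. via the
    // trace.spans_created.count metric added earlier) and for the workload.
    func countSpans() int64 { return 0 }
    func runWorkload()      {}
    
    func BenchmarkKVWithSpans(b *testing.B) {
        before := countSpans()
        for i := 0; i < b.N; i++ {
            runWorkload()
        }
        created := countSpans() - before
        // Reported alongside ns/op in the benchmark output.
        b.ReportMetric(float64(created)/float64(b.N), "spans/op")
    }
    ```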
    tbg committed Mar 4, 2021 (a0e2cf8)