*: split up large packages to speed up builds #79357
Here's a representative profile: local.gz.
The periods of low build parallelism (pink graph) correspond to
The 5 packages you named match up with my experience as well; I routinely see them as bottlenecks.
77337: spanconfig: limit # of tenant span configs r=irfansharif a=irfansharif

Fixes #70555. In order to limit the number of span configs a tenant is able to install, we introduce a tenant-side spanconfig.Limiter. It presents the following interface:

```go
// Limiter is used to limit the number of span configs installed by
// secondary tenants. It considers the committed and uncommitted
// state of a table descriptor and computes the "span" delta, each
// unit we can apply a configuration over. It uses these deltas to
// maintain an aggregate counter, informing the caller if exceeding
// the configured limit.
type Limiter interface {
	ShouldLimit(
		ctx context.Context, txn *kv.Txn,
		committed, uncommitted catalog.TableDescriptor,
	) (bool, error)
}
```

This limiter only applies to secondary tenants. The counter is maintained in a newly introduced (tenant-only) system table, using the following schema:

```sql
CREATE TABLE system.span_count (
  singleton  BOOL DEFAULT TRUE,
  span_count INT NOT NULL,
  CONSTRAINT "primary" PRIMARY KEY (singleton),
  CONSTRAINT single_row CHECK (singleton),
  FAMILY "primary" (singleton, span_count)
);
```

We need just two integration points for spanconfig.Limiter:
- Right above CheckTwoVersionInvariant, where we're able to hook into the committed and to-be-committed descriptor state before txn commit.
- In the GC job, when GC-ing table state. We decrement a table's split count when GC-ing the table for good.

The per-tenant span config limit is controlled by a new tenant read-only cluster setting: spanconfig.tenant_limit. Multi-tenant cluster settings (#73857) provide the infrastructure for the host tenant to control this setting cluster-wide, or to target a specific tenant at a time.

We also need a migration here, to start tracking span counts for clusters with pre-existing tenants. We introduce a migration that scans over all table descriptors and seeds system.span_count with the right value. Given that cluster version gates disseminate asynchronously, we also need a preliminary version to start tracking incremental changes.

It's useful to introduce the notion of debt. This will be handy if/when we lower per-tenant limits, and also in the migration above, since it's possible for pre-existing tenants to have committed state in violation of the prescribed limit. When in debt, schema changes that add new splits will be rejected (dropping tables/indexes/partitions/etc. will work just fine). When attempting a txn that goes over the configured limit, the UX is as follows:

```
> CREATE TABLE db.t2(i INT PRIMARY KEY);
pq: exceeded limit for number of table spans
```

Release note: None
Release justification: low risk, high benefit change

79462: colexecproj: break it down into two packages r=yuzefovich a=yuzefovich

**colexecproj: split up default cmp proj op file into two**

This commit splits up a single file containing two default comparison projection operators into two files. This is done in preparation for the following commit (which will move one of the operators to a different package). Release note: None

**colexecproj: extract a new package for projection ops with const**

This commit extracts a new `colexecprojconst` package out of `colexecproj` that contains all projection operators with one constant argument. This will allow for faster build speeds since both packages contain tens of thousands of lines of code. Special care had to be taken for the default comparison operator because we need to generate two files in different packages based on a single template. I followed the precedent of `sort_partitioner.eg.go`, which had to do the same. Addresses: #79357. Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
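The counter-with-debt semantics described in the 77337 PR above can be illustrated with a toy, in-memory sketch. This is not CockroachDB's actual implementation (the real spanconfig.Limiter consults the system.span_count row inside the transaction); the type and method names here are hypothetical:

```go
package main

import "fmt"

// spanCountLimiter is an illustrative, in-memory stand-in for the
// tenant-side limiter described above.
type spanCountLimiter struct {
	count int64 // number of spans currently accounted to the tenant
	limit int64 // configured per-tenant limit
}

// shouldLimit applies a span delta (positive for schema changes that
// add splits, negative for drops/GC) and reports whether the change
// must be rejected. Mirroring the "debt" semantics: even when count
// already exceeds the limit, non-positive deltas always go through,
// so drops succeed while in debt.
func (l *spanCountLimiter) shouldLimit(delta int64) bool {
	if delta <= 0 {
		l.count += delta // drops/GC are always allowed
		return false
	}
	if l.count+delta > l.limit {
		return true // reject; the counter is left unchanged
	}
	l.count += delta
	return false
}

func main() {
	l := &spanCountLimiter{limit: 5}
	fmt.Println(l.shouldLimit(3))  // false: 3 <= 5
	fmt.Println(l.shouldLimit(4))  // true: 3+4 would exceed the limit
	fmt.Println(l.shouldLimit(-2)) // false: drops always allowed
	fmt.Println(l.count)           // 1
}
```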
79485: sql: clean up some dependencies r=yuzefovich a=yuzefovich

**row: move Metrics struct into rowinfra**

This allows us to break the dependency of `execinfra` on `row`, which speeds up the build a bit. Release note: None

**colexecbase: move a single struct to clean up dependencies**

This commit moves `colexecbase.BinaryOverloadHelper` into the `colexecutils` package in order to break the dependencies of the `colexecproj` and `colexecprojconst` packages on `colexecbase`. This will speed up the build a bit. Addresses: #79357. Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Here is an updated analysis (with #79938):
Packages of the vectorized engine are no longer on the critical path (at least in this run), woohoo!
79938: execinfra: extract out some packages to break dependencies r=yuzefovich a=yuzefovich

**execinfra: shrink the package down a bit**

This commit refactors the `execinfra` package by extracting some things in order to break the dependency of `colexecop` on it. Two main pieces had to be refactored:
- The `stats.go` file is now moved into `execstats`, since that seems to be a much more suitable place. This required inlining the `GetTraceData` helper function and breaking the dependency of the `ShouldCollectStats` helper on `execinfra.FlowCtx`.
- `operator.go`, which defines the `OpNode` interface (implemented by both columnar operators and row-by-row processors), has been moved (and renamed to `op_node.go`) to a new package, `execopnode`.

Release note: None

**execinfra: extract Releasable interface into a separate package**

This commit extracts the `execinfra.Releasable` interface into a new `execreleasable` package in order to break the dependency of several packages of the vectorized engine on `execinfra`. This will allow those packages (e.g. `colexecproj`) to start being built sooner, which will shorten the build time. Addresses: #79357. Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
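The interface-extraction pattern used throughout these PRs (e.g. moving `OpNode` into the small leaf package `execopnode`) can be condensed into a single-file sketch. The package names in the comments correspond to the real ones, but the types and logic are illustrative only; once the shared interface lives in a tiny leaf package, the two big packages no longer depend on each other and Bazel can compile them concurrently:

```go
package main

import "fmt"

// --- package "execopnode" (tiny leaf; anyone may depend on it) ---

// OpNode is implemented by both columnar operators and row-by-row
// processors, per the execinfra refactor described above.
type OpNode interface {
	ChildCount() int
	Child(i int) OpNode
}

// --- package "colexecop" (no longer imports "execinfra") ---

type columnarOp struct{ children []OpNode }

func (o *columnarOp) ChildCount() int    { return len(o.children) }
func (o *columnarOp) Child(i int) OpNode { return o.children[i] }

// --- package "execinfra" (no longer imports "colexecop") ---

type rowProcessor struct{ input OpNode }

func (p *rowProcessor) ChildCount() int    { return 1 }
func (p *rowProcessor) Child(i int) OpNode { return p.input }

// countNodes walks a mixed tree through the shared interface only,
// without either "package" knowing the other's concrete types.
func countNodes(n OpNode) int {
	total := 1
	for i := 0; i < n.ChildCount(); i++ {
		total += countNodes(n.Child(i))
	}
	return total
}

func main() {
	leaf := &columnarOp{}
	tree := &rowProcessor{input: &columnarOp{children: []OpNode{leaf}}}
	fmt.Println(countNodes(tree)) // 3
}
```

The design point is that the leaf package compiles in milliseconds, so extracting it moves the expensive packages off each other's critical path.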
This commit breaks the `memo` package's dependence on the `builtins` package so that they can be compiled concurrently. Optimizer packages are no longer on the critical path, for now. Informs cockroachdb#79357 Release note: None
Was curious so peeked into the future, running with #80745 + #80687 + #80773 + #80684 off of latest master:
Here's the profile: incremental.gz
#80684 will do that
It doesn't, I don't think. I ran the above with the PR, which still has: `cockroach/pkg/sql/parser/sql.y`, line 33 (at 80007da)
I'm a little confused -- it doesn't look like
Ah, good catch Ricky! Looking at the generated sql.go, I don't think we need it anymore. The roachpb dependency was pinned in the BUILD file because it was used in the generated file; we can just remove it. #80684 (review).
80141: rowexec: use streamer for lookup joins with no ordering r=yuzefovich a=yuzefovich

**row: fix the usage of Get requests with the streamer in some cases**

Previously, `TxnKVStreamer` processed the Get responses under the assumption that a `nil` `enqueueKeys` argument was provided. In that scenario, `EnqueueKeysSatisfied` contained an ordinal into the original spans slice that is needed to process the Get response correctly. We will shortly expand the usage of the Streamer to use a non-nil `enqueueKeys` argument, which breaks that assumption. We get around it by exporting the `Result.Position` value, which tracks exactly the information we need. Release note: None

**rowexec: use streamer for lookup joins with no ordering**

This commit expands the usage of the Streamer to lookup joins when ordering doesn't have to be maintained. We can do so easily since the join reader span generators de-duplicate spans, so the Streamer will only be asked to process unique requests. In theory, we should be able to do the same for the cases when ordering needs to be maintained, but the `InOrder` mode of the Streamer currently has known bugs / limitations when a single request can result in multiple rows; this wasn't an issue for index joins, since those are guaranteed to get a single row for each request. Also, we want to push the de-duplication logic from the join reader into the Streamer (in order to inform the behavior of the yet-to-be-added cache in the Streamer), but that will come after addressing the known `InOrder` bugs mentioned above.

Some adjustments were needed:
- In order to improve the behavior of the join reader with a low `workmem` limit (in particular, the `regional_by_row` CCL logic test file was erroring out), we now make the batch size hint at most 1/12 of the limit while giving the streamer the same three quarters; this allows other in-memory state of the join reader itself to have some space.
- One lookup join unit test is designed with the non-streamer code path in mind, so we disable the usage of the streamer there.
- The `lookup_join_trace` execbuilder file needed to filter out messages produced by the range cache.

Out of curiosity, I decided to do a quick comparison of TPCH numbers with the Streamer `off` (first column) and `on` (second column); negative numbers indicate a latency reduction, positive a latency increase:

```
Q1:   3.78s  3.28s  -13.36%
Q2:   0.19s  0.19s    2.11%
Q3:   1.72s  1.48s  -13.89%
Q4:   1.57s  1.25s  -20.54%
Q5:   2.74s  2.49s   -9.06%
Q6:   6.36s  8.19s   28.74%
Q7:   6.97s  7.35s    5.42%
Q8:   1.00s  1.09s    8.68%
Q9:   6.52s  6.93s    6.32%
Q10:  1.93s  1.63s  -15.60%
Q11:  0.52s  0.57s    9.06%
Q12:  7.16s  8.89s   24.17%
Q13:  1.17s  1.13s   -3.43%
Q14:  0.57s  0.66s   13.91%
Q15:  4.83s  5.53s   14.34%
Q16:  0.81s  0.78s   -4.08%
Q17:  0.48s  0.48s    0.84%
Q18:  5.61s  5.33s   -4.96%
Q19:  3.86s  3.41s  -11.83%
Q20: 12.00s 14.37s   19.79%
Q21:  9.33s  7.24s  -22.38%
Q22:  0.57s  0.55s   -4.03%
```

This mostly confirms our expectations: lookup joins should get faster since we now parallelize the lookups. Index joins already get parallel lookups in the old code path, so we see some regressions there. Notably, Q6 and Q12 spend most of their time performing index joins, and Q20 is dominated by an index join and a parallelizable (when lookup columns form a key) lookup join. Addresses: #54680 Release note: None

**sql: don't sort spans redundantly when using the streamer**

This commit makes it so that the lookup and index joins don't sort the spans when the ordering doesn't have to be maintained and the Streamer API is used; the Streamer itself sorts spans within single-range batches, so the caller's sort is redundant.
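The `workmem` split mentioned in the adjustments above (the streamer keeps three quarters of the limit, while the join reader's batch size hint is capped at 1/12 of it) is simple arithmetic. A hedged sketch with a hypothetical helper, not CockroachDB's actual code:

```go
package main

import "fmt"

// splitWorkmem sketches the budget split described above: the
// streamer gets three quarters of the workmem limit, and the join
// reader's batch size hint is at most 1/12 of it, leaving headroom
// for the join reader's own in-memory state. The function name and
// signature are illustrative only.
func splitWorkmem(workmem int64) (streamerBudget, batchSizeHint int64) {
	streamerBudget = workmem * 3 / 4
	batchSizeHint = workmem / 12
	return streamerBudget, batchSizeHint
}

func main() {
	// For example, with a 64 MiB workmem limit:
	s, b := splitWorkmem(64 << 20)
	fmt.Println(s, b) // 50331648 5592405
}
```

Note that 3/4 + 1/12 = 5/6, so by construction at least a sixth of the limit is left for everything else.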
Release note: None

80698: Added s390x support to Bazel's toolchain r=rickystewart a=dandotimujahid

This PR is in reference to issue #58676. It will enable s390x binaries to be built using Bazel's toolchain.

80745: opt: break memo's dependence on builtins r=mgartner a=mgartner

This commit breaks the `memo` package's dependence on the `builtins` package so that they can be compiled concurrently. Optimizer packages are no longer on the critical path, for now. Informs #79357 Release note: None

80778: dev: make top-level dev runnable from anywhere r=irfansharif a=irfansharif

This should obviate the need to run dev from the top-level checkout; as long as it's in your path or you have an alias to the script, you should be able to run dev from wherever you are. It'll save you the leading './' when invoking things. Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: dandotimujahid <Mujahid.Dandoti@ibm.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
We're a good 10% faster for incremental builds with all the PRs above + #80684 (review) + Yahor's #80810. A bit more than 10% since this issue was originally filed. Thanks for all the work!
Here's the profile: incremental.gz

As before, we're breaking down pkg/{sql,server,roachpb,builtins}, in roughly that order of importance; roachpb, sql, and server are the ones that actually limit parallelism given where they sit in our dependency tree (the respective plateaus in the screenshot above). All the analysis above has been focused on transitive dependencies from pkg/roachpb. Perhaps we should find a few other base packages to drive similar dependency hygiene improvements for CRDB.
80842: parser: remove roachpb from tree r=irfansharif a=otan

Reacting to an [earlier comment](#80684 (review)) which I misunderstood. Delete roachpb from parser, as it is no longer a dependency. Refs: #79357 Release note: None

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
80810: colexecdisk: extract disk-backed operators into a new package r=yuzefovich a=yuzefovich

**colexecdisk: extract disk-backed operators into a new package**

This allows us to break the dependency of `colexec` on `colexecjoin`, which speeds up the build a bit. Addresses: #79357. Release note: None

**execagg: extract out some aggregate helpers into a separate package**

This allows us to break the dependency of `colexecagg` on `execinfra`. Release note: None

80874: Add Andrew Baptist to AUTHORS r=andrewbaptist a=andrewbaptist

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
Is your feature request related to a problem? Please describe.
This is a catch-all issue to split up our largest packages (as measured by build time) to increase build parallelism. Bazel can spit out how much time is spent compiling each package, and highlights what the bottleneck packages are.
Describe the solution you'd like
The following packages are common offenders. There are probably more.
pkg/sql
pkg/sql/parser
pkg/sql/colexec/colexecproj
Related: #60547, #78237.
Jira issue: CRDB-14743