Commit
sqlccl: rework split+scatter+import concurrency in RESTORE
This was largely motivated by how long it takes to presplit in very large restores. Doing all the splits upfront required rate-limiting, but no single rate was well-tuned for every cluster. Additionally, even at a high presplit rate of 100, a 10 TB cluster would take 52 minutes before it even started scattering.

Now, one goroutine iterates through every span being imported, presplitting and scattering each one before moving on to the next. After split+scatter, the span is sent into a buffered channel read by the Import goroutines, which prevents the split+scatter goroutine from getting too far ahead of the Imports. This both acts as a natural rate limiter for the splits and bounds the number of empty ranges created if a RESTORE fails or is cancelled.

Overall tpch-10 RESTORE time remains 12:30 on a 4-node cluster.

Since each range is now scattered individually, we no longer need the jitter in the scatter implementation (plus it now slows down the RESTORE), so it's removed.

Restore really needs a refactor, but I'm going to be making a couple more changes leading up to 1.1, so I'll leave cleanup until after they go in.

This removes most tunable constants in RESTORE, and the remaining ones are defined in terms of the number of nodes in the cluster and the number of CPUs on a node, so this:

Closes #14798.