GC improvements 3: turn datastore columns into ring-buffers #4397
Conversation
nice!

How do you create these neat side-by-side comparison tables?

Pure magic how these PRs slot into each other :D
```rust
times.make_contiguous();
let (times, &mut []) = times.as_mut_slices() else {
    unreachable!();
};
times.sort();
```
looks like you wrote that before you decided to put a `sort` into `VecDequeSortingExt`, or why else can't it use that?
👀 nice catch
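For context, the fix discussed above boils down to routing the inline sort through the extension trait. Here is a minimal, self-contained sketch of what a `VecDequeSortingExt`-style helper might look like; the exact trait shape in the PR may differ:

```rust
use std::collections::VecDeque;

/// Hypothetical sketch of a sorting extension for `VecDeque`,
/// in the spirit of the `VecDequeSortingExt` trait discussed above.
trait VecDequeSortingExt<T: Ord> {
    /// Sorts the deque in place.
    fn sort(&mut self);
}

impl<T: Ord> VecDequeSortingExt<T> for VecDeque<T> {
    fn sort(&mut self) {
        // Make the two internal halves of the ring-buffer contiguous,
        // then sort the resulting single slice.
        self.make_contiguous().sort();
    }
}

fn main() {
    let mut times: VecDeque<i64> = [3, 1, 2].into_iter().collect();
    times.sort();
    assert_eq!(times, VecDeque::from(vec![1, 2, 3]));
}
```

With this in scope, the `make_contiguous`/`as_mut_slices` dance collapses into a single `times.sort()` call.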
I use:

```shell
$ git checkout branch1
$ taskset -c 7 cargo bench -p re_arrow_store --bench gc -- --save-baseline branch1
$ git checkout branch2
$ taskset -c 7 cargo bench -p re_arrow_store --bench gc -- --save-baseline branch2
$ critcmp branch1 branch2
```
Introduce 2 new benchmark suites that drive the development of this PR series:
1. Logging tons of scalars, in order, across a bunch of series, themselves scattered across a bunch of plots.
2. Logging tons of timeless data, across a bunch of entities.

### Benchmarks

Hint: it's bad.
```
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.00  1084.0±4.47ms   54.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.00     2.1±0.02s    27.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.00   465.8±2.50ms  125.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.00   655.3±2.61ms   89.4 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.00   652.8±4.12ms   89.8 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00     2.4±0.05s    24.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00     2.4±0.03s    24.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00     2.5±0.08s    23.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00     2.4±0.02s    24.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00     2.4±0.03s    24.4 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
Fixes a long-standing bug: timeless tables were not sorted by `RowId`, which means they effectively always returned incorrect results for out-of-order data (yes, that is a thing even in a timeless context). This _worsens_ GC performance for timeless tables, but:
1. the performance of incorrect code hardly matters to begin with, and
2. this is groundwork for turning timeless tables into ring-buffers in an upcoming PR, which will massively improve performance.

- Fixes #1807

### Benchmarks

Hint: it's even worse!
```
group                                                    gc_improvements_0                      gc_improvements_1
-----                                                    -----------------                      -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.00  1084.0±4.47ms   54.1 KElem/sec   1.03  1117.2±9.07ms   52.4 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.00     2.1±0.02s    27.6 KElem/sec   1.01     2.1±0.01s    27.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.00   465.8±2.50ms  125.8 KElem/sec   1.01   471.5±4.76ms  124.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.00   655.3±2.61ms   89.4 KElem/sec   1.02   666.7±6.64ms   87.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.00   652.8±4.12ms   89.8 KElem/sec   1.02   665.6±4.67ms   88.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00     2.4±0.05s    24.2 KElem/sec   3.35     8.1±0.10s     7.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00     2.4±0.03s    24.1 KElem/sec   3.30     8.0±0.09s     7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00     2.5±0.08s    23.5 KElem/sec   3.23     8.1±0.11s     7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00     2.4±0.02s    24.5 KElem/sec   3.38     8.1±0.11s     7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00     2.4±0.03s    24.4 KElem/sec   3.35     8.1±0.07s     7.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
Indexes `EntityPathHash`es alongside `TimePoint`s in the metadata registry to avoid having to run full scans during garbage collection. Yields some more significant wins in the common case.

### Benchmarks

Compared to `main`:
```
group                                                    gc_improvements_0                       gc_improvements_4
-----                                                    -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   10.32  1084.0±4.47ms   54.1 KElem/sec   1.00   105.0±0.91ms  558.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   19.80     2.1±0.02s    27.6 KElem/sec   1.00   107.3±0.83ms  546.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     4.38   465.8±2.50ms  125.8 KElem/sec   1.00   106.3±0.74ms  551.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     6.16   655.3±2.61ms   89.4 KElem/sec   1.00   106.4±0.94ms  550.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          6.34   652.8±4.12ms   89.8 KElem/sec   1.00   102.9±0.75ms  569.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        37.12     2.4±0.05s    24.2 KElem/sec   1.00    65.3±0.81ms  897.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        37.54     2.4±0.03s    24.1 KElem/sec   1.00    64.9±1.07ms  903.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         38.81     2.5±0.08s    23.5 KElem/sec   1.00    64.4±0.99ms  910.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         37.00     2.4±0.02s    24.5 KElem/sec   1.00    64.6±1.08ms  906.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              36.82     2.4±0.03s    24.4 KElem/sec   1.00    65.3±1.29ms  897.3 KElem/sec
```

Compared to previous PR:
```
group                                                    gc_improvements_3                      gc_improvements_4
-----                                                    -----------------                      -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   2.30   241.0±1.66ms   243.1 KElem/sec   1.00   105.0±0.91ms  558.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   2.24   239.9±2.70ms   244.3 KElem/sec   1.00   107.3±0.83ms  546.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    2.33   247.4±3.94ms   236.8 KElem/sec   1.00   106.3±0.74ms  551.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    2.27   241.2±2.06ms   243.0 KElem/sec   1.00   106.4±0.94ms  550.6 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         2.33   239.6±1.98ms   244.6 KElem/sec   1.00   102.9±0.75ms  569.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00    60.3±1.16ms   972.3 KElem/sec   1.08    65.3±0.81ms  897.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00    60.8±1.14ms   964.3 KElem/sec   1.07    64.9±1.07ms  903.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00    61.0±1.99ms   960.9 KElem/sec   1.06    64.4±0.99ms  910.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00    60.6±1.45ms   966.9 KElem/sec   1.07    64.6±1.08ms  906.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00    57.6±0.35ms  1018.1 KElem/sec   1.13    65.3±1.29ms  897.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
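The idea behind indexing entities in the metadata registry can be sketched as follows. This is illustrative only, not the actual Rerun implementation: the `RowId`/`TimePoint`/`EntityPathHash` type aliases and the `MetadataRegistry` struct below are stand-ins for the real types:

```rust
use std::collections::BTreeMap;

// Stand-ins for the real Rerun types.
type RowId = u64;
type TimePoint = i64;
type EntityPathHash = u64;

#[derive(Default)]
struct MetadataRegistry {
    // Storing the entity hash next to the time point means the GC can
    // answer "which entity does this row belong to?" with a single lookup,
    // instead of scanning every table in the store.
    rows: BTreeMap<RowId, (TimePoint, EntityPathHash)>,
}

impl MetadataRegistry {
    fn insert(&mut self, row: RowId, time: TimePoint, entity: EntityPathHash) {
        self.rows.insert(row, (time, entity));
    }

    /// Which entity should the GC visit for this row? No full scan required.
    fn entity_for(&self, row: RowId) -> Option<EntityPathHash> {
        self.rows.get(&row).map(|&(_, entity)| entity)
    }
}

fn main() {
    let mut reg = MetadataRegistry::default();
    reg.insert(1, 100, 0xdead);
    assert_eq!(reg.entity_for(1), Some(0xdead));
    assert_eq!(reg.entity_for(2), None);
}
```

The trade-off is a slightly larger registry entry per row in exchange for O(log n) GC lookups, which is where the wins in the tables above come from.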
Optimizes the creation of `StoreDiff`s and `StoreEvent`s, which turns out to be a major cost in time series use cases, where it is common to generate several million of those on any single GC run. Once again some pretty significant wins.

### Benchmarks

Compared to `main`:
```
group                                                    gc_improvements_0                       gc_improvements_5
-----                                                    -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   13.00  1084.0±4.47ms   54.1 KElem/sec   1.00    83.4±1.16ms  702.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   25.37     2.1±0.02s    27.6 KElem/sec   1.00    83.7±0.61ms  700.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     5.55   465.8±2.50ms  125.8 KElem/sec   1.00    84.0±0.50ms  697.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     7.94   655.3±2.61ms   89.4 KElem/sec   1.00    82.5±1.33ms  710.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          8.02   652.8±4.12ms   89.8 KElem/sec   1.00    81.4±0.94ms  720.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        35.87     2.4±0.05s    24.2 KElem/sec   1.00    67.5±2.21ms  867.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        35.91     2.4±0.03s    24.1 KElem/sec   1.00    67.8±1.86ms  863.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         37.02     2.5±0.08s    23.5 KElem/sec   1.00    67.5±1.43ms  868.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         35.47     2.4±0.02s    24.5 KElem/sec   1.00    67.4±1.40ms  869.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              36.00     2.4±0.03s    24.4 KElem/sec   1.00    66.8±0.85ms  877.3 KElem/sec
```

Compared to previous PR:
```
group                                                    gc_improvements_4                      gc_improvements_5
-----                                                    -----------------                      -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024   1.26   105.0±0.91ms  558.1 KElem/sec   1.00    83.4±1.16ms  702.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048   1.28   107.3±0.83ms  546.2 KElem/sec   1.00    83.7±0.61ms  700.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256    1.27   106.3±0.74ms  551.3 KElem/sec   1.00    84.0±0.50ms  697.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512    1.29   106.4±0.94ms  550.6 KElem/sec   1.00    82.5±1.33ms  710.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default         1.26   102.9±0.75ms  569.4 KElem/sec   1.00    81.4±0.94ms  720.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        1.00    65.3±0.81ms  897.6 KElem/sec   1.03    67.5±2.21ms  867.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        1.00    64.9±1.07ms  903.2 KElem/sec   1.05    67.8±1.86ms  863.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         1.00    64.4±0.99ms  910.2 KElem/sec   1.05    67.5±1.43ms  868.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         1.00    64.6±1.08ms  906.9 KElem/sec   1.04    67.4±1.40ms  869.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              1.00    65.3±1.29ms  897.3 KElem/sec   1.02    66.8±0.85ms  877.3 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
Makes the GC capable of dropping entire buckets in one go when the conditions are met (and they are pretty simple to meet in the common case of in-order data).

Unfortunately, I couldn't make the batched GC match, let alone improve on, the performance of the standard GC. I even have a branch with a parallel batched GC, and it's still slower: the overhead of the batching datastructures just kills me every time. For that reason, batching is disabled by default. I still want to commit the code so as to prevent it from rotting though, so we can come back to it at a later time. This introduces a slight performance deterioration on the non-batched path; that's fine.

### Benchmarks

Compared to `main`:
```
group                                                                     gc_improvements_0                       gc_improvements_6
-----                                                                     -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/default                           7.62   652.8±4.12ms   89.8 KElem/sec   1.00    85.7±1.14ms  683.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256                      5.34   465.8±2.50ms  125.8 KElem/sec   1.00    87.2±0.55ms  671.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512                      7.12   655.3±2.61ms   89.4 KElem/sec   1.00    92.0±1.85ms  636.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024                    12.45  1084.0±4.47ms   54.1 KElem/sec   1.00    87.1±0.40ms  672.7 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048                    23.63     2.1±0.02s    27.6 KElem/sec   1.00    89.9±1.13ms  652.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256/gc_batching=true                                            1.00    98.6±0.82ms  594.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512/gc_batching=true                                            1.00    94.6±1.26ms  619.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024/gc_batching=true                                           1.00    93.2±1.29ms  628.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048/gc_batching=true                                           1.00    94.3±1.43ms  621.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/default                               33.30     2.4±0.03s    24.4 KElem/sec   1.00    72.2±2.46ms  811.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256                          35.16     2.5±0.08s    23.5 KElem/sec   1.00    71.1±2.31ms  824.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512                          35.08     2.4±0.02s    24.5 KElem/sec   1.00    68.1±1.20ms  859.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024                         36.86     2.4±0.05s    24.2 KElem/sec   1.00    65.7±0.87ms  891.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048                         35.99     2.4±0.03s    24.1 KElem/sec   1.00    67.7±1.33ms  865.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256/gc_batching=true                                                 1.00    68.7±1.40ms  853.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512/gc_batching=true                                                 1.00    67.3±0.32ms  870.8 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024/gc_batching=true                                                1.00    67.7±1.21ms  865.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048/gc_batching=true                                                1.00    67.6±1.31ms  866.6 KElem/sec
```

Compared to previous PR:
```
group                                                                     gc_improvements_5                      gc_improvements_6
-----                                                                     -----------------                      -----------------
.../plotting_dashboard/drop_at_least=0.3/default                          1.00    81.4±0.94ms  720.0 KElem/sec   1.05    85.7±1.14ms  683.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256                     1.00    84.0±0.50ms  697.8 KElem/sec   1.04    87.2±0.55ms  671.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512                     1.00    82.5±1.33ms  710.0 KElem/sec   1.11    92.0±1.85ms  636.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024                    1.00    83.4±1.16ms  702.9 KElem/sec   1.04    87.1±0.40ms  672.7 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048                    1.00    83.7±0.61ms  700.0 KElem/sec   1.07    89.9±1.13ms  652.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256/gc_batching=true                                           1.00    98.6±0.82ms  594.2 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512/gc_batching=true                                           1.00    94.6±1.26ms  619.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024/gc_batching=true                                          1.00    93.2±1.29ms  628.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048/gc_batching=true                                          1.00    94.3±1.43ms  621.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/default                               1.00    66.8±0.85ms  877.3 KElem/sec   1.08    72.2±2.46ms  811.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256                          1.00    67.5±1.43ms  868.2 KElem/sec   1.05    71.1±2.31ms  824.5 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512                          1.00    67.4±1.40ms  869.4 KElem/sec   1.01    68.1±1.20ms  859.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024                         1.03    67.5±2.21ms  867.5 KElem/sec   1.00    65.7±0.87ms  891.4 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048                         1.00    67.8±1.86ms  863.9 KElem/sec   1.00    67.7±1.33ms  865.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024/gc_batching=true                                               1.00    67.7±1.21ms  865.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048/gc_batching=true                                               1.00    67.6±1.31ms  866.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256/gc_batching=true                                                1.00    68.7±1.40ms  853.1 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512/gc_batching=true                                                1.00    67.3±0.32ms  870.8 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
Adds a configurable time bound to the GC, in addition to the pre-existing space bound.

```rust
/// How long the garbage collection is allowed to run for.
///
/// Trades off latency for throughput:
/// - A smaller `time_budget` will clear less data in a shorter amount of time, allowing for a
///   more responsive UI at the cost of more GC overhead and more frequent runs.
/// - A larger `time_budget` will clear more data in a longer amount of time, increasing the
///   chance of UI freeze frames but decreasing GC overhead and running less often.
///
/// The default is an unbounded time budget (i.e. throughput only).
pub time_budget: Duration,
```

No time budget: https://github.com/rerun-io/rerun/assets/2910679/8ca63aa3-5ad4-4575-9486-21d805026c1e

3.5ms budget: https://github.com/rerun-io/rerun/assets/2910679/e1bd1a41-6353-4a0e-90e5-8c05b76e92ea

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
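The latency/throughput trade-off described by the `time_budget` doc comment above can be sketched as a simple budget check inside the collection loop. This is an illustrative sketch only, not the actual Rerun GC; `gc_with_budget` and its bucket representation are made up for the example:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Illustrative sketch: drop the oldest buckets first, and bail out
/// as soon as the configured time budget is exhausted.
fn gc_with_budget(buckets: &mut VecDeque<Vec<u64>>, time_budget: Duration) -> usize {
    let start = Instant::now();
    let mut dropped = 0;
    while let Some(bucket) = buckets.pop_front() {
        dropped += bucket.len();
        // A small budget yields frequent, cheap runs (responsive UI);
        // an unbounded one clears as much as possible per run (throughput).
        if start.elapsed() >= time_budget {
            break;
        }
    }
    dropped
}

fn main() {
    // Unbounded budget (the default): throughput only, everything is cleared.
    let mut buckets: VecDeque<Vec<u64>> = VecDeque::from(vec![vec![1, 2], vec![3, 4, 5]]);
    assert_eq!(gc_with_budget(&mut buckets, Duration::MAX), 5);
    assert!(buckets.is_empty());

    // Zero budget: at most one bucket is dropped before bailing out.
    let mut buckets: VecDeque<Vec<u64>> = VecDeque::from(vec![vec![1, 2], vec![3, 4, 5]]);
    assert_eq!(gc_with_budget(&mut buckets, Duration::ZERO), 2);
}
```

Checking the budget only once per bucket keeps the overhead of the `Instant::elapsed` call negligible compared to the work being bounded.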
This turns every single column in `DataStore`/`DataTable` into a ring-buffer (`VecDeque`). On the common/happy path of data being ingested in order, this leads to very significant performance improvements.
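To illustrate why a ring-buffer helps here (a sketch, not the actual store code; `drop_oldest` is made up for the example): GC removes the oldest rows first, i.e. from the *front* of every column. `VecDeque::pop_front` is O(1), whereas the `Vec` equivalent, `remove(0)`, shifts every remaining element, i.e. O(n) per dropped row:

```rust
use std::collections::VecDeque;

/// Drop the `n` oldest values from a column. With a `VecDeque` each
/// removal is O(1); with a `Vec` the same operation would be O(len).
fn drop_oldest(column: &mut VecDeque<u64>, n: usize) {
    for _ in 0..n {
        if column.pop_front().is_none() {
            break;
        }
    }
}

fn main() {
    let mut col: VecDeque<u64> = (0..10).collect();
    drop_oldest(&mut col, 3);
    assert_eq!(col.len(), 7);
    assert_eq!(col.front(), Some(&3)); // the oldest surviving value
}
```

The catch is that a `VecDeque` is not guaranteed contiguous, which is why sorting and slice-based access need the `make_contiguous` dance discussed earlier in the conversation.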
### Benchmarks

Compared to `main`:

Compared to previous PR:
Part of the GC improvements series:
- #4394
- `RowId`-ordered #4395
- `VecDeque` extensions & benchmarks #4396
- #4397
- `EntityPathHash`es in metadata registry #4398
- `Store{Diff,Event}` optimizations #4399
- #4400
- `time_budget` GC setting #4401