Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datastore: timeless ordering should be RowId-based #1807

Closed
Tracked by #1898
teh-cmc opened this issue Apr 10, 2023 · 0 comments · Fixed by #4395
Closed
Tracked by #1898

datastore: timeless ordering should be RowId-based #1807

teh-cmc opened this issue Apr 10, 2023 · 0 comments · Fixed by #4395
Assignees
Labels
⛃ re_datastore affects the datastore itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Apr 10, 2023

With the new GC landing in #1801, garbage collection is now deterministic and based on RowId order, i.e. the clients' wall-clocks.
But our timeless tables are still using insertion order, which is both inconsistent and meaningless.

Our timeless tables should be sorted based on RowIds too.

@teh-cmc teh-cmc added the ⛃ re_datastore affects the datastore itself label Apr 10, 2023
@teh-cmc teh-cmc self-assigned this Nov 27, 2023
teh-cmc added a commit that referenced this issue Dec 2, 2023
Fixes a long-standing bug: timeless tables not being sorted by `RowId`,
which means they effectively always return incorrect results for
out-of-order data (yes, that is a thing even in a timeless context).

This _worsens_  GC performance for timeless tables, but:
1. The performance of incorrect code hardly matters to begin with, and
2. this is ground work for turning timeless tables in ringbuffers in an
upcoming PR, which will massively improve performance.

- Fixes #1807 

### Benchmarks

Hint: it's even worse!

```
group                                                     gc_improvements_0                       gc_improvements_1
-----                                                     -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    1.00   1084.0±4.47ms 54.1 KElem/sec     1.03   1117.2±9.07ms 52.4 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    1.00       2.1±0.02s 27.6 KElem/sec     1.01       2.1±0.01s 27.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.00    465.8±2.50ms 125.8 KElem/sec    1.01    471.5±4.76ms 124.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     1.00    655.3±2.61ms 89.4 KElem/sec     1.02    666.7±6.64ms 87.9 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          1.00    652.8±4.12ms 89.8 KElem/sec     1.02    665.6±4.67ms 88.0 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024         1.00       2.4±0.05s 24.2 KElem/sec     3.35       8.1±0.10s  7.2 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048         1.00       2.4±0.03s 24.1 KElem/sec     3.30       8.0±0.09s  7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256          1.00       2.5±0.08s 23.5 KElem/sec     3.23       8.1±0.11s  7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512          1.00       2.4±0.02s 24.5 KElem/sec     3.38       8.1±0.11s  7.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/default               1.00       2.4±0.03s 24.4 KElem/sec     3.35       8.1±0.07s  7.3 KElem/sec

```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant