[Merged by Bors] - sync2: add sqlstore #6405
Conversation
Given that after recent item sync is done (if it's needed at all), the range set reconciliation algorithm no longer depends on newly received items being added to the set, we can save memory by not adding the received items during reconciliation. During real sync, the received items will be sent to the respective handlers, and after the corresponding data are fetched and validated, they will be added to the database, without the need to add them to the cloned OrderedSets which are used to sync against particular peers.
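As a rough sketch of that flow (hypothetical names throughout; this is not the actual go-spacemesh API), a handler might fetch and validate each received key's data before committing it to the database, leaving the per-peer cloned sets untouched:

```go
package sketch

import "fmt"

type KeyBytes []byte

// handleReceived illustrates the flow described above: keys discovered
// during reconciliation bypass the cloned OrderedSet and reach the
// database only after their data is fetched and validated.
func handleReceived(
	keys []KeyBytes,
	fetchAndValidate func(KeyBytes) error, // fetch the data and validate it
	commit func(KeyBytes) error, // add to the database, not to the set clone
) error {
	for _, k := range keys {
		if err := fetchAndValidate(k); err != nil {
			return fmt.Errorf("fetch/validate %x: %w", k, err)
		}
		if err := commit(k); err != nil {
			return fmt.Errorf("commit %x: %w", k, err)
		}
	}
	return nil
}
```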
Codecov Report

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop   #6405   +/-   ##
=========================================
- Coverage     79.9%   79.8%   -0.1%
=========================================
  Files          335     341      +6
  Lines        43656   44204    +548
=========================================
+ Hits         34892   35296    +404
- Misses        6800    6915    +115
- Partials      1964    1993     +29
```
This adds multi-peer synchronization support. When the local set differs too much from the remote sets, "torrent-style" split sync is attempted, which splits the set into sub-ranges and syncs each sub-range against a separate peer. Otherwise, a full sync is done, syncing the whole set against each of the synchronization peers. A full sync is also done after each split sync run. The local set can be considered synchronized after the specified number of full syncs has happened. The approach is loosely based on the paper [SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains](https://people.bu.edu/staro/2023-ICBC-Novak.pdf) by Novak Boškov, Sevval Simsek, Ari Trachtenberg, and David Starobinski.
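For illustration, a minimal sketch of that strategy might look as follows; the types, thresholds, and method names here are hypothetical, not the actual implementation:

```go
package multipeer

type peer string

type syncer struct {
	peers           []peer
	fullSyncsNeeded int     // consider the set synced after this many full syncs
	maxDiffForFull  float64 // above this difference, try split sync first
}

// run keeps syncing until enough full syncs have completed.
func (s *syncer) run(estimateDiff func() float64) {
	for fullSyncs := 0; fullSyncs < s.fullSyncsNeeded; fullSyncs++ {
		if estimateDiff() > s.maxDiffForFull {
			// "torrent-style" split sync: one sub-range per peer.
			for i, p := range s.peers {
				s.syncSubrange(p, i, len(s.peers))
			}
		}
		// A full sync against every peer follows each split sync run
		// (and is the default path when the sets are close already).
		for _, p := range s.peers {
			s.syncSubrange(p, 0, 1)
		}
	}
}

// syncSubrange would reconcile sub-range idx of total with peer p
// using range set reconciliation (stubbed out here).
func (s *syncer) syncSubrange(p peer, idx, total int) {}
```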
The `sqlstore` package provides a simple sequence-based interface to the tables being synchronized. It is used by the FPTree data structure as the database layer and doesn't do range fingerprinting by itself. `SyncedTable` and `SyncedTableSnapshot` provide methods that wrap the necessary SQL operations. The `sql/expr` package was added to facilitate SQL generation.
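As a rough illustration of the AST-based approach to SQL generation (a toy sketch only; the real `sql/expr` API may look quite different):

```go
package exprsketch

import "strings"

// Expr is a node of a tiny SQL expression AST.
type Expr interface{ SQL() string }

// Ident is a bare identifier or placeholder.
type Ident string

func (i Ident) SQL() string { return string(i) }

// Bin is a binary operation such as ">=" or "AND".
type Bin struct {
	Op   string
	L, R Expr
}

func (b Bin) SQL() string { return "(" + b.L.SQL() + " " + b.Op + " " + b.R.SQL() + ")" }

// SelectWhere renders a SELECT statement from AST parts instead of
// scattering string concatenation through the code.
func SelectWhere(cols []string, table string, where Expr) string {
	var sb strings.Builder
	sb.WriteString("SELECT ")
	sb.WriteString(strings.Join(cols, ", "))
	sb.WriteString(" FROM ")
	sb.WriteString(table)
	if where != nil {
		sb.WriteString(" WHERE ")
		sb.WriteString(where.SQL())
	}
	return sb.String()
}
```

With this sketch, `SelectWhere([]string{"id"}, "atxs", Bin{Op: ">=", L: Ident("id"), R: Ident("?")})` would yield `SELECT id FROM atxs WHERE (id >= ?)`; the table name `atxs` is just a placeholder.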
sync2/sqlstore/dbseq.go
```go
type dbIDKey struct {
	//nolint:unused
	id string
	//nolint:unused
	chunkSize int
}
```
are these linter annotations still needed?
Yes, the linter appears to be silly enough to mark fields which are never accessed directly (like `foo.bar`) as unused, even though it's often useful to have such fields in structs serving as map keys etc.
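A minimal repro of that situation (hypothetical names): the fields below are only ever written via composite literals and compared through map lookups, so some linters report them as unused even though they participate in key equality:

```go
package keysketch

type cacheKey struct {
	id        string // never read as k.id, only compared via map lookups
	chunkSize int
}

var cache = map[cacheKey][]byte{}

// lookup finds a cached chunk; the key fields are "used" only through
// the map's built-in key comparison.
func lookup(id string, chunkSize int) ([]byte, bool) {
	v, ok := cache[cacheKey{id: id, chunkSize: chunkSize}]
	return v, ok
}
```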
I'm not a big fan of structs as keys, and `nolint` indicates an issue (although not necessarily the correct one): is `chunkSize` really part of the key and not part of the value object? If I look up the same ID with a different `chunkSize`, should I expect to not find it in the cache, or should I find it and then evaluate the `chunkSize` to be the one I expect?
Also, if `chunkSize` can change dynamically, doesn't this lead to the same values (by ID) being stored multiple times in the cache with different `chunkSize`s?
After a bit more benchmarking, I've removed the LRU cache altogether. It was more useful with an earlier FPTree iteration that did not do node-aligned range splits, but now it just doesn't improve performance at all.
sync2/sqlstore/dbseq.go
```go
// LRU cache for ID chunks.
type lru = simplelru.LRU[dbIDKey, []rangesync.KeyBytes]

const lruCacheSize = 1024 * 1024
```
Might be good to have a comment here about how much memory this would translate into in total when the LRU cache is full.
It's somewhat hard to say exactly what the size of the cache will be, as it stores chunks of different sizes. I'll add a limit on the size of each cached chunk and make the cache size configurable.
Removed the LRU cache (see above)
```go
// if the chunk size was reduced due to a short chunk before wraparound, we need
// to extend it back
if cap(s.chunk) < s.chunkSize {
	s.chunk = make([]rangesync.KeyBytes, s.chunkSize)
}
```
I'm not sure this is actually needed. Unless you're passing the actual slice somehow, there's no real need to allocate again. You can just over-allocate the first time and use the `clear()` builtin over it to reuse that.
This is needed because the chunk size is dynamic. As the SQL iterator progresses, it loads larger and larger chunks by applying larger `LIMIT` values. This is due to the usage pattern of `FPTree`, which may request just a few items from some iterators and a lot of items from others.

The comment was somewhat misplaced, because it applies to the case where capacity is enough. We still need to reallocate the chunk if `chunkSize` grew. Allocating max-sized chunks from the start for each iterator created might be wasteful.
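A condensed sketch of that pattern (hypothetical names; not the actual dbseq.go code): each load doubles the `LIMIT` up to a maximum, reallocating the chunk only when it actually grew past the current capacity:

```go
package chunksketch

type seq struct {
	chunk        [][]byte // most recently loaded chunk of IDs
	chunkSize    int      // next LIMIT to use
	maxChunkSize int
}

// loadNext fetches the next chunk, growing the LIMIT geometrically so that
// iterators that need only a few items stay cheap while long scans amortize
// per-query overhead.
func (s *seq) loadNext(query func(limit int) ([][]byte, error)) error {
	if cap(s.chunk) < s.chunkSize {
		// chunkSize outgrew the buffer: reallocate.
		s.chunk = make([][]byte, s.chunkSize)
	} else {
		// capacity is sufficient: just extend the length back.
		s.chunk = s.chunk[:s.chunkSize]
	}
	rows, err := query(s.chunkSize) // e.g. SELECT ... ORDER BY id LIMIT ?
	if err != nil {
		return err
	}
	s.chunk = s.chunk[:copy(s.chunk, rows)]
	s.chunkSize = min(s.chunkSize*2, s.maxChunkSize)
	return nil
}
```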
sync2/sqlstore/dbseq.go
```go
s.chunkSize = min(s.chunkSize*2, s.maxChunkSize)
switch {
case err != nil || ierr != nil:
	return errors.Join(ierr, err)
```
Returning both errors doesn't make sense to me. I think `ierr` should take precedence, since the DB request was interrupted because of it and `err` isn't interesting in this case, i.e.:

```go
case ierr != nil:
	return ierr
case err != nil:
	return err
case n == 0:
	// unchanged
	...
```
Fixed, thanks.

It didn't add much to efficiency. Re-generating SQL statements every time is quick but increases GC pressure.
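A sketch of the trade-off being described (hypothetical names; not the actual code): since the chunk sizes grow geometrically, only a handful of distinct `LIMIT` values ever occur, so caching the generated statements by limit keeps allocations, and hence GC pressure, down:

```go
package stmtcache

import (
	"fmt"
	"sync"
)

// stmts caches generated SQL text keyed by LIMIT value; with geometrically
// growing chunk sizes the number of distinct keys stays small.
var stmts sync.Map

func selectWithLimit(limit int) string {
	if s, ok := stmts.Load(limit); ok {
		return s.(string)
	}
	s := fmt.Sprintf("SELECT id FROM ids WHERE id >= ? ORDER BY id LIMIT %d", limit)
	stmts.Store(limit, s)
	return s
}
```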
bors merge
Pull request successfully merged into develop. Build succeeded.
## Motivation
The database-aware sync implementation is rather complex and splitting
it in parts might help the review process.
## Description
The `sqlstore` package provides a simple sequence-based interface to the tables being synchronized. It is used by the FPTree data structure as the database layer, and doesn't do range fingerprinting by itself. `SyncedTable` and `SyncedTableSnapshot` provide methods that wrap the necessary SQL operations. The `sql/expr` package was added to facilitate SQL generation.
The `sql/expr` package is also used by Bloom filters (#6332), which are to be re-benchmarked and updated later. The idea is to use this simple AST-based SQL generator in other places in the code where we're generating SQL. The rqlite dependency which is introduced is also to be used to convert schema SQL to a "canonical" form during schema drift detection, and to filter out comments properly in the migration scripts.
#6358 needs to be merged before this one.