fix: speed up ydb insert #1793

Merged
merged 1 commit into from
Nov 1, 2023

Conversation

@ischasny (Collaborator) commented on Oct 31, 2023

  • Speed up YugabyteDB (ydb) indexing by executing inserts in batches instead of concurrently.
  • Add an insert batch size database setting that defaults to 10k.
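
To illustrate the batching idea described above, here is a minimal sketch in plain Go. It is not the code from this PR: `insertInBatches` and its `exec` callback are hypothetical names, and `exec` stands in for "build one database batch from this chunk and execute it".

```go
package main

import "fmt"

// insertInBatches splits recs into consecutive chunks of at most batchSize
// records and passes each chunk to exec, one chunk at a time.
// exec is a hypothetical stand-in for building and executing one batch.
// Processing stops at the first error.
func insertInBatches[T any](recs []T, batchSize int, exec func(chunk []T) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for start := 0; start < len(recs); start += batchSize {
		end := start + batchSize
		if end > len(recs) {
			end = len(recs) // the last chunk may be shorter
		}
		if err := exec(recs[start:end]); err != nil {
			return fmt.Errorf("executing batch [%d:%d): %w", start, end, err)
		}
	}
	return nil
}
```

With a batch size of 10k and the 2,232,145 records from the test file below, a loop like this would call `exec` 224 times, the last time with 2,145 records.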

Performance results

CAR file used: {"piececid": "bafykbzacedrrhwwbp7bcbvekiqzlzvbi65wy2ku6xkejlewz6bciasnnqycd4", "recs": 2232145}

  • The current implementation
    2023-11-01T13:06:20.289Z WARN boostd-data cmd/run.go:339 added index to yugabytedb successfully {"took": 244.186086292}
  • Batch inserts 1k batch size
    2023-11-01T12:51:23.415Z WARN boostd-data cmd/run.go:348 added index to yugabytedb successfully {"took": 47.338054792}
  • Batch inserts 10k batch size
    2023-11-01T12:53:26.419Z WARN boostd-data cmd/run.go:348 added index to yugabytedb successfully {"took": 40.236502708}
  • Batch inserts 50k batch size
    2023-11-01T12:52:23.827Z WARN boostd-data cmd/run.go:348 added index to yugabytedb successfully {"took": 35.621975375}
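
For a quick read of these numbers, the sketch below computes the speedup and throughput of each run. It assumes the `took` values are in seconds; the type and function names are made up for this illustration. Under that assumption the batched runs come out roughly 5.2x, 6.1x, and 6.9x faster than the current implementation, which itself handles about 9,100 records per second.

```go
package main

import "fmt"

// run is one measurement from the log lines above. Secs is the "took" value,
// assumed here to be in seconds.
type run struct {
	Label string
	Secs  float64
}

// totalRecs is the "recs" value of the CAR file used for the measurements.
const totalRecs = 2232145

// speedup reports how many times faster r is than the baseline run.
func speedup(baseline, r run) float64 { return baseline.Secs / r.Secs }

// throughput reports the number of records inserted per second in run r.
func throughput(r run) float64 { return totalRecs / r.Secs }

// summarize returns one human-readable line per run.
func summarize(baseline run, runs []run) []string {
	lines := make([]string, 0, len(runs))
	for _, r := range runs {
		lines = append(lines, fmt.Sprintf("%s: %.1fx faster, %.0f recs/s",
			r.Label, speedup(baseline, r), throughput(r)))
	}
	return lines
}
```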

extern/boostd-data/yugabyte/service.go: 6 review threads, outdated and resolved
@ischasny ischasny merged commit 350adad into main Nov 1, 2023
21 checks passed
@ischasny ischasny deleted the ivan/speedup-ydb-insert branch November 1, 2023 13:52
for allIdx, rec := range recs {
	if batch == nil {
		batch = s.session.NewBatch(gocql.UnloggedBatch).WithContext(ctx)
		batch.Entries = make([]gocql.BatchEntry, 0, s.settings.InsertBatchSize)
Member commented:

The slice capacity should be whatever is left to insert, which is at most InsertBatchSize, rather than always InsertBatchSize.
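
One way to read this suggestion, sketched with a hypothetical helper (`batchCapacity` is not part of the PR): cap the preallocated capacity at the number of records still to be processed.

```go
package main

// batchCapacity returns the capacity to preallocate for the batch that starts
// at index start: the number of records left, capped at batchSize.
func batchCapacity(total, start, batchSize int) int {
	remaining := total - start
	if remaining < 0 {
		return 0
	}
	if remaining < batchSize {
		return remaining
	}
	return batchSize
}
```

In the snippet under review this might be used as `make([]gocql.BatchEntry, 0, batchCapacity(len(recs), allIdx, s.settings.InsertBatchSize))`, assuming `allIdx` is the index of the first record in the new batch. The saving only matters for the final, partially filled batch.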

	})
	batch.Entries = append(batch.Entries, gocql.BatchEntry{
		Stmt: insertPieceOffsetsQry,
		Args: []interface{}{trimMultihash(rec.Cid.Hash()), pieceCidBytes},
Member commented:

Suggested change
- Args: []interface{}{trimMultihash(rec.Cid.Hash()), pieceCidBytes},
+ Args: []any{trimMultihash(rec.Cid.Hash()), pieceCidBytes},
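
The suggested change is purely cosmetic: since Go 1.18, `any` is a predeclared alias for `interface{}`, so both spellings denote the same type. The small sketch below, with function names invented for illustration, shows that the two slices are interchangeable without conversion.

```go
package main

// argsOld builds the argument list using the pre-1.18 spelling.
func argsOld(hash, pieceCid []byte) []interface{} {
	return []interface{}{hash, pieceCid}
}

// argsNew builds the same argument list using the any alias (Go 1.18+).
func argsNew(hash, pieceCid []byte) []any {
	return []any{hash, pieceCid}
}

// This assignment compiles without a conversion because []any and
// []interface{} are the identical type.
var _ []interface{} = argsNew(nil, nil)
```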

if batch == nil {
	batch = s.session.NewBatch(gocql.UnloggedBatch).WithContext(ctx)
	batch.Entries = make([]gocql.BatchEntry, 0, s.settings.InsertBatchSize)
Member commented:

Ditto regarding the slice size.

Would be nice to reduce duplicate code in working with batches.
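
One possible shape for such a refactor, sketched without any database dependency: a small generic accumulator that owns the "append, execute when full, reset" logic so that each insert loop only calls `Add` and a final `Flush`. The `batcher` type and its `exec` callback are hypothetical; in the reviewed code `exec` would presumably wrap the batch execution.

```go
package main

// batcher collects entries and passes them to exec whenever size entries
// have accumulated. exec is a hypothetical stand-in for executing one batch
// and must not retain the slice after it returns, because the buffer is reused.
type batcher[T any] struct {
	size    int
	pending []T
	exec    func([]T) error
}

// newBatcher returns a batcher that flushes every size entries.
// Sizes below 1 are treated as 1.
func newBatcher[T any](size int, exec func([]T) error) *batcher[T] {
	if size < 1 {
		size = 1
	}
	return &batcher[T]{size: size, exec: exec}
}

// Add queues one entry and flushes if the batch is now full.
func (b *batcher[T]) Add(e T) error {
	b.pending = append(b.pending, e)
	if len(b.pending) >= b.size {
		return b.Flush()
	}
	return nil
}

// Flush executes the pending entries, if any, and empties the buffer.
func (b *batcher[T]) Flush() error {
	if len(b.pending) == 0 {
		return nil
	}
	err := b.exec(b.pending)
	b.pending = b.pending[:0]
	return err
}
```

Each loop then reduces to calling `Add` per record and `Flush` once at the end, which also removes the repeated `if batch == nil` setup.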
