[Badger] Add universal database operations #6465

zhangchiqing · 2024-09-14T00:53:21Z

This PR implemented the low level database operations that can be shared by both badger and pebble implementation.

Since badger operations will be refactored to use batch updates instead of transactions, this makes the badger db operations and pebble db operations very similar and I managed to unify them.

A separate PR that refactors the approvals will demonstrate how the unified database operations can be used there. (see #6466)

This allows us to remove both the badger-specific and pebble-specific database modules, and only keep one universal version, easier to maintain.

codecov-commenter · 2024-09-14T00:56:48Z

Codecov Report

Attention: Patch coverage is 30.39216% with 355 lines in your changes missing coverage. Please review.

Project coverage is 40.96%. Comparing base (0310dfd) to head (24d65bc).

Files with missing lines	Patch %	Lines
storage/operation/pebbleimpl/writer.go	0.00%	51 Missing ⚠️
storage/operation/badgerimpl/writer.go	0.00%	48 Missing ⚠️
storage/operation/badgerimpl/iterator.go	0.00%	47 Missing ⚠️
storage/operation/reads.go	69.16%	25 Missing and 12 partials ⚠️
storage/operation/dbtest/helper.go	0.00%	31 Missing ⚠️
storage/operation/pebbleimpl/iterator.go	0.00%	31 Missing ⚠️
utils/unittest/unittest.go	0.00%	27 Missing ⚠️
storage/operation/badgerimpl/reader.go	0.00%	22 Missing ⚠️
storage/operation/codec.go	27.27%	16 Missing ⚠️
storage/operation/pebbleimpl/reader.go	0.00%	16 Missing ⚠️
... and 5 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6465      +/-   ##
==========================================
- Coverage   40.98%   40.96%   -0.02%     
==========================================
  Files        2079     2093      +14     
  Lines      183922   184432     +510     
==========================================
+ Hits        75382    75555     +173     
- Misses     102240   102563     +323     
- Partials     6300     6314      +14

Flag	Coverage Δ
unittests	`40.96% <30.39%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

zhangchiqing · 2024-09-14T00:55:34Z

storage/operation/reads_test.go

@@ -0,0 +1,189 @@
+package operation_test


Test cases are taken from similar ones in storage/badger/operation/common_test.go

zhangchiqing · 2024-09-14T00:55:45Z

storage/operation/writes_test.go

@@ -0,0 +1,278 @@
+package operation_test


Test cases are taken from similar ones in storage/badger/operation/common_test.go

zhangchiqing · 2024-09-14T00:59:24Z

storage/operation/reads.go

+			keyCopy := make([]byte, len(key))
+			copy(keyCopy, key)


Copying the key is for safety, otherwise caller might make mistake storing the key in a slice and causing problem because this iteration here might overwrite the underlying slice. Making a copy of the key could prevent caller from making such mistake.

zhangchiqing · 2024-09-14T01:01:42Z

storage/operation/badgerimpl/iterator.go

+	tx := db.NewTransaction(false)
+	iter := tx.NewIterator(options)
+
+	lowerBound, upperBound := storage.StartEndPrefixToLowerUpperBound(startPrefix, endPrefix)


Note, we are applying the same approach as pebble to compute the exclusive upperBound here. This allows us to get rid of the global max value stored in the database (see: storage/badger/operation/max.go)

zhangchiqing · 2024-09-14T01:02:45Z

storage/operation/reads_test.go

+		keys := [][]byte{
+			// before start -> not included in range
+			{0x09, 0xff},
+			// within the start prefix -> included in range
+			{0x10, 0x00},
+			{0x10, 0xff},
+			// between start and end -> included in range
+			{0x15, 0x00},
+			{0x1A, 0xff},
+			// within the end prefix -> included in range
+			{0x20, 0x00},
+			{0x20, 0xff},
+			// after end -> not included in range
+			{0x21, 0x00},
+		}


The same test cases to test the boundary cases

zhangchiqing · 2024-09-14T01:03:04Z

cmd/bootstrap/utils/md5.go

 import (
+	// #nosec


fix linter error

zhangchiqing · 2024-09-16T23:07:40Z

storage/operation/pebbleimpl/writer.go

+}
+
+func (b *ReaderBatchWriter) Commit() error {
+	err := b.batch.Commit(pebble.Sync)


With the database operations being universal, it's easy to write benchmark to compare the performance between badger and pebble implementation.

The read performance for badger and pebble are very similar.

BenchmarkRetrieve/BadgerStorage-10 1217998 948.1 ns/op BenchmarkRetrieve/PebbleStorage-10 1699320 725.4 ns/op

However, for writes, the benchmark shows pebble is 100x slower than badger.

I initially did:

cd storage/operation go test -bench=BenchmarkUpsert goos: darwin goarch: arm64 pkg: github.com/onflow/flow-go/storage/operation BenchmarkUpsert/BadgerStorage-10 31804 35173 ns/op BenchmarkUpsert/PebbleStorage-10 270 4267359 ns/op PASS ok github.com/onflow/flow-go/storage/operation 4.886s

I tracked down to this Commit call by adding the following code.

n1 := time.Now() err := b.batch.Commit(pebble.Sync) n2 := time.Now() fmt.Println("pebbleimpl.Writer.go: Commit() time:", n2.Sub(n1).Nanoseconds())

Then I did the same to badger to record the duration for the Commit call, and run the TestReadWrite test case which writes a single entity data to the database, the result shows pebble is 25x slower than badger for writes (read performance is similar). I'm not sure why.

cd storage/operation gt -run=TestReadWrite badgerimpl.Writer.go: Commit() time: 147250 badgerimpl.Writer.go: Commit() time: 71458 badgerimpl.Writer.go: Commit() time: 72708 pebbleimpl.Writer.go: Commit() time: 3684875 pebbleimpl.Writer.go: Commit() time: 3006416 pebbleimpl.Writer.go: Commit() time: 3001083 badgerimpl.Writer.go: Commit() time: 6500 pebbleimpl.Writer.go: Commit() time: 584 PASS ok github.com/onflow/flow-go/storage/operation 1.136s

Thanks for making the benchmarks and finding this regression @zhangchiqing 🙇

100x slower writes is a big difference! 😬

I think it would be worthwhile to spend some time better understanding this finding and investigating how it would impact Flow (for example, the consensus hot-path). My gut feeling is that this benchmark in isolation is overstating the actual impact we would see. If it isn't, and switching to Pebble is going to make all our writes 2 orders of magnitude slower, I'd rather know that now.

Are there Pebble configs we can change to bring write speed down?

Does this result hold if aspects of the environment are changed (eg. maybe it's an artifact of the test database being very small? Just guessing)

Does this result hold if aspects of the load are changed (eg. committing a batch containing many writes)

I spent a short time during the review trying a few different iterations on BenchmarkUpsert:

Setting WithKeepL0InMemory to false (previously was true).

Setting WithValueThreshold to 1 byte (previously was 1kb).

Badger is different from Pebble in that it will sometimes store values directly in the LSM tree, and other times in a value log file, based on the value's size. All the values we were storing were below that 1kb threshold.

Using larger 10kb values

Unfortunately, all of these had similar results where Pebble was still ~100x slower...

Yeah, this might take some time to investigate. I will do it in a separate issue. But it should not block us from getting in with this PR, since it also allows us to move from badger transaction to badger batch updates.

Regarding the regression, Peter suspect it might be to do with the key-compression and decompression, since he did a profiling in the past and saw a lot of CPU cycles are spent on that. So might worth to try disabling compression in pebble and see if it makes a difference.

storage/batch.go

utils/unittest/unittest.go

storage/operations.go

storage/operation/reads_test.go

jordanschalm · 2024-11-01T15:44:32Z

storage/operation/reads_test.go

+
+func TestTraverse(t *testing.T) {
+	dbtest.RunWithStorages(t, func(t *testing.T, r storage.Reader, withWriter dbtest.WithWriter) {
+		keys := [][]byte{


This would probably be easier to work with as a map from key -> value. If we define the key type as [2]byte, and then do key[:] when inserting it, that would work.

Feel free to skip this suggestion.

storage/operation/writes_test.go

jordanschalm

This looks great. I like the way you structured testing so we can easily test both database backends together 💯

Besides the suggestions above, could you:

Make sure exported types have at least a basic godoc (copying the interface documentation to structs that implement it works). I added suggestions about this in a few places, but not all.
Make sure all public functions that return errors document any expected error types, or that "no errors are expected during normal operation".

jordanschalm

🎸

jordanschalm · 2024-11-04T16:54:35Z

storage/operations.go

+}
+
+// Writer is an interface for batch writing to a storage backend.
+// It cannot be used concurrently for writing.


Suggested change

// It cannot be used concurrently for writing.

// One Writer instance cannot be used concurrently by multiple goroutines.

// However, many goroutines can write concurrently, each using their own Writer instance.

AlexHentschel · 2024-11-21T20:33:02Z

storage/operations.go

+	// other errors are exceptions
+	//
+	// The caller should not modify the contents of the returned slice, but it is
+	// safe to modify the contents of the argument after Get returns. The


Suggested change

// safe to modify the contents of the argument after Get returns. The

// safe to modify the contents of the `key` argument after Get returns. The

AlexHentschel · 2024-11-21T21:14:27Z

storage/operations.go

+	// IterItem returns the current key-value pair, or nil if done.
+	IterItem() IterItem


should we maybe just call it Item?

AlexHentschel

Hi Leo,

thanks for putting this PR together. I have tried my best given my time constraints to go over what I considered the most correctness-critical details and the API. It is not a full review, but I wanted to help as much as I can with ensuring correctness of the implementation and its API-safety & API-clarity.

From my perspective, there aren't any major problems, but some areas where I would request iterations

Most of my comments are suggestions to refine some of the documentation, which is hopefully quick.
I think we still have some minor edge cases that need to be fixed (I marked those comments with "⚠️"). Very well possible, that I misunderstood the code and contrary to my intuition, the code is correct. Though, if you agree that there are problems, please make sure to include tests that cover these areas in detail.
Please make sure you enforce in the implementation that startPrefix ≤ endPrefix and the implementation errors. I think this should be enforced on the lowest level of the implementation: badgerimpl.dbReader and pebbleimpl.dbReader as the Iterator-constructors NewIter are publicly accessible. This check is so cheap that I would rather have it replicated than not applied in one of the many code paths.
I find the API of Exists, Retrieve, FindHighestAtOrBelow in the operations package very convoluted. See my last two comments (here and here) including suggestions on how to improve this.

AlexHentschel · 2024-11-21T21:25:34Z

storage/operations.go

+// Iterator is an interface for iterating over key-value pairs in a storage backend.
+type Iterator interface {


I would recommend to document the convention about upper and lower bound in the interface. I think the upper and lower bounds are much more than an implementation detail, because they determine the output of functions defined in the interface. While you document some of that in NewIter below, I would recommend to primarily document this convention in the interface that is supposed to adhere to this convention.

Iterator walks over all key value pairs (k,v), where the key k starts with a prefix in the range [a,b] (both bounds inclusive), with a ≤ b and an entry with key k exists in the database. Key-value pairs are visited in lexicographical order wrt the key k. If key-value pairs for the lower bound a and/or upper bound b exist, they are visited by the iterator.

not sure about the requirement a ≤ b, but I think it is worth-while to explicitly document. But given how StartEndPrefixToLowerUpperBound is implemented, I think it is a correctness requirement. If my interpretation is correct, constructors must enforce this!

AlexHentschel · 2024-11-21T21:43:40Z

storage/operations.go

+}
+
+// PrefixUpperBound returns a key K such that all possible keys beginning with the input prefix
+// sort lower than K according to the byte-wise lexicographic key ordering used by Pebble.


we also apply this to badger. Hence, I think we can remove the explicit mentioning of Pebble?

Suggested change

// sort lower than K according to the byte-wise lexicographic key ordering used by Pebble.

// sort lower than K according to the byte-wise lexicographic key ordering.

AlexHentschel · 2024-11-21T22:11:38Z

storage/operation/badgerimpl/iterator.go

+// Valid returns whether the iterator is positioned at a valid key-value pair.
+func (i *badgerIterator) Valid() bool {
+	// if it's beyond the upper bound, it's invalid
+	if !i.iter.Valid() {
+		return false
+	}
+	key := i.iter.Item().Key()
+	// "< 0" means "key < upperBound"
+	valid := bytes.Compare(key, i.upperBound) < 0
+	return valid
+}


⚠️❓❗

I am worried this might be incorrect:

my understanding is that the StartEndPrefixToLowerUpperBound function should translate the prefix bounds to bounds on the full key values:

flow-go/storage/operation/badgerimpl/iterator.go

Line 28 in 17972e0

lowerBound, upperBound := storage.StartEndPrefixToLowerUpperBound(startPrefix, endPrefix)

If we specify startPrefix, endPrefix := []byte{0x10}, []byte{0xff} the function StartEndPrefixToLowerUpperBound will turn this into nil (empty list) for the endPrefix.

the nil slice is lexicographically smaller than any non-empty slice ⚠️

AlexHentschel · 2024-11-21T22:14:42Z

storage/operations.go

+	// Next advances the iterator to the next key-value pair.
+	Next()
+
+	// IterItem returns the current key-value pair, or nil if done.


badger's Item() implementation specifies:

⚠️ This item is only valid until Next() gets called.

I think that is a safety critical constrained, which should be highlighted in the documentation!

AlexHentschel · 2024-11-21T22:25:31Z

storage/operation/badgerimpl/iterator.go

+	// if it's beyond the upper bound, it's invalid
+	if !i.iter.Valid() {
+		return false
+	}
+	key := i.iter.Item().Key()
+	// "< 0" means "key < upperBound"
+	valid := bytes.Compare(key, i.upperBound) < 0
+	return valid


I don't understand what is the difference between

flow-go/storage/operation/badgerimpl/iterator.go

Lines 44 to 47 in 17972e0

// if it's beyond the upper bound, it's invalid

if !i.iter.Valid() {

return false

}

you are documenting in line 44:

if it's beyond the upper bound, it's invalid

so why do we do the check again for the upper bound:

flow-go/storage/operation/badgerimpl/iterator.go

Lines 48 to 51 in 17972e0

key := i.iter.Item().Key()

// "< 0" means "key < upperBound"

valid := bytes.Compare(key, i.upperBound) < 0

return valid

Please document the answer in the implementation (not as a reply to my PR comment). Thanks

AlexHentschel · 2024-11-22T00:07:02Z

storage/operation/reads.go

+// Exists returns true if a key exists in the database.
+// No errors are expected during normal operation.
+func Exists(key []byte, keyExists *bool) func(storage.Reader) error {


I find this function's signature challenging - specifically that we input a pointer to a bool that is only set once the returned function is executed. I am sure you have good reasons for implementing it that way. Personally, I would prefer to keep code with this many indirections out of the production code base -- if possible.

If Exists is only used for testing, maybe move it into storage/operation/dbtest package and lets make sure we document it in detail:

Suggested change

// Exists returns true if a key exists in the database.

// No errors are expected during normal operation.

func Exists(key []byte, keyExists *bool) func(storage.Reader) error {

// Exists takes a key and a pointer to an a boolean variable `keyExists` as inputs and returns an function.

// When this returned function is executed (and only then), it will write into the `keyExists` whether

// the key exists.

// No errors are expected during normal operation.

func Exists(key []byte, keyExists *bool) func(storage.Reader) error {

If we want to keep this functionality as part of the production implementation, I would personally find it much simpler to just include the Exists method in the reader interface. Yes, we would duplicate the method across the two implementations - nevertheless, I would find this worth-while to remove API complexity. Suggesting something like this:

// Exists checks whether a key-value pair with the given key exists. The function returns nil, // if the key exists and `storage.ErrNotFound` if the key does not exist. // No other errors are expected during normal operation. func (b dbReader) Exists(key []byte) error { _, closer, err := b.Get(key) _ = closer.Close() // we only expect err to be nil or storage.ErrNotFound; all other cases are symptoms of an irrecoverable exception if err != nil && (!errors.Is(err, storage.ErrNotFound)) { return irrecoverable.NewExceptionf("could not load data: %w", err) } return err }

I agree, the API that uses the pointer as argument is a bit confusing. This is the existing badger style and I just followed it.

The reason the badger style has to take pointer as argument is that, the badger read queries has to work with badger transactions, which requires all queries to be curried as functors that takes a single badger transaction obj as argument.

However, since the new universal database operations is gonna move away from badger transaction, it's possible that all read queries can be made without functors.

I can still keep this functor style and make another KeyExists in the simplified version.

In terms of the KeyExists function implementation, there is some detail:

I prefer to return (hasKey bool, exception error), instead of just an error which use has to compare with storage.ErrNotFound

Your implementation always call closer.Close() regardless there is error or not. It seems the user might get confused about whether Close() call is needed when err != nil. Currently it would return nil, which might cause panic if user get confused. So to prevent mistakes like this, I decided to return always return a noopCloser even if err != nil.

AlexHentschel · 2024-11-22T00:16:08Z

storage/operation/reads.go

+// Retrieve will retrieve the binary data under the given key from the database
+// and decode it into the given entity. The provided entity needs to be a
+// pointer to an initialized entity of the correct type.
+// Error returns:
+//   - storage.ErrNotFound if the key does not exist in the database
+//   - generic error in case of unexpected failure from the database layer, or failure
+//     to decode an existing database value
+func Retrieve(key []byte, entity interface{}) func(storage.Reader) error {
+	return func(r storage.Reader) error {
+		val, closer, err := r.Get(key)
+		if err != nil {
+			return err
+		}
+
+		defer closer.Close()
+
+		err = msgpack.Unmarshal(val, entity)
+		if err != nil {
+			return irrecoverable.NewExceptionf("could not decode entity: %w", err)
+		}
+		return nil
+	}
+}
+
+// FindHighestAtOrBelow finds the highest key with the given prefix and
+// height equal to or below the given height.
+func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error {
+	return func(r storage.Reader) error {
+		if len(prefix) == 0 {
+			return fmt.Errorf("prefix must not be empty")
+		}
+
+		key := append(prefix, EncodeKeyPart(height)...)
+		it, err := r.NewIter(prefix, key, storage.DefaultIteratorOptions())
+		if err != nil {
+			return fmt.Errorf("can not create iterator: %w", err)
+		}
+		defer it.Close()
+
+		var highestKey []byte
+		// find highest value below the given height
+		for it.First(); it.Valid(); it.Next() {
+			highestKey = it.IterItem().Key()
+		}
+
+		if len(highestKey) == 0 {
+			return storage.ErrNotFound
+		}
+
+		// read the value of the highest key
+		val, closer, err := r.Get(highestKey)
+		if err != nil {
+			return err
+		}
+
+		defer closer.Close()
+
+		err = msgpack.Unmarshal(val, entity)
+		if err != nil {
+			return irrecoverable.NewExceptionf("failed to decode value: %w", err)
+		}
+
+		return nil
+	}
+}


🤯

similarly with Retrieve and FindHighestAtOrBelow: I think the API is way too complicated. Just include those in the interface.

If you are concerned about duplicating code in the implementation for pebble and badger, you can use the decorator pattern:

the badgerimpl.dbReader and pebbleimpl.dbReader only implement Get and NewIter, while the interface would additionally demand the extra methods Exists, Retrieve, FindHighestAtOrBelow

constructors for badgerimpl.dbReader and pebbleimpl.dbReader could then wrap their concrete type with an implementation-agnostic decorator that adds Exists, Retrieve, FindHighestAtOrBelow

Its a bit more effort on the framework, but I think it is worth-while for removing the convoluted indirections of returning a functor that sets an input in the original method, which generated the functor 🤯. Essentially the same API simplification as for the Exists method.

I agree the API of returning functions is complicated. It's worth noting that Leo is creating these APIs in this way so that they will be drop-in compatible with our existing database code, which all uses this pattern:

flow-go/storage/badger/operation/payload.go

Lines 9 to 15 in 164fa95

func InsertSeal(sealID flow.Identifier, seal *flow.Seal) func(*badger.Txn) error {

return insert(makePrefix(codeSeal, sealID), seal)

}

func RetrieveSeal(sealID flow.Identifier, seal *flow.Seal) func(*badger.Txn) error {

return retrieve(makePrefix(codeSeal, sealID), seal)

}

We aren't introducing new APIs here. We're replicating APIs that already exist to make the transition smoother.

The reason the existing database code uses this pattern is to match Badger's API, for example:

func (db *DB) View(fn func(txn *Txn) error) error

I think the interface changes you're suggesting here are good ideas, but implementing them during the transition, while supporting both database backends, would be burdensome.

Yeah, exactly like Jordan said (thanks for explaining it), that it minimizes the changes when refactoring existing modules, so that the changes are easier for us to review.

However, if we take a step back and rethink about why we need these wraps original, I actually agree with Alex's point that the functors (Get, Retrieve, etc) are unnecessary wraps if we are not using transactions.

So I refactored them to keep both the implementation by extracting the non-functor version, and moved the functor versions into the read_functors.go files, marked them to be deprecated as we are gonna replace stateful transactions with stateless batch updates. Reasonings are added as comments in that file.

Thanks for the idea Alex.

AlexHentschel · 2024-11-22T00:28:47Z

storage/operation/reads.go

+		var highestKey []byte
+		// find highest value below the given height
+		for it.First(); it.Valid(); it.Next() {
+			highestKey = it.IterItem().Key()
+		}
+
+		if len(highestKey) == 0 {
+			return storage.ErrNotFound
+		}


⚠️ my understanding is that if we call next, the previous IterItem() might not be valid anymore and since key only returns a slice reference, whose underlying array might be re-used by the database (see also my comment here for context). My worry is that the same problem might extend to highestKey.

Furthermore, I would prefer to use the Valid method that the implementation provides to test for validity instead of len(highestKey) == 0.

AlexHentschel · 2024-11-22T00:42:41Z

storage/operation/reads.go

+// FindHighestAtOrBelow finds the highest key with the given prefix and
+// height equal to or below the given height.
+func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error {


I misunderstood this function at first. Suggesting to refine the documentation:

Suggested change

// FindHighestAtOrBelow finds the highest key with the given prefix and

// height equal to or below the given height.

func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error {

// FindHighestAtOrBelow is for database entries that are indexed by block height. It is suitable to search

// keys with the format prefix` + `height` (where "+" denotes concatenation of binary strings). The height

// is encoded as Big-Endian (entries with numerically smaller height have lexicographically smaller key).

// The function finds the *highest* key with the given prefix and height equal to or below the given height.

func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error {

AlexHentschel · 2024-11-22T00:44:55Z

storage/operation/writes.go

+}
+
+// RemoveByPrefix removes all keys with the given prefix defined by [startPrefix, endPrefix] (both inclusive).
+// If no keys exist with the given prefix, this is a no-op.


Suggested change

// If no keys exist with the given prefix, this is a no-op.

// We require that startPrefix ≤ endPrefix (otherwise this function errors).

// If no keys exist with the given prefix, this is a no-op.

Co-authored-by: Alexander Hentschel <alex.hentschel@flowfoundation.org>

janezpodhostnik

Looks great!
Most of my comments are only style comments.

The one thing I'm really concerned about is that we are dropping the errors on closers from storage.Reader.Get

storage/operations.go

storage/operation/pebbleimpl/iterator.go

janezpodhostnik · 2024-12-10T20:54:34Z

storage/operation/dbtest/helper.go

+// helper types and functions
+type WithWriter func(func(storage.Writer) error) error
+
+func RunWithStorages(t *testing.T, fn func(*testing.T, storage.Reader, WithWriter)) {


nit:

right now this is used as

dbtest.RunWithStorages(t, func(t *testing.T, r storage.Reader, withWriter dbtest.WithWriter){...}

but it might be more ergonomic if we could use it as:

t.Run("My Test", dbtest.WithStorages(func(*testing.T, storage.Reader, WithWriter){...})

The RunWithStorages creates two t.Run:

t.Run("BadgerStorage", ... t.Run("PebbleStorage",

This allows us to define one test cases and run with two different database implementation.

What I meant is that we could change the signature of RunWithStorages from

func RunWithStorages(t *testing.T, fn func(*testing.T, storage.Reader, WithWriter)) { // ... body ... }

to

func RunWithStorages(fn func(*testing.T, storage.Reader, WithWriter)) func(t *testing.T) { return func(t *testing.T) { // ... exact same body ... } }

so that the usage can change from

dbtest.RunWithStorages(t, func(t *testing.T, r storage.Reader, withWriter dbtest.WithWriter){ // ... test body ... }

to

t.Run("My Test", dbtest.RunWithStorages(func(*testing.T, storage.Reader, WithWriter){ // ... same test body ... })

Its functionally almost the same, but the later approach allows you to nest the tests a little better.

janezpodhostnik · 2024-12-10T21:06:18Z

storage/operation/pebbleimpl/writer.go

+	globalReader storage.Reader
+	batch        *pebble.Batch
+
+	callbacks op.Callbacks


callbacks do not need to be this deep.

You can construct a type that wraps a generic storage.ReaderBatchWriter and implements Commit.

type ReaderBatchWriterWithCallback struct { storage.ReaderBatchWriter callbacks op.Callbacks }

This way you move that responsibility away from the specific db implementation.

Or just have ReaderBatchWriter be a operation.Callbacks

type ReaderBatchWriter struct { operation.Callbacks ... }

janezpodhostnik · 2024-12-10T21:21:10Z

storage/operations.go

+	// GlobalReader returns a database-backed reader which reads the latest committed global database state ("read-committed isolation").
+	// This reader will not read writes written to ReaderBatchWriter.Writer until the write batch is committed.
+	// This reader may observe different values for the same key on subsequent reads.
+	GlobalReader() Reader


I cant quite imagine how this will be used...

I feel like this is sort of a "batch".

type Batch interface { Writer AddCallback(func(error)) Commit() error }

var batch Batch batch := pebble.NewBatch(...) batch.AddCallback(func(error){...}) // then use the batch as a writer var writer Writer writer := batch writer.Set(...) _ := batch.Commit()

Do you mean why the GlobalReader is part of the Batch?

It allows us to read data and depending the result write different data to the batch. The Reader is part of the Writer so that it's ensure the underlying database instance is the same. And the reason its called GlobalReader is to indicate its reading the committed state.

I meant to place the comment at the name of the interface.

What I was trying to say is that the interface name (or purpose) is just to be a Batch why not call it a batch instead of ReaderBatchWriter?

Also why does it return a writer instead of just being a writer?

storage/operation/pebbleimpl/reader.go

storage/operation/reads.go

storage/operation/pebbleimpl/iterator.go

zhangchiqing commented Sep 14, 2024

View reviewed changes

zhangchiqing mentioned this pull request Sep 14, 2024

[Badger] Refactor approvals to use badger batch updates #6466

Draft

zhangchiqing commented Sep 16, 2024

View reviewed changes

zhangchiqing marked this pull request as ready for review September 16, 2024 23:15

zhangchiqing requested review from jordanschalm, AlexHentschel, peterargue and durkmurder September 16, 2024 23:16

j1010001 mentioned this pull request Oct 3, 2024

Replace Badger with Pebble DB - Execution data #6527

Open

zhangchiqing force-pushed the leo/db-ops branch from 87dc8e1 to b82106a Compare October 8, 2024 20:40

zhangchiqing force-pushed the leo/db-ops branch from 5c99655 to 257bde0 Compare October 28, 2024 23:44

This was referenced Oct 28, 2024

[Storage] Pruning - add a test case to verify compaction can reclaim disk space #6603

Open

[Review Version] Refactor Protocol State with Pebble-based Storage #6197

Closed

zhangchiqing force-pushed the leo/db-ops branch from 257bde0 to bbfb3a1 Compare October 30, 2024 16:56

jordanschalm reviewed Nov 1, 2024

View reviewed changes

zhangchiqing force-pushed the leo/db-ops branch 4 times, most recently from 284750a to 93fcaa1 Compare November 1, 2024 21:45

zhangchiqing added 4 commits November 1, 2024 17:06

add universal database operations

8d98f0f

address review comments

90a2426

consolidate PrefixUpperbound

326772e

update comments

17972e0

zhangchiqing force-pushed the leo/db-ops branch from 77b44d1 to 17972e0 Compare November 2, 2024 00:06

zhangchiqing requested a review from jordanschalm November 2, 2024 00:06

jordanschalm approved these changes Nov 4, 2024

View reviewed changes

AlexHentschel reviewed Nov 21, 2024

View reviewed changes

AlexHentschel reviewed Nov 22, 2024

View reviewed changes

zhangchiqing and others added 10 commits December 5, 2024 12:21

add comments

eea7a2c

fix the boundary iteration case for badger

2a36b57

add review comments

4766421

fix DeleteRange

a725247

refactor StartEndPrefixToLowerUpperBound

b5eaf02

add prefix check in NewIter

47ef8aa

add RemoveByRange

18c0a17

refactor to use KeyCopy

cd39e67

Apply suggestions from code review

c68ab0d

Co-authored-by: Alexander Hentschel <alex.hentschel@flowfoundation.org>

add comments

dac89f2

zhangchiqing requested a review from janezpodhostnik December 6, 2024 18:13

zhangchiqing added 4 commits December 6, 2024 10:30

update comments for Get method

f69cbcc

making it optional to call closer when Get fails

42f360e

refactor Retrieve with RetrieveByKey

05862f6

extract functions out of functors

e5ae317

janezpodhostnik approved these changes Dec 10, 2024

View reviewed changes

zhangchiqing added 6 commits December 10, 2024 14:06

address review comments

88d86d2

address review comments

ef995d4

use RWMutex

ffe023c

add noop closer

fdce574

handle close error

eebde06

update comments

9f49267

janezpodhostnik reviewed Dec 11, 2024

View reviewed changes

storage/operation/pebbleimpl/iterator.go Show resolved Hide resolved

zhangchiqing enabled auto-merge December 11, 2024 23:17

zhangchiqing added 2 commits December 11, 2024 15:35

Merge branch 'master' into leo/db-ops

d8f31c5

revert closer changes

24d65bc

zhangchiqing added this pull request to the merge queue Dec 12, 2024

Merged via the queue into master with commit 541d744 Dec 12, 2024
55 checks passed

zhangchiqing deleted the leo/db-ops branch December 12, 2024 18:39

	// It cannot be used concurrently for writing.
	// One Writer instance cannot be used concurrently by multiple goroutines.
	// However, many goroutines can write concurrently, each using their own Writer instance.

	// safe to modify the contents of the argument after Get returns. The
	// safe to modify the contents of the `key` argument after Get returns. The

		// IterItem returns the current key-value pair, or nil if done.
		IterItem() IterItem

		// Iterator is an interface for iterating over key-value pairs in a storage backend.
		type Iterator interface {

	// sort lower than K according to the byte-wise lexicographic key ordering used by Pebble.
	// sort lower than K according to the byte-wise lexicographic key ordering.

	// if it's beyond the upper bound, it's invalid
	if !i.iter.Valid() {
	return false
	}

	key := i.iter.Item().Key()
	// "< 0" means "key < upperBound"
	valid := bytes.Compare(key, i.upperBound) < 0
	return valid

	func InsertSeal(sealID flow.Identifier, seal flow.Seal) func(badger.Txn) error {
	return insert(makePrefix(codeSeal, sealID), seal)
	}

	func RetrieveSeal(sealID flow.Identifier, seal flow.Seal) func(badger.Txn) error {
	return retrieve(makePrefix(codeSeal, sealID), seal)
	}

	// If no keys exist with the given prefix, this is a no-op.
	// We require that startPrefix ≤ endPrefix (otherwise this function errors).
	// If no keys exist with the given prefix, this is a no-op.

[Badger] Add universal database operations #6465

[Badger] Add universal database operations #6465

Conversation

zhangchiqing commented Sep 14, 2024 • edited Loading

codecov-commenter commented Sep 14, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhangchiqing Sep 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jordanschalm left a comment

Choose a reason for hiding this comment

jordanschalm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexHentschel left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

⚠️❓❗

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

🤯

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexHentschel Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

janezpodhostnik left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhangchiqing commented Sep 14, 2024 •

edited

Loading

codecov-commenter commented Sep 14, 2024 •

edited

Loading

zhangchiqing Sep 16, 2024 •

edited

Loading

AlexHentschel left a comment •

edited

Loading

AlexHentschel Nov 22, 2024 •

edited

Loading