iterate returns duplicate keys #632
Comments
I looked into it more. I think the db itself may actually be corrupted during a page split (or we failed to reload the nodes correctly). There are 3 duplicated keys in two adjacent pages:
@xiang90 Are you retaining and altering the |
@benbjohnson No. Each key is put only once, without any buffering, so the value is never accessed again afterwards. But I will double check.
@benbjohnson I suspect the split code is the cause:
// Split inodes across two nodes.
next.inodes = n.inodes[splitIndex:]
n.inodes = n.inodes[:splitIndex]
We do not copy the backing array. So if we append one element to n.inodes, it will overwrite the first element of next.inodes. See my example here: https://play.golang.org/p/xjI9r6rpow But this is just my guess from reading a limited part of the boltdb code. I am not sure.
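The aliasing concern can be reproduced in isolation. This is a minimal sketch using plain int slices rather than bolt's inode type; the function names `splitShared` and `splitCapped` are made up for illustration, and it only shows the slice behaviour, not that bolt's split actually reaches this path.

```go
package main

import "fmt"

// splitShared mimics the split as quoted: both halves point at the same
// backing array, and the left half keeps spare capacity beyond its length.
func splitShared(items []int, splitIndex int) (left, right []int) {
	right = items[splitIndex:]
	left = items[:splitIndex]
	return left, right
}

// splitCapped uses a full slice expression to cap the left half's capacity,
// so a later append must allocate a new array instead of writing into right.
func splitCapped(items []int, splitIndex int) (left, right []int) {
	right = items[splitIndex:]
	left = items[:splitIndex:splitIndex]
	return left, right
}

func main() {
	left, right := splitShared([]int{1, 2, 3, 4}, 2)
	left = append(left, 99) // writes into the slot that right[0] occupies
	fmt.Println(left, right) // [1 2 99] [99 4]

	left, right = splitCapped([]int{1, 2, 3, 4}, 2)
	left = append(left, 99) // reallocates; right is untouched
	fmt.Println(left, right) // [1 2 99] [3 4]
}
```

Capping the capacity is only one way to avoid the overwrite; copying the right half into a fresh slice would work as well.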
On second thought, it might not be the exact cause. We put keys sequentially. We do not miss any keys, but there are 3 duplicates in between. Basically it looks like: <------- |dup|------->
@xiang90 Do you have an example program that writes the data?
package playground

import (
	"encoding/binary"
	"sync"

	"github.com/boltdb/bolt"
)

type store struct {
	sync.Mutex
	revision int
	tx       *bolt.Tx
	db       *bolt.DB
}

// intToBytes was not shown in the original snippet. A fixed-width big-endian
// encoding is assumed here so that keys sort in numeric order.
func intToBytes(n int) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, uint64(n))
	return b
}

func (s *store) put(value []byte) {
	s.Lock()
	defer s.Unlock()
	bucket := s.tx.Bucket([]byte("keys"))
	if bucket == nil {
		panic("bucket key does not exist")
	}
	// it is useful to increase fill percent when the workload is seq append.
	// this can delay the page split and reduce space usage.
	bucket.FillPercent = 0.9
	if err := bucket.Put(intToBytes(s.revision), value); err != nil {
		panic(err)
	}
	s.revision++
}

func (s *store) get(rev int) []byte {
	s.Lock()
	defer s.Unlock()
	bucket := s.tx.Bucket([]byte("keys"))
	if bucket == nil {
		panic("bucket key does not exist")
	}
	// copy the value, since the returned slice is only valid during the tx.
	b := bucket.Get(intToBytes(rev))
	nb := make([]byte, len(b))
	copy(nb, b)
	return nb
}

// called every 100ms, every 10000 puts
func (s *store) commit() {
	s.Lock()
	defer s.Unlock()
	if err := s.tx.Commit(); err != nil {
		panic(err)
	}
	var err error
	s.tx, err = s.db.Begin(true)
	if err != nil {
		panic(err)
	}
}
It looks pretty much like this.
The put bytes are generated by marshaling a protobuf request, so they will never be accessed again. The get part is not exactly accurate: we actually unmarshal the data into a protobuf response while holding the lock. Protobuf unmarshalling should copy the data.
@xiang90 We encountered the same issue. How did you fix it?
I expect iterating a bucket with a cursor to return keys in order, but it returns duplicated keys.
Here is the data
And here is the script to reproduce the problem
Here is the output:
Am I missing something?
The key
\x00\x00\x00\x00\x00\x00\xfe\x9c_\x00\x00\x00\x00\x00\x00\x00\x00
shows up twice during cursor iteration.