Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mvcc/kvstore: Optimize compaction #11150

Closed
wants to merge 128 commits into from
Closed

Conversation

shenjiangc
Copy link
Contributor

mvcc/kvstore: Optimize compaction

when the number of key-value index is greater than one million, compact take too long and blocks other requests.
Move the “kvindex.Compact” to “fifoSched.Schedule” func

jingyih and others added 2 commits February 12, 2019 15:40
Remove auth validation loop in v3_server.raftRequest(). Re-validation
when error ErrAuthOldRevision occurs should be handled on client side.
…ompact take too long and blocks other requests
@jingyih
Copy link
Contributor

jingyih commented Sep 14, 2019

Could you elaborate how this change helps? This is still the same amount of work but is done at a different time? Do you have test data showing the improvement? Thanks!

Also if we move index compaction to the scheduled FIFO, we need to adjust the metric "indexCompactionPauseMs" accordingly.

@shenjiangc
Copy link
Contributor Author

shenjiangc commented Sep 14, 2019

Could you elaborate how this change helps? This is still the same amount of work but is done at a different time? Do you have test data showing the improvement? Thanks!

Also if we move index compaction to the scheduled FIFO, we need to adjust the metric "indexCompactionPauseMs" accordingly.

Yes, Just let index compact do in another goroutine。

first,I put 1000000 or more key-value:
./tools/benchmark --target-leader --conns=50 --clients=50 put --key-size=64 --sequential-keys --total=5000000 --val-size=256 --key-space-size=5000000

then,i do compact use:
rev=$(./etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]' | egrep -o '[0-9].') ; echo $rev
./etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
the compaction took more than 1 sec。

when the compact is going on, the other new put request is blocked, and Completion delay > 1 sec. The reason is the index compact took too time to assign map[],when the tree is big enough。

Move index compaction to the scheduled FIFO, let the compaction don't block the new request。

TEST :(put 5 key,when compacting)
Before Modification:
./tools/benchmark --target-leader --conns=1 --clients=1 put --key-size=64 --sequential-keys --total=5 --val-size=256 --key-space-size=5
Summary:
Total: 2.9459 secs.
Slowest: 2.9445 secs.
Fastest: 0.0002 secs.
Average: 0.5892 secs.
Stddev: 1.1776 secs.
Requests/sec: 1.6973
Response time histogram:
0.0002 [1] |∎∎∎∎∎∎∎∎∎∎∎∎∎
0.2947 [3] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.5891 [0] |
0.8835 [0] |
1.1779 [0] |
1.4723 [0] |
1.7668 [0] |
2.0612 [0] |
2.3556 [0] |
2.6500 [0] |
2.9445 [1] |∎∎∎∎∎∎∎∎∎∎∎∎∎

After Modification:
./tools/benchmark --target-leader --conns=1 --clients=1 put --key-size=64 --sequential-keys --total=5 --val-size=256 --key-space-size=5
Summary:
Total: 0.0028 secs.
Slowest: 0.0014 secs.
Fastest: 0.0003 secs.
Average: 0.0005 secs.
Stddev: 0.0004 secs.
Requests/sec: 1783.4436

Response time histogram:
0.0003 [1] |∎∎∎∎∎∎∎∎∎∎∎∎∎
0.0004 [3] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.0005 [0] |
0.0006 [0] |
0.0007 [0] |
0.0009 [0] |
0.0010 [0] |
0.0011 [0] |
0.0012 [0] |
0.0013 [0] |
0.0014 [1] |∎∎∎∎∎∎∎∎∎∎∎∎∎

bill-of-materials was fixed for module aware 'go list' as part of coreos/license-bill-of-materials#17
So can re enable bom tests

fixes etcd-io#11132
@jingyih
Copy link
Contributor

jingyih commented Sep 15, 2019

Thanks for doing benchmark testing!

With this change, when the actual index compaction happens later, it will still block other requests such as put request. Is my understanding correct?

Remotes is not a valid git command. Also, set the environmental variable
correctly.
@shenjiangc
Copy link
Contributor Author

With this modification, compaction will no longer block other request messages。;)

hack: fix cherrypick instruction
@jingyih
Copy link
Contributor

jingyih commented Sep 16, 2019

With this modification, compaction will no longer block other request messages。;)

Sorry I don't quite understand. This modification basically postponed the compaction work in index tree to a later time. So I can understand the compaction request does not block other requests immediately. But later when the actual compaction work happens, it has the same effect on other requests, right?

@shenjiangc
Copy link
Contributor Author

shenjiangc commented Sep 16, 2019

Hi,
The etcd process normal request messages such as put/get/compact in the same goroute “go run()”. When it is processing compact,it will can’t process other messages. The fifo scheduler is another goroute. So I move the index compact to fifo scheduler, to let the goroute “go run()” to process the next request as soon.
I tested it,when the actual compaction happen,it doesn’t block the other request.

@jingyih
Copy link
Contributor

jingyih commented Sep 17, 2019

Thanks I see your point. My previous thinking was that the index compaction does hold a write lock most of the time during its execution, so it blocks other request (which also need read or write the same index tree) anyway.

Another potential concern is the function returns w/o actually doing the compaction work in index tree (because it is just scheduled). So depends on when the index compaction happens, the returned "keep" could be different, meaning the keys to be deleted in the data base is different. I don't think there are any correctness concerns, but it makes consistency checking among nodes more difficult. - not completely sure about what I said here, need some time to read and think through.

gyuho and others added 15 commits September 17, 2019 13:29
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Since NewMutex will append a slash to pfx, there is no need to append a
slash beforehand.
clientv3/concurrency: remove the unneeded slash
Update the code of conduct. Remove notice file.
clientv3: remove the redundant CancelFunc invocation
test(functional): remove unknown field Etcd.Debug
Add slack chat contact.
Added link and removed wrongly copied text
gyuho added 2 commits October 23, 2019 10:33
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
scripts/release: list GPG key only when tagging is needed
@jingyih
Copy link
Contributor

jingyih commented Oct 25, 2019

So depends on when the index compaction happens, the returned "keep" could be different, meaning the keys to be deleted in the data base is different.

I think my previous statement was wrong. "keep" does not depend on key modifications after compaction rev.

So this should be safe.

Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jingyih
Copy link
Contributor

jingyih commented Oct 25, 2019

This conflicts with the tracing we recently added in #11179.

@YoyinZyc Could you help take a look? If we move the compaction of in-memory index to scheduled, can we still time it? Or maybe we need to drop the corresponding step?

trace.Step("compact in-memory index tree")

@jingyih
Copy link
Contributor

jingyih commented Oct 25, 2019

Oh actually we also need to adjust the calculation of metrics indexCompactionPauseMs.

@YoyinZyc
Copy link
Contributor

This conflicts with the tracing we recently added in #11179.

Yuchen Zhou Could you help take a look? If we move the compaction of in-memory index to scheduled, can we still time it? Or maybe we need to drop the corresponding step?

trace.Step("compact in-memory index tree")

We can remove this step in that case. It will be timed with the backend together as a whole scheduled job.

@jingyih
Copy link
Contributor

jingyih commented Oct 26, 2019

@shenjiangc Could you please rebase to current master? And then:

  1. remove trace.Step("compact in-memory index tree").
  2. move the calculation of "indexCompactionPauseMs" to the scheduled function. It is used to measure the duration of s.kvindex.Compact(rev).

jingyih and others added 14 commits October 25, 2019 18:39
Disable TestV3AuthOldRevConcurrent for now. See
etcd-io#10468 (comment)
etcdserver: remove infinite loop for auth in raftRequest
To prevent the purge file loop from accidentally acquiring the file lock
and remove the files during server shutdowm.
etcdserver: wait purge file loop to finish during shutdown
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
clientv3: fix retry/streamer error message
lease:Add Unlock before break in loop
mvcc: Add Unlock before panic to prevent double lock
etcdserver: fix a bug which append object to a new allocated sized slice
…ompact take too long and blocks other requests
@shenjiangc
Copy link
Contributor Author

@shenjiangc Could you please rebase to current master? And then:

  1. remove trace.Step("compact in-memory index tree").
  2. move the calculation of "indexCompactionPauseMs" to the scheduled function. It is used to measure the duration of s.kvindex.Compact(rev).

I made some mistacks, the new pull request is #11330 , Please take a look。

jingyih added a commit that referenced this pull request Nov 5, 2019
mvcc/kvstore: Optimize compaction, slove conflict for #11150
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.