Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move prepareClientToWrite out of loop for lrange command to reduce the redundant call. #860

Merged
merged 8 commits into from
Aug 28, 2024

Conversation

lipzhu
Copy link
Contributor

@lipzhu lipzhu commented Aug 1, 2024

Description

When I explore the cycles distributions for lrange test ( valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50 --threads 4). I found the prepareClientToWrite and clientHasPendingReplies could be reduced to single call outside instead of called in a loop, ideally we can gain 3% performance. The corresponding LRANG_100, LRANG_300, LRANGE_500, LRANGE_600 have ~2% - 3% performance boost, the benchmark test prove it helps.

This patch try to move the prepareClientToWrite and its child clientHasPendingReplies out of the loop to reduce the function overhead.

image

Test Environment

OPERATING SYSTEM: Ubuntu 22.04.4 LTS
Kernel: 5.15.0-116-generic
PROCESSOR: Intel Xeon Platinum 8380
Server and Client in same socket.

Server Configuration

taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf

port 9001
bind * -::*
daemonize no
protected-mode no
save ""

Benchmark Results

Test Name Perf Boost
memtier_benchmark-1key-list-10-elements-lrange-all-elements 2%
memtier_benchmark-1key-list-100-elements-lrange-all-elements 3%
memtier_benchmark-1key-list-1K-elements-lrange-all-elements 2%

loop for lrange command.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
Copy link

codecov bot commented Aug 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.32%. Comparing base (b728e41) to head (b2f2001).
Report is 55 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #860      +/-   ##
============================================
- Coverage     70.47%   70.32%   -0.16%     
============================================
  Files           112      113       +1     
  Lines         61467    61744     +277     
============================================
+ Hits          43320    43422     +102     
- Misses        18147    18322     +175     
Files Coverage Δ
src/networking.c 88.47% <100.00%> (-0.38%) ⬇️
src/server.h 100.00% <ø> (ø)
src/t_list.c 92.60% <100.00%> (-0.26%) ⬇️

... and 37 files with indirect coverage changes

@lipzhu lipzhu changed the title Reduce the redundant call of prepareClientToWrite in a loop. Move the redundant call of prepareClientToWrite out of loop for lrange command. Aug 1, 2024
@lipzhu lipzhu changed the title Move the redundant call of prepareClientToWrite out of loop for lrange command. Move prepareClientToWrite out of loop for lrange command to reduce the redundant call. Aug 1, 2024
@lipzhu
Copy link
Contributor Author

lipzhu commented Aug 5, 2024

Ping @valkey-io/core-team, could you help to take a look?

@madolson madolson requested a review from ranshid August 5, 2024 18:29
src/server.h Outdated Show resolved Hide resolved
src/networking.c Outdated Show resolved Hide resolved
@madolson
Copy link
Member

madolson commented Aug 6, 2024

@lipzhu So I have a proposal to add some more type checking, which alleviates my concerns to prevent someone from misusing this API.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
Co-authored-by: Wangyang Guo <wangyang.guo@intel.com>
@lipzhu
Copy link
Contributor Author

lipzhu commented Aug 9, 2024

@lipzhu So I have a proposal to add some more type checking, which alleviates my concerns to prevent someone from misusing this API.

@madolson I am not sure I got your point, I just simply introduce the writeReadyClient which is alias of client, or do you prefer a new writeReadyClient struct to wrap the client?

@lipzhu
Copy link
Contributor Author

lipzhu commented Aug 12, 2024

Or we can introduce a new flag like write_ready in client structure, set write_ready = 1 before iteration and reset it after exit?

@madolson
Copy link
Member

madolson commented Aug 13, 2024

Or we can introduce a new flag like write_ready in client structure, set write_ready = 1 before iteration and reset it after exit?

We would have to reset it at the end of the command evocation. My concern is that someone might miss resetting it. There is no good idiomatic way to execute a defer in C afaik.

@hpatro
Copy link
Contributor

hpatro commented Aug 13, 2024

I think the application of this writeReadyClient API will be in plenty of places if we accept it. The part I'm not sure about is how/when devs will pick the writeReady API(s) over the regular API(s). Also, we would need to duplicate most of the addReply API(s).

Overall, I feel introducing the client flag write_ready seems to introduce less redundant code and achieves the goal. Regarding the cleanup can't we do it in afterCommand flow once and guarantee the reset (not everyone needs to remember to do it).

@madolson
Copy link
Member

madolson commented Aug 14, 2024

Regarding the cleanup can't we do it in afterCommand flow once and guarantee the reset (not everyone needs to remember to do it).

We could maybe do it in afterCommand. Would need to go look through the various edge cases like client disconnects that might get bypassed. @lipzhu Do you want to prototype that? Otherwise I'm fine merging this as is.

@lipzhu
Copy link
Contributor Author

lipzhu commented Aug 15, 2024

Regarding the cleanup can't we do it in afterCommand flow once and guarantee the reset (not everyone needs to remember to do it).

We could maybe do it in afterCommand. Would need to go look through the various edge cases like client disconnects that might get bypassed. @lipzhu Do you want to prototype that? Otherwise I'm fine merging this as is.

@madolson Maybe we can merge this firstly. Regarding the proposal, I can open a new PR which focus on the code refactor.

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
src/server.h Outdated Show resolved Hide resolved
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
src/server.h Outdated Show resolved Hide resolved
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
@madolson
Copy link
Member

The code felt good, but I changed the wording to writePreparedClient, which felt a bit better. @lipzhu Will wait for you to ack that you think this makes sense then merge.

src/networking.c Outdated Show resolved Hide resolved
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
@lipzhu
Copy link
Contributor Author

lipzhu commented Aug 28, 2024

The code felt good, but I changed the wording to writePreparedClient, which felt a bit better. @lipzhu Will wait for you to ack that you think this makes sense then merge.

The changes make sense to me, thanks for your help. @madolson

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
@madolson madolson merged commit 076bf66 into valkey-io:unstable Aug 28, 2024
47 checks passed
@madolson madolson added the release-notes This issue should get a line item in the release notes label Aug 28, 2024
@lipzhu lipzhu deleted the lrange branch August 28, 2024 02:18
madolson pushed a commit that referenced this pull request Sep 2, 2024
…e redundant call. (#860)

## Description 
When I explore the cycles distributions for `lrange` test (
`valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50
--threads 4`). I found the `prepareClientToWrite` and
`clientHasPendingReplies` could be reduced to single call outside
instead of called in a loop, ideally we can gain 3% performance. The
corresponding `LRANG_100`, `LRANG_300`, `LRANGE_500`, `LRANGE_600` have
~2% - 3% performance boost, the benchmark test prove it helps.

This patch try to move the `prepareClientToWrite` and its child
`clientHasPendingReplies` out of the loop to reduce the function
overhead.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
madolson pushed a commit that referenced this pull request Sep 3, 2024
…e redundant call. (#860)

## Description 
When I explore the cycles distributions for `lrange` test (
`valkey-benchmark -p 9001 -t lrange -d 100 -r 1000000 -n 1000000 -c 50
--threads 4`). I found the `prepareClientToWrite` and
`clientHasPendingReplies` could be reduced to single call outside
instead of called in a loop, ideally we can gain 3% performance. The
corresponding `LRANG_100`, `LRANG_300`, `LRANGE_500`, `LRANGE_600` have
~2% - 3% performance boost, the benchmark test prove it helps.

This patch try to move the `prepareClientToWrite` and its child
`clientHasPendingReplies` out of the loop to reduce the function
overhead.

---------

Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
imasahiro added a commit to imasahiro/valkey that referenced this pull request Oct 3, 2024
Similar to valkey-io#860 but this is for HGETALL families (HGETALL/HKEYS/HVALS).
This patch moves `prepareClientToWrite` out of the loop to reduce
the function overhead.

| test          | unstable(150c197) rpc | this patch (rpc) | improvements |
| ------------- | -------------------------- | ---------------- | ------------ |
| HGETALL h1    | 85084.66                   | 86926.29         |  2.16446772  |
| HGETALL h10   | 78400.62                   | 76893.5          | -1.922331737 |
| HGETALL h25   | 64487.01                   | 58802.77         | -8.814550403 |
| HGETALL h50   | 47587.32                   | 49360.78         |  3.726749058 |
| HGETALL h100  | 33028.37                   | 34454.25         |  4.317137055 |
| HGETALL h300  | 14628.22                   | 15540.98         |  6.239720212 |
| HGETALL h500  | 9593.52                    | 10395.44         |  8.358975642 |

Signed-off-by: Masahiro Ide <masahiro.ide@lycorp.co.jp>
madolson added a commit that referenced this pull request Oct 12, 2024
Similar to #860 but this is for HGETALL families (HGETALL/HKEYS/HVALS).
This patch moves `prepareClientToWrite` out of the loop to reduce the
function overhead.

Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
eifrah-aws pushed a commit to eifrah-aws/valkey that referenced this pull request Oct 20, 2024
…1119)

Similar to valkey-io#860 but this is for HGETALL families (HGETALL/HKEYS/HVALS).
This patch moves `prepareClientToWrite` out of the loop to reduce the
function overhead.

Signed-off-by: Masahiro Ide <imasahiro9@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants