Introduce shared query buffer for client reads #258
Conversation
6ce2ea4 to fc9ee81
Overall it looks good to me. Can you document the procedure you were using when you saw the 3% performance boost? Did we also test it for pipelined clients?
I would also like some protection in freeClient() to make sure that if a client gets freed, it doesn't somehow also free the shared query buffer.
Performance Benchmark:
Server Setup:
Benchmark Configuration:
Server Command:
Benchmark Command:
Results (Average of 3 runs):
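The exact server and benchmark commands did not survive extraction here. A representative setup for this kind of measurement might look like the following; the ports, client count, request count, and value size are assumptions for illustration, not the invocation actually benchmarked above.

```shell
# Hypothetical reproduction sketch -- flags and sizes are illustrative only.
./valkey-server --port 6379 --save '' --daemonize yes

# 1000 active clients, short (<16KB) values so the shared buffer path is hit:
./valkey-benchmark -p 6379 -c 1000 -n 1000000 -d 512 -t set,get
```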
This change LGTM overall. Thanks @uriyage
Generally LGTM
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
I'm good with it. @uriyage Can you do the merge to get it up to date?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #258 +/- ##
============================================
+ Coverage 69.99% 70.25% +0.25%
============================================
Files 109 109
Lines 59913 59957 +44
============================================
+ Hits 41938 42122 +184
+ Misses 17975 17835 -140
@uriyage can you add more tests for the cases where the whole command or argv exceeds the limit?
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
I checked with 24KB values (since with jemalloc we actually allocate more than 16KB). Commands:
I get 83,544 without a shared query buffer and 83,238 with a shared query buffer. So, essentially about the same, which is expected: with larger values we revert to the current logic, where we use a private buffer per client.
@soloestoy Do you have any further followup?
I'm wondering whether we can eliminate the size limit of the shared query buffer (and resize it in cron if needed); maybe we could get more benefit that way.
@soloestoy once we exceed the initial read size for the shared query buffer, we need to transfer the data to the client's private buffer. Leaving the data in the shared buffer isn't an option; therefore, subsequent reads will be directed to the client's private query buffer anyway.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style comments from clang format change.
(#593) The `query buffer resized correctly` test starts [failing](https://github.com/valkey-io/valkey/actions/runs/9278013807) with non-jemalloc allocators after PR #258. With jemalloc, we allocate ~20KB for the query buffer. In the test, we read 1 byte initially and then ensure there is at least 16KB of free space in the buffer for the second read, which is satisfied by jemalloc's 20KB allocation. However, with non-jemalloc allocators, the first read allocates exactly 16KB. When we check again, we don't have 16KB free due to the 1 byte already read. This triggers a greedy reallocation (doubling the requested size of 16KB+1), causing the query buffer size to exceed the 32KB limit, thus failing the test condition. This PR adjusts the test's query buffer upper limit to 32KB + 2. Signed-off-by: Uri Yagelnik <uriy@amazon.com>
This PR is based on the commits from PRs valkey-io/valkey#258, valkey-io/valkey#593, and valkey-io/valkey#639. It optimizes client query buffer handling in Redis by introducing a reusable query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer. The primary changes are:
* Adding a reusable query buffer `thread_shared_qb` that clients use by default.
* Modifying client querybuf initialization and reset logic.
* Freeing idle client query buffers when empty, to allow reuse of the reusable query buffer.
* Master client query buffers are kept private, as their contents need to be preserved for the replication stream.
* When nested commands are executed, only the first user uses the reused buffer; subsequent users still use a private buffer.
In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients. The memory reduction may also help reduce the need to evict clients when reaching the maxmemory limit, as the query buffer is the main memory consumer per client. This PR differs from valkey-io/valkey#258 in two ways:
1. Between a client acquiring the reused buffer and returning it, we do not update the reused query buffer pointer mid-use, regardless of whether the buffer has changed (expanded); we either return the reused query buffer (expanded or with data remaining) or reset it at the end.
2. Adding a new thread variable `thread_shared_qb_used` to avoid multiple clients acquiring the reusable query buffer at the same time. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: oranagra <oran@redislabs.com>
This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer.
The primary changes are:
* Adding a shared query buffer `shared_qb` that clients use by default.

In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients.
The memory reduction may also help reduce the need to evict clients when reaching max memory limit, as the query buffer is the main memory consumer per client.