Introduce shared query buffer for client reads #258
Conversation
6ce2ea4 to fc9ee81
Overall it looks good to me. Can you document the procedure you were using when you saw the 3% performance boost? Did we also test it for pipelined clients?
I would also like some protection in freeClient() to make sure that if a client gets freed, it doesn't somehow also free the shared query buffer.
Performance Benchmark:
Server Setup:
Benchmark Configuration:
Server Command:
Benchmark Command:
Results (Average of 3 runs):
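The exact server and benchmark commands did not survive extraction here. A representative setup for this kind of measurement might look like the following; the ports, client count, request count, and value size are assumptions for illustration, not the invocation actually benchmarked above.

```shell
# Hypothetical reproduction sketch -- flags and sizes are illustrative only.
./valkey-server --port 6379 --save '' --daemonize yes

# 1000 active clients, short (<16KB) values so the shared buffer path is hit:
./valkey-benchmark -p 6379 -c 1000 -n 1000000 -d 512 -t set,get
```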
This change LGTM overall. Thanks @uriyage
Generally LGTM
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
I'm good with it. @uriyage Can you do the merge to get it up to date?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #258 +/- ##
============================================
+ Coverage 69.99% 70.25% +0.25%
============================================
Files 109 109
Lines 59913 59957 +44
============================================
+ Hits 41938 42122 +184
+ Misses 17975 17835 -140
@uriyage can you add more tests for the cases where the whole command or argv exceeds the limit?
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
I checked with 24KB values (since with jemalloc we actually allocate more than 16KB). Commands:
I get 83,544 without a shared query buffer and 83,238 with a shared query buffer. So, essentially about the same, which is expected: with larger values we revert to the current logic, where we use a private buffer per client.
@soloestoy Do you have any further followup?
I'm wondering whether we can eliminate the size limit of the shared query buffer (and resize it in cron if needed); maybe we could get more benefit that way.
@soloestoy once we exceed the initial read size for the shared query buffer, we need to transfer the data to the client's private buffer. Leaving the data in the shared buffer isn't an option; therefore, subsequent reads will be directed to the client's private query buffer anyway.
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style comments from clang format change.
(#593) The `query buffer resized correctly` test starts [failing](https://github.com/valkey-io/valkey/actions/runs/9278013807) with non-jemalloc allocators after PR #258. With jemalloc, we allocate ~20KB for the query buffer. In the test, we read 1 byte initially and then ensure there is at least 16KB of free space in the buffer for the second read, which is satisfied by jemalloc's 20KB allocation. However, with non-jemalloc allocators, the first read allocates exactly 16KB. When we check again, we don't have 16KB free due to the 1 byte already read. This triggers a greedy reallocation (doubling the requested size of 16KB+1), causing the query buffer size to exceed the 32KB limit, thus failing the test condition. This PR adjusts the test's query buffer upper limit to 32KB + 2. Signed-off-by: Uri Yagelnik <uriy@amazon.com>
This PR is based on the commits from PRs valkey-io/valkey#258, valkey-io/valkey#593, and valkey-io/valkey#639. It optimizes client query buffer handling in Redis by introducing a reusable query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer. The primary changes are:
* Adding a reusable query buffer `thread_shared_qb` that clients use by default.
* Modifying client querybuf initialization and reset logic.
* Freeing idle client query buffers when empty, to allow reuse of the reusable query buffer.
* Master client query buffers are kept private, as their contents need to be preserved for the replication stream.
* When nested commands are executed, only the first user uses the reused buffer; subsequent users still use a private buffer.
In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients. The memory reduction may also help reduce the need to evict clients when reaching the maxmemory limit, as the query buffer is the main memory consumer per client. This PR differs from valkey-io/valkey#258 in two ways:
1. Between a client acquiring the reused buffer and returning it, we do not update the reused query buffer pointer mid-use, regardless of whether the buffer has changed (expanded); we either return the reused query buffer (expanded or with data remaining) or reset it at the end.
2. Adding a new thread variable `thread_shared_qb_used` to avoid multiple clients acquiring the reusable query buffer at the same time. --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: Madelyn Olson <matolson@amazon.com> Co-authored-by: Uri Yagelnik <uriy@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: oranagra <oran@redislabs.com>
This PR optimizes client query buffer handling in Valkey by introducing a shared query buffer that is used by default for client reads. This reduces memory usage by ~20KB per client by avoiding allocations for most clients using short (<16KB) complete commands. For larger or partial commands, the client still gets its own private buffer.
The primary changes are:
* Adding a shared query buffer `shared_qb` that clients use by default.

In addition to the memory savings, this change shows a 3% improvement in latency and throughput when running with 1000 active clients.
The memory reduction may also help reduce the need to evict clients when reaching max memory limit, as the query buffer is the main memory consumer per client.