Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed key into dict entry #541

Merged
merged 22 commits into from
Jul 2, 2024
Merged

Conversation

hpatro
Copy link
Contributor

@hpatro hpatro commented May 23, 2024

This PR incorporates changes related to key embedding described in the redis/redis#12216
With this change there will be no key pointer and embedded the key within the dictEntry. 1 byte is used for additional bookkeeping. Overall the saving would be 7 bytes.

Key changes:

New dict entry type introduced, which is now used as an entry for the main dictionary:

typedef struct {
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;  /* Next entry in the same hash bucket. */
    uint8_t key_header_size; /* offset into key_buf where the key is located at. */
    unsigned char key_buf[]; /* buffer with embedded key. */
} embeddedDictEntry;

One new function has been added to the dictType:

size_t (*embedKey)(unsigned char *buf, size_t buf_len, const void *key, unsigned char *header_size);

Change is opt-in per dict type, hence sets, hashes and other types that are using dictionary are not impacted.
With this change main dictionary now owns the data, so copy on insert in dbAdd is no longer needed.

Benchmarking results

TLDR; Around 9-10% memory usage reduction in overall memory usage for scenario with key of 16 bytes and value of 8 bytes and 16 bytes. The throughput per second varies but is similar or greater in most of the run(s) with the changes against unstable (ae2d421).

  • Performed on a Amazon EC2 c5.4xlarge instance
  • Server setup (maximum memory allocated is 100 MB)
src/valkey-server --save "" --daemonize yes --maxmemory 100m --enable-debug-command local --port 6379
  • GET command used
src/valkey-benchmark  -t get  -n 1000000  -r 10000000
  • SET command used
src/valkey-benchmark  -t get  -n 1000000  -r 10000000 -d 16

SET performance

  Throughput per sec Number of keys Used memory (bytes) Throughput per sec Number of keys Used memory (bytes) Throughput per sec Number of keys Used memory (bytes)
Key Embedding                  
SET (d = 8, n= 1M, r=10M) 108448.11 951737 78140504 107135.21 951660 78135144 108026.36 951616 78132160
SET (d = 16, n= 1M, r=10M) 106974.76 951677 85750152 107342.21 951611 85745056 107215.62 951757 85754792
                   
Unstable (ae2d421)                  
SET (d = 8, n= 1M, r=10M) 106929 951556 85715672 107009.09 951671 85725056 106484.94 951864 85740680
SET (d = 16, n= 1M, r=10M) 104036.62 951541 93226392 105965.88 951710 93341232 104123.28 951627 93333560

GET performance

Operations Number of Keys Used memory Throughput per sec Throughput per sec Throughput per sec
           
Key Embedding          
GET (d = 8, n= 1M, r=10M) 951580 78079272 105797.72 106168.39 107135.21
GET (d = 16, n= 1M, r=10M) 951764 85731504 105797.72 106416.94 106202.2
           
Unstable (ae2d421)          
GET (d = 8, n= 1M, r=10M) 951213 85662008 106145.85 105685.91 105741.78
GET (d = 16, n= 1M, r=10M) 951646 93260288 106723.59 106089.55 106349.04

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Copy link

codecov bot commented May 23, 2024

Codecov Report

Attention: Patch coverage is 96.87500% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.32%. Comparing base (7719dbb) to head (825c82f).
Report is 10 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #541      +/-   ##
============================================
+ Coverage     70.26%   70.32%   +0.06%     
============================================
  Files           110      111       +1     
  Lines         60108    60286     +178     
============================================
+ Hits          42234    42396     +162     
- Misses        17874    17890      +16     
Files Coverage Δ
src/db.c 88.39% <100.00%> (+0.06%) ⬆️
src/defrag.c 88.24% <100.00%> (+0.68%) ⬆️
src/dict.c 97.53% <100.00%> (+0.09%) ⬆️
src/kvstore.c 96.17% <100.00%> (-0.59%) ⬇️
src/rdb.c 75.80% <100.00%> (-0.09%) ⬇️
src/sds.c 86.08% <100.00%> (+0.21%) ⬆️
src/sds.h 78.68% <ø> (ø)
src/server.c 88.57% <100.00%> (+0.01%) ⬆️
src/debug.c 53.40% <0.00%> (ø)
src/object.c 78.55% <50.00%> (-0.03%) ⬇️

... and 21 files with indirect coverage changes

@zuiderkwast zuiderkwast self-requested a review May 24, 2024 10:53
@hpatro
Copy link
Contributor Author

hpatro commented Jun 11, 2024

@valkey-io/core-team Could you provide some clarity on this ?

@hpatro
Copy link
Contributor Author

hpatro commented Jun 11, 2024

My thought captured under #394 (comment)

We've made changes in the past which can benefit the users in a short term and have redone the implementation in the following major/minor version. key embedding in dictEntry is a pretty small change and we can easily get rid of it with dictEntry removal. I feel the small gain is worth it with this minimal change (7 bytes) for Valkey 8.0 and invest on the kvpair object structure with open addressing in 9.0

@madolson madolson added the major-decision-pending Major decision pending by TSC team label Jun 12, 2024
@zuiderkwast
Copy link
Contributor

Yeah, I've thought about this many times. The complexity is not too bad. The technique of embedding an sds string can later be reused in other places.

Although the embedding is abstracted, it only every makes sense for sds strings. That's fine though. Dict doesn't include "sds.h" so it's decoupled.

I'm in favor. (I'll add some review comments later, just minor things.)

The reason I've been skeptical before is that I'd rather like that we invest in embedding key in robj, since that'd be beneficial in the future redesign (#169), but since this PR is already ready and the robj work is not started, I think we can merge this for Valkey 8.

@bbarani
Copy link

bbarani commented Jun 21, 2024

@hpatro @madolson @zuiderkwast Is this change targeted for Valkey 8? If so, can we add it to Valkey 8 project board?

@madolson
Copy link
Member

I was waiting on performance numbers from Hari before officially adding it.

@bbarani
Copy link

bbarani commented Jun 21, 2024

I was waiting on performance numbers from Hari before officially adding it.

@hpatro Do you have performance numbers? Can you please add it to this issue to move forward with next steps?

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a risky PR IMO. I am concerned about the mixed use of various dict entry types while using dictEntry as the "universal" pointer. I don't think it is feasible for me to examine every use of dictEntry, which should've been compiler's job. I am happy to help out on the refactoring if needed. Let me know.

src/dict.c Show resolved Hide resolved
src/dict.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/dict.c Show resolved Hide resolved
src/dict.c Outdated Show resolved Hide resolved
src/dict.h Outdated Show resolved Hide resolved
src/object.c Outdated Show resolved Hide resolved
@@ -509,6 +544,8 @@ dictEntry *dictInsertAtPosition(dict *d, void *key, void *position) {
/* Allocate an entry without value. */
entry = createEntryNoValue(key, *bucket);
}
} else if (d->type->embedded_entry) {
entry = createEmbeddedEntry(key, *bucket, d->type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way dictEntry is used has no type safety. Admittedly, this is not a new issue but the addition of embeddedDictEntry is making it worse. The following code path looks problematic to me. can you please double check?

setGenericComamd() -> setKey() -> dbAdd() -> dbAddInsternal() -> kvstoreDictSetKey() -> dictSetKey()

I don't seedictSetKey getting patched to handle this new embedded dict entry.

I would suggest making dictEntry an opaque struct next and force every function go through an inline accessor function/macro. I think this is the only certain way to ensure we don't accidentally use the wrong data type.

Copy link
Member

@madolson madolson Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the dictEntry already opaque?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in dict.c.

Copy link
Member

@madolson madolson Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't agree it should be opaque in dict.c. It seems like a very small number of actual touch points we actually have to make sure are correct. We could even reduce the number. I'm not sure making it opaque will help all that much.

Discussed offline, Ping has a separate proposal for adding guardrails that he will publish.

src/defrag.c Outdated Show resolved Hide resolved
src/dict.c Show resolved Hide resolved
src/dict.h Outdated Show resolved Hide resolved
@hpatro
Copy link
Contributor Author

hpatro commented Jun 24, 2024

Thanks for the review @PingXie and @zuiderkwast . I will shortly address them.

@madolson I've posted the benchmarking results on the top comment.

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think overall this is a good iterative improvement before we can make more changes later. I would like to see us embed the key into the robj in the future, so this has a lot of the same primitives as that.

src/dict.c Outdated Show resolved Hide resolved
src/dict.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
@hpatro
Copy link
Contributor Author

hpatro commented Jun 25, 2024

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hwware
Copy link
Member

hwware commented Jun 25, 2024

it looks like used memory decreases 10% without changing on qps.

@madolson
Copy link
Member

@hpatro AFAIK most of the folks are inclined to accept it for 8, so I would ask you to follow up if you can. I'll throw it on our Monday agenda as well to make sure we close on it quickly if we can.

@PingXie
Copy link
Member

PingXie commented Jun 26, 2024

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

@hpatro
Copy link
Contributor Author

hpatro commented Jun 26, 2024

@zuiderkwast @PingXie @madolson @valkey-io/core-team If we all are aligned on accepting this change in for 8.0, I will look into addressing the comments and polishing the PR further. Let me know.

@hpatro, I like the idea! :-) The only reason I marked this PR as "changes required" is because of the amount of type casting in dict.c (and I understand that it didn't start with this PR). This is a great improvement to be had for Valkey 8.0 but I do think we need to make some potentially painful changes to get dict.c back in shape. If you don't mind, I would love to get my hands dirty and help out with the refactoring too.

Thanks for the help @PingXie. Let me publish the changes which I have done tomorrow and we can go from there. Do we want to do further refactoring as a follow up PR or along with this?

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@PingXie
Copy link
Member

PingXie commented Jun 27, 2024

Thanks @hpatro.

Do we want to do further refactoring as a follow up PR or along with this?

I am inclined to reduce as much type casting as possible in this PR and reason being that I don't trust myself doing the compiler job (of type checking the data access). I am concerned about the potential memory corruption issues. Will dedicate some time later this week for a deep-dive.

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro
Copy link
Contributor Author

hpatro commented Jun 27, 2024

Couple of things which remain

dict.c: In function ‘dictGetNextRef’:
dict.c:930:37: error: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Werror=address-of-packed-member]
  930 |     if (entryIsEmbedded(de)) return &decodeEmbeddedEntry(de)->next;
      |                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro
Copy link
Contributor Author

hpatro commented Jun 27, 2024

@madolson I can't see any other builds apart from DCO check. Are we throttled/out of credits?

src/dict.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
src/sds.c Outdated Show resolved Hide resolved
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@madolson madolson added release-notes This issue should get a line item in the release notes major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Jun 30, 2024
@madolson
Copy link
Member

I see 4/6 people directionally inclined, so going to throw on the directionally approved tag.

src/server.c Outdated Show resolved Hide resolved
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor documentation changes that would make some stuff clearer.

src/kvstore.h Outdated Show resolved Hide resolved
src/db.c Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the added docs. I have a few follow-up comments on those.

src/dict.h Show resolved Hide resolved
src/db.c Outdated Show resolved Hide resolved
src/db.c Outdated Show resolved Hide resolved
src/kvstore.h Outdated Show resolved Hide resolved
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro
Copy link
Contributor Author

hpatro commented Jul 1, 2024

@zuiderkwast / @madolson Shall we ship it ? 🚀

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, LGTM.

@bbarani
Copy link

bbarani commented Jul 2, 2024

Yes, LGTM.

Awesome! Thanks all.

@madolson
Copy link
Member

madolson commented Jul 2, 2024

Kicked off a full-run to stress this code a bit more before merging: https://github.com/valkey-io/valkey/actions/workflows/daily.yml

return required_keylen;
}
assert(buf_len >= required_keylen);
memcpy(buf, sdsAllocPtr(s), required_keylen);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need this double information about the header size? you can extract it from the "sds", right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An sds s is a pointer to where the string content starts, so it can be used as a C string. It does not point to the start of the allocation. The header is store before the char data in the same allocation, i.e. at s[-1], s[-2], etc. The header is backwards-encoded in some way, so the byte at s[-1] says which kind of header it is and how large the header is.

       sds-----.
               |
  allocation   v
  +-----------+----------------------+
  | hdr       |string contents    \0 |
  +-----------+----------------------+

When we store this embedded, we want to be able to restore the sds pointer, so we store the size of the header size in the first. When we restore the sds pointer, we can find it using S (the offset to the string contents).

+-+-----------+----------------------+
|S| hdr       |string contents    \0 |
+-+-----------+----------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the callers of this function can call sdsHdrSize(s[-1]) themselves? I'm not sure if it's public.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, right agree. I am not so supportive of the desire to keep something as sds when it's not, but that's already in my overall comment, maybe I'll elaborate more there. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key is being used throughout the engine as a sds so changing that would be even more touchpoint. We could dynamically build it on the dictGetKey call but that would be additional penalty. Hence, storing it as a sds made the most sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question, the low 3bit in sdshdr.flag have told us the header length, and exposing function sdsHdrSize will not lead to coupling between dict.c and sds.c, I don't see any harm in it.

                                                            
 sdshdr               sds                                   
   │                  │                                     
   │                  │                                     
   ▼─────┬─────┬─────┌▼───────────────────────────────┬──┐  
   │ len │ alloc     │                                │  │  
   │     │     │flags│                                │\0│  
   └─────┼─────┼─────└────────────────────────────────┴──┘  
                                                                                                                                                                                            

@zvi-code
Copy link

zvi-code commented Jul 2, 2024

This change makes sense to me overall and tradeoffs are good.

My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]
  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has very big potential for very hard to track bugs

@zvi-code
Copy link

zvi-code commented Jul 2, 2024

This change makes sense to me overall and tradeoffs are good.

My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]

  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has very big potential for very hard to track bugs

Trying to arrange my thoughts on this. The above is just an example for a wider issue In the code design. IMO there are two types of 'sds' usages 1) A way to carry metadata about a string buffer through IO flow. 2) A compact way to encode string buffer metadata in memory with good locality and low memory overhead.

It is clear to me why use 'sds' for #2. For #1 I feel it could be wrong choice and this choice also has performance costs as it forces memory access to unpack the info many times in IO flow. For example sdsfree access the memory on free, even though it could have been avoided. This has high costs when memory access is a factor.
I also think there is an alternative, to take an approach slightly similar to rust, have some struct 'runtimeSds' with sds as member and metadata, need to check if it can be passed to functions by value, to avoid the need to allocate it on the heap. Thoughts?

@hpatro
Copy link
Contributor Author

hpatro commented Jul 2, 2024

This change makes sense to me overall and tradeoffs are good.
My concern is with that conceptually it's not a good idea that we have 2 conflicting patterns:

  1. use the internal encoding information of sds [pointer is odd]
  2. embed sds into some other non-native allocation buffer, potentially at arbitrary offset

I feel this has very big potential for very hard to track bugs

Trying to arrange my thoughts on this. The above is just an example for a wider issue In the code design. IMO there are two types of 'sds' usages 1) A way to carry metadata about a string buffer through IO flow. 2) A compact way to encode string buffer metadata in memory with good locality and low memory overhead.

It is clear to me why use 'sds' for #2. For #1 I feel it could be wrong choice and this choice also has performance costs as it forces memory access to unpack the info many times in IO flow. For example sdsfree access the memory on free, even though it could have been avoided. This has high costs when memory access is a factor. I also think there is an alternative, to take an approach slightly similar to rust, have some struct 'runtimeSds' with sds as member and metadata, need to check if it can be passed to functions by value, to avoid the need to allocate it on the heap. Thoughts?

@zvi-code Thanks for sharing your thought. Would you mind filling a separate issue about this? Given we are planning to rehaul the hashtable implementation (#169) in 9.0. It will be good to capture some of these points.

@madolson
Copy link
Member

madolson commented Jul 2, 2024

Actual test run I kicked off: https://github.com/valkey-io/valkey/actions/runs/9764931105

@madolson madolson merged commit 8faf278 into valkey-io:unstable Jul 2, 2024
19 checks passed
@judeng
Copy link
Contributor

judeng commented Nov 2, 2024

This pr will bring us considerable cost benefits, thank you @hpatro . I have doubt about the benchmark test results, this PR should improve the cpu cache hit, but why hasn't the performance improved?

@hpatro
Copy link
Contributor Author

hpatro commented Nov 6, 2024

This pr will bring us considerable cost benefits, thank you @hpatro . I have doubt about the benchmark test results, this PR should improve the cpu cache hit, but why hasn't the performance improved?

Yeah, I was also expecting more gain in throughput as well but saw a tiny gain. Anyway we were happy with the memory savings (without paying any cost). @judeng Do you have any recommendation to perform benchmarking to be able to observe the improvement in performance from CPU cache locality ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-approved Major decision approved by TSC team performance release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants