[NEW] Support LZ4 compression #1223
Comments
IMO: We can add it when we bump the RDB version. This time we may want to change the magic string too, since Redis has already defined RDB 12, so Valkey and Redis will diverge. Q: Shall we vendor lz4?
Yeah, that's a must. Based on the RDB version, we decompress data accordingly.
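Something like this minimal sketch of a version gate (`RDB_VERSION_LZ4` and `rdbDecompressString` are hypothetical names, not existing Valkey identifiers; `lzf_decompress` is the liblzf call already vendored under src/):

```c
#include <stddef.h>
#include <lz4.h>   /* LZ4_decompress_safe() */
#include "lzf.h"   /* lzf_decompress(), already vendored under src/ */

/* Hypothetical: the first RDB version whose payloads are LZ4-compressed. */
#define RDB_VERSION_LZ4 13

/* Pick the decompressor based on the version stamped in the RDB header,
 * so old LZF-compressed dumps stay loadable. The uncompressed length is
 * known from the RDB encoding, so success means filling `out` exactly.
 * Returns 1 on success, 0 on failure. */
static int rdbDecompressString(int rdbver, const char *in, size_t inlen,
                               char *out, size_t outlen) {
    if (rdbver >= RDB_VERSION_LZ4)
        return LZ4_decompress_safe(in, out, (int)inlen, (int)outlen) == (int)outlen;
    return lzf_decompress(in, (unsigned int)inlen, out, (unsigned int)outlen) == outlen;
}
```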
I think we should avoid that. A git submodule is an option, unless we need to make changes to it.
I'm skeptical of submodules. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system-installed ones (like OpenSSL). I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries. According to https://github.com/lz4/lz4/tree/dev/lib#readme it seems to be only one or a few files, depending on what we need.
I did that while experimenting, but it's more maintenance work. I felt Ping generally had a stance of not vendoring packages into Valkey. Thoughts @PingXie ?
I do agree we should de-vendor large deps like Lua and Jemalloc and not vendor such deps in the future. For these, we can require them to be installed in the system, just like we do with OpenSSL, rather than trying to provide them. For single-file deps, that approach doesn't make much sense IMO. Small libs are often copied under src/, not even under deps/. Examples: crc64, lzf, MT19937-64 and siphash. The maintenance burden is not a problem for these, as far as I've seen. Not having them vendored would be a hassle though.
Do we get notified if there is a vulnerability discovered/patched in these libs you mentioned above?
No, only if we watch them in some way. Do we get notified if we use them as a submodule?
In the example above, the value size used is 20KB, a size where compression is effective. However, for users working with smaller values, such as those under 256 bytes, compression algorithms are unlikely to be as effective. If we are to modify the RDB protocol, I suggest compressing the RDB data in fixed-size chunks, such as 32KB, rather than compressing each value individually; see the sketch below. This approach should improve the compression ratio and potentially reduce load times.
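A rough sketch of that chunked approach (the helper name and the chunk framing are invented for illustration; a real implementation would fold the framing into the RDB opcode scheme; only the standard LZ4 block API is used):

```c
#include <stdint.h>
#include <stdio.h>
#include <lz4.h>

#define RDB_CHUNK_SIZE (32 * 1024)

/* Compress `len` bytes in fixed 32 KiB chunks. Each chunk is written as
 * <u32 compressed-len><u32 raw-len><payload>. LZ4_COMPRESSBOUND() sizes
 * the scratch buffer for the worst case. Returns 0 on success, -1 on error. */
static int rdbWriteLz4Chunks(FILE *fp, const char *buf, size_t len) {
    char out[LZ4_COMPRESSBOUND(RDB_CHUNK_SIZE)];
    for (size_t off = 0; off < len; ) {
        int raw = (len - off > RDB_CHUNK_SIZE) ? RDB_CHUNK_SIZE : (int)(len - off);
        int clen = LZ4_compress_default(buf + off, out, raw, (int)sizeof(out));
        if (clen <= 0) return -1;
        uint32_t hdr[2] = {(uint32_t)clen, (uint32_t)raw};
        if (fwrite(hdr, sizeof(hdr), 1, fp) != 1) return -1;
        if (fwrite(out, (size_t)clen, 1, fp) != 1) return -1;
        off += (size_t)raw;
    }
    return 0;
}
```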
Definitely, for RDB compression/decompression, this would make sense. For large objects stored in memory, if we are to apply compression, we would need to stick to the existing approach.
The problem/use-case that the feature addresses
Faster BGSAVE time and smaller RDB snapshot size.

Description of the feature
Support LZ4 compression as a replacement for LZF compression for large in-memory data payloads and RDB data. This should result in faster snapshot times as well as smaller RDB files. I experimented with the following lz4 library in the past, and the performance improvement was really good with dummy benchmark data.
Data generation:
Performance:
It can be tuned further (I haven't explored all the modes provided in the library).
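For reference, the block-mode calls involved look roughly like this (a sketch of the API usage, not the actual benchmark harness):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/* Compress and decompress one value with the LZ4 block API, verifying the
 * round trip. LZ4_compressBound() gives the worst-case compressed size. */
static void lz4RoundTrip(const char *src, int len) {
    int bound = LZ4_compressBound(len);
    char *comp = malloc((size_t)bound);
    char *back = malloc((size_t)len);
    int clen = LZ4_compress_default(src, comp, len, bound);
    assert(clen > 0);  /* 0 means the input could not be compressed */
    int dlen = LZ4_decompress_safe(comp, back, clen, len);
    assert(dlen == len && memcmp(src, back, (size_t)len) == 0);
    free(comp);
    free(back);
}
```

One of the untried modes is presumably the high-compression mode (`LZ4_compress_HC` in lz4hc.h), which trades CPU time for a better ratio.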
The challenge we need to consider is that LZF-compressed RDB data must remain readable (forever?) for backward compatibility if we decide to move to a new compression library.