Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Support LZ4 compression #1223

Open
hpatro opened this issue Oct 24, 2024 · 9 comments
Open

[NEW] Support LZ4 compression #1223

hpatro opened this issue Oct 24, 2024 · 9 comments

Comments

@hpatro
Copy link
Contributor

hpatro commented Oct 24, 2024

The problem/use-case that the feature addresses

Faster BGSAVE time and smaller RDB snapshot size

Description of the feature

Support LZ4 compression as replacement of LZF compression for large in-memory data payload and RDB data. This should result in faster snapshot time as well as smaller RDB files. I had done experimentation in the past with the following lz4 library and the performance improvement was really good with dummy benchmark data.

Data generation:

src/redis-benchmark -t set -n 1000000 -q -d 20000 -r 1000000

Performance:

Compression Type BGSAVE time (sec) RDB size (Mb) load time (sec)
LZ4 1089 8612.88 18.918
LZF 1251 12405.38 32.34

It can be further fine tuned (I've not explored all the modes provided in the library).

The challenge we need to consider is supporting LZF compressed RDB data to be read (forever?) for backwards compatibliity if we decide to move to a new compression library.

@zuiderkwast
Copy link
Contributor

IMO: We can add it when we bump the RDB version. This time we may want to change the magic string too since Redis already has defined RDB 12, so Valkey and Redis will diverge.

Q: Shall we vendor lz4?

@hpatro
Copy link
Contributor Author

hpatro commented Nov 4, 2024

IMO: We can add it when we bump the RDB version. This time we may want to change the magic string too since Redis already has defined RDB 12, so Valkey and Redis will diverge.

Yeah, that's a must. Based on RDB version we decompress data accordingly.

Q: Shall we vendor lz4?

I think we should avoid that. git submodule is an option unless we need to make any changes to it.

@zuiderkwast
Copy link
Contributor

I'm skeptical to submodules. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system installed ones (like OpenSSL).

I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries. According to https://github.com/lz4/lz4/tree/dev/lib#readme it seems to be only one or a few files, depending on what we need.

@hpatro
Copy link
Contributor Author

hpatro commented Nov 5, 2024

I'm skeptical to submodules. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system installed ones (like OpenSSL).

I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries. According to https://github.com/lz4/lz4/tree/dev/lib#readme it seems to be only one or a few files, depending on what we need.

I did that while experimenting but it's more maintenance work. I felt Ping generally had a stance of not vendoring packages into Valkey. Thoughts @PingXie ?

@zuiderkwast
Copy link
Contributor

I do agree we should de-vendor large deps like Lua and Jemalloc and not vendor such deps in the future. For these, we can require them to be installed in the system, just like we do with OpenSSL, rather than trying to provide them.

For single file deps, that approach doesn't make much sense IMO. Small libs are often copied under src/, not even under deps/. Examples: crc64, lzf, MT19937-64 and siphash. The maintenance burden is not a problem for these, as far as I've seen. Not having them vendored would be a hazzle though.

@hpatro
Copy link
Contributor Author

hpatro commented Nov 5, 2024

I do agree we should de-vendor large deps like Lua and Jemalloc and not vendor such deps in the future. For these, we can require them to be installed in the system, just like we do with OpenSSL, rather than trying to provide them.

For single file deps, that approach doesn't make much sense IMO. Small libs are often copied under src/, not even under deps/. Examples: crc64, lzf, MT19937-64 and siphash. The maintenance burden is not a problem for these, as far as I've seen. Not having them vendored would be a hazzle though.

Do we get notified if there is a vulnerability discovered/patched in these libs you mentioned above?

@zuiderkwast
Copy link
Contributor

No, only if we watch them in some way. Do we get notified if we use them as a submodule?

@asafpamzn
Copy link

asafpamzn commented Nov 6, 2024

In the example above, the value size used is 20KB, a size where compression is effective. However, for users working with smaller values—such as those under 256 bytes—compression algorithms are unlikely to be as effective.

If we are to modify the RDB protocol, I suggest compressing the RDB data in fixed-size chunks, such as 32KB, rather than compressing each value individually. This approach should improve the compression ratio and potentially reduce load times.

@hpatro
Copy link
Contributor Author

hpatro commented Nov 6, 2024

In the example above, the value size used is 20KB, a size where compression is effective. However, for users working with smaller values—such as those under 256 bytes—compression algorithms are unlikely to be as effective.

If we are to modify the RDB protocol, I suggest compressing the RDB data in fixed-size chunks, such as 32KB, rather than compressing each value individually. This approach should improve the compression ratio and potentially reduce load times.

Definitely, for RDB compression/decompression, this would make sense.

For large objects stored in memory, if we are to apply compression, we would need to stick to the existing approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants