[NEW] Support LZ4 compression #1223

hpatro · 2024-10-24T18:38:05Z

The problem/use-case that the feature addresses

Faster BGSAVE time and smaller RDB snapshot size

Description of the feature

Support LZ4 compression as a replacement for LZF compression, for large in-memory data payloads and for RDB data. This should result in faster snapshots as well as smaller RDB files. I experimented with the lz4 library in the past, and the performance improvement on dummy benchmark data was significant.

Data generation (1,000,000 SET requests with 20,000-byte values, using random keys from a keyspace of 1,000,000):

src/redis-benchmark -t set -n 1000000 -q -d 20000 -r 1000000

Performance:

Compression   BGSAVE time (s)   RDB size (MB)   Load time (s)
LZ4           1089              8612.88         18.918
LZF           1251              12405.38        32.34
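For reference, the relative gains implied by the table can be computed directly. This short Python snippet does nothing but arithmetic on the numbers above: LZ4 comes out roughly 13% faster on BGSAVE, about 31% smaller on disk, and about 41% faster to load.

```python
# Figures copied from the benchmark table (LZ4 vs. LZF).
lz4 = {"bgsave_s": 1089, "rdb_mb": 8612.88, "load_s": 18.918}
lzf = {"bgsave_s": 1251, "rdb_mb": 12405.38, "load_s": 32.34}

def reduction(new: float, old: float) -> float:
    """Fraction by which `new` is lower than `old` (0.25 means 25% lower)."""
    return 1 - new / old

for metric in ("bgsave_s", "rdb_mb", "load_s"):
    # Prints 12.9%, 30.6% and 41.5% respectively.
    print(f"{metric}: {reduction(lz4[metric], lzf[metric]):.1%} lower with LZ4")
```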

It can be tuned further (I haven't explored all the modes the library provides).

The challenge to consider is backward compatibility: if we move to a new compression library, we still need to be able to read LZF-compressed RDB data (forever?).


zuiderkwast · 2024-11-04T21:06:22Z

IMO, we can add it when we bump the RDB version. This time we may want to change the magic string too, since Redis has already defined RDB 12 and Valkey and Redis will diverge.
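A separate magic string would let a loader tell a Redis RDB 12 file apart from a Valkey file that happens to carry the same version number. Below is a minimal sketch of header parsing under that idea. The existing layout (the 5-byte magic `REDIS` followed by a 4-digit, zero-padded ASCII version, e.g. `REDIS0011`) is the current format; the `VALKEY` magic, and the assumption that it keeps the same 4-digit version field, are hypothetical.

```python
# Existing format: b"REDIS" followed by a 4-digit ASCII version, e.g. b"REDIS0011".
LEGACY_MAGIC = b"REDIS"
# Hypothetical new magic for a diverged Valkey format (illustration only).
NEW_MAGIC = b"VALKEY"

def parse_rdb_header(data: bytes) -> tuple[str, int]:
    """Return (format_family, rdb_version) from the start of an RDB file."""
    for family, magic in (("redis", LEGACY_MAGIC), ("valkey", NEW_MAGIC)):
        if data.startswith(magic):
            digits = data[len(magic):len(magic) + 4]
            if len(digits) != 4 or not digits.isdigit():
                raise ValueError("malformed RDB version field")
            return family, int(digits.decode("ascii"))
    raise ValueError("not an RDB file: unknown magic string")

print(parse_rdb_header(b"REDIS0011"))   # ('redis', 11)
print(parse_rdb_header(b"VALKEY0080"))  # ('valkey', 80), hypothetical version
```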

Q: Shall we vendor lz4?

hpatro · 2024-11-04T21:31:23Z

> IMO, we can add it when we bump the RDB version. This time we may want to change the magic string too, since Redis has already defined RDB 12 and Valkey and Redis will diverge.

Yeah, that's a must. Based on the RDB version, we decompress the data accordingly.
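Version-based dispatch keeps old LZF files loadable while new files use LZ4. A minimal sketch, assuming the codecs are passed in as callables (Python's standard library has neither LZF nor LZ4) and using `FIRST_LZ4_RDB_VERSION` as a hypothetical placeholder for whatever version the bump defines. The `(compressed, original_length)` signature mirrors the fact that RDB stores both the compressed and the uncompressed length of an LZF string.

```python
from typing import Callable

# A codec takes (compressed_bytes, original_length) and returns the raw bytes.
Decompressor = Callable[[bytes, int], bytes]

# Hypothetical placeholder: first RDB version whose compressed strings are LZ4.
FIRST_LZ4_RDB_VERSION = 100

def choose_decompressor(rdb_version: int,
                        lzf_decompress: Decompressor,
                        lz4_decompress: Decompressor) -> Decompressor:
    """Pick the string codec based on the RDB version read from the header."""
    if rdb_version >= FIRST_LZ4_RDB_VERSION:
        return lz4_decompress
    # Files written before the bump keep using LZF, so they stay readable.
    return lzf_decompress
```

Keeping the LZF path behind a version check means the reader never has to guess the codec from the payload itself.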

> Q: Shall we vendor lz4?

I think we should avoid that. A git submodule is an option, unless we need to make changes to it.

zuiderkwast · 2024-11-05T20:35:51Z

I'm skeptical of submodules. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system-installed ones (like OpenSSL).

I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries. According to https://github.com/lz4/lz4/tree/dev/lib#readme it seems to be only one or a few files, depending on what we need.

hpatro · 2024-11-05T20:45:23Z

> I'm skeptical of submodules. Offline builds get complicated. Source releases get complicated. So far, we've used either vendored dependencies or system-installed ones (like OpenSSL).
>
> I prefer that we vendor this one. We could copy only one or a few files, as we've done with crc64 and some other small libraries. According to https://github.com/lz4/lz4/tree/dev/lib#readme it seems to be only one or a few files, depending on what we need.

I did that while experimenting, but it's more maintenance work. My impression is that Ping generally prefers not to vendor packages into Valkey. Thoughts, @PingXie?

zuiderkwast · 2024-11-05T22:42:12Z

I do agree we should de-vendor large deps like Lua and Jemalloc and not vendor such deps in the future. For these, we can require them to be installed on the system, just like we do with OpenSSL, rather than trying to provide them.

For single-file deps, that approach doesn't make much sense IMO. Small libs are often copied under src/, not even under deps/. Examples: crc64, lzf, MT19937-64 and siphash. The maintenance burden is not a problem for these, as far as I've seen. Not having them vendored would be a hassle, though.

hpatro · 2024-11-05T22:58:19Z

> I do agree we should de-vendor large deps like Lua and Jemalloc and not vendor such deps in the future. For these, we can require them to be installed on the system, just like we do with OpenSSL, rather than trying to provide them.
>
> For single-file deps, that approach doesn't make much sense IMO. Small libs are often copied under src/, not even under deps/. Examples: crc64, lzf, MT19937-64 and siphash. The maintenance burden is not a problem for these, as far as I've seen. Not having them vendored would be a hassle, though.

Do we get notified when a vulnerability is discovered or patched in the libs you mentioned?

zuiderkwast · 2024-11-05T23:01:48Z

No, only if we watch them in some way. Do we get notified if we use them as a submodule?

asafpamzn · 2024-11-06T14:37:19Z

In the example above, the values are 20KB, a size at which compression is effective. However, for users with smaller values, such as those under 256 bytes, compression algorithms are unlikely to be as effective.
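The small-value point can be illustrated with a "compress only if it pays off" check, which per-value schemes need anyway. This is a sketch only: zlib from Python's standard library stands in for LZ4/LZF, and `maybe_compress` and its `min_saving` threshold are hypothetical names chosen for illustration.

```python
import os
import zlib

def maybe_compress(value: bytes, min_saving: int = 4) -> tuple[bool, bytes]:
    """Compress `value`, but keep the raw bytes unless compression saves at
    least `min_saving` bytes. zlib is a stand-in for LZ4/LZF here."""
    compressed = zlib.compress(value)
    if len(compressed) + min_saving <= len(value):
        return True, compressed
    return False, value

# A 20,000-byte repetitive value compresses; 200 random bytes do not,
# because the codec's own framing overhead outweighs any savings.
big_ok, _ = maybe_compress(b"x" * 20000)
small_ok, _ = maybe_compress(os.urandom(200))
print(big_ok, small_ok)  # True False
```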

If we are going to modify the RDB format, I suggest compressing the RDB data in fixed-size chunks, such as 32KB, rather than compressing each value individually. This should improve the compression ratio and potentially reduce load times.
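The chunking idea can be sketched as a framing layer that sits below the record parser: the serialized body is split into fixed-size chunks, and each chunk is compressed independently. Everything here is hypothetical for illustration: zlib stands in for LZ4, and the frame layout (`<u32 raw_len><u32 comp_len><compressed bytes>`) and function names are not an existing format.

```python
import struct
import zlib

CHUNK_SIZE = 32 * 1024  # 32KB, the example size suggested above

def write_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Split a serialized body into fixed-size chunks and compress each one.
    Hypothetical frame layout: <u32 raw_len><u32 comp_len><compressed bytes>."""
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        raw = payload[start:start + chunk_size]
        comp = zlib.compress(raw)
        out += struct.pack("<II", len(raw), len(comp)) + comp
    return bytes(out)

def read_chunks(blob: bytes) -> bytes:
    """Inverse of write_chunks: decompress every frame and concatenate."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        raw_len, comp_len = struct.unpack_from("<II", blob, pos)
        pos += 8
        raw = zlib.decompress(blob[pos:pos + comp_len])
        if len(raw) != raw_len:
            raise ValueError("corrupt chunk: length mismatch")
        out += raw
        pos += comp_len
    return bytes(out)

# Many small values: chunked output vs. compressing each value on its own.
values = [f"key:{i}=val:{i}".encode() for i in range(10_000)]
body = b"".join(values)
blob = write_chunks(body)
per_value = sum(len(zlib.compress(v)) for v in values)
print(len(body), len(blob), per_value)
```

Because records can straddle chunk boundaries, a loader would decompress chunk by chunk into a stream and parse records from that stream. zlib's fixed per-call header and checksum exaggerate the per-value overhead, so treat the size comparison as directional only.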

hpatro · 2024-11-06T20:57:02Z

> In the example above, the values are 20KB, a size at which compression is effective. However, for users with smaller values, such as those under 256 bytes, compression algorithms are unlikely to be as effective.
>
> If we are going to modify the RDB format, I suggest compressing the RDB data in fixed-size chunks, such as 32KB, rather than compressing each value individually. This should improve the compression ratio and potentially reduce load times.

Definitely, for RDB compression/decompression, this would make sense.

For large objects stored in memory, if we apply compression, we would need to stick to the existing per-value approach.
