KAFKA-4514: Add Codec for ZStandard Compression #2267
Conversation
@@ -69,6 +69,14 @@ public Constructor get() throws ClassNotFoundException, NoSuchMethodException {
        }
    });

    private static MemoizingConstructorSupplier zStd4OutputStreamSupplier = new MemoizingConstructorSupplier(new ConstructorSupplier() {
I have no context on this patch but I'm guessing you have a typo:
currently: zStd4OutputStreamSupplier
probably intended: zStdOutputStreamSupplier
You have a 4 in the variable name.
Oh my. Got it.
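For reference, a minimal sketch of the corrected declaration, following the reflective-supplier pattern in the diff above (the target class and constructor signature here are assumptions, not taken from the patch):

```java
// Sketch only: assumes zstd-jni's ZstdOutputStream is the class being loaded
// reflectively, mirroring how the other codec suppliers in this file look.
private static MemoizingConstructorSupplier zStdOutputStreamSupplier =
        new MemoizingConstructorSupplier(new ConstructorSupplier() {
            @Override
            public Constructor get() throws ClassNotFoundException, NoSuchMethodException {
                return Class.forName("com.github.luben.zstd.ZstdOutputStream")
                        .getConstructor(OutputStream.class);
            }
        });
```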
2 big questions here.
Also, thanks for working on this. Did you have a chance to run any performance comparisons?
Thanks for the PR. To answer @tgravescs's questions:
Performance numbers would definitely help the discussion.
Thanks for your advice. Let me summarize: I will run a benchmark test over the next couple of days and submit a KIP proposal with the results. Let's discuss the following topics there:
I just started configuring the benchmark environment on AWS. Please stay tuned!
This patch adds support for zstandard compression to Kafka as documented in KIP-110: https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression. Reviewers: Ivan Babrou <ibobrik@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Merged to trunk and 2.1. Thanks again for your persistence! Great contribution!
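As a usage note for anyone landing here later: once on 2.1, enabling the new codec from a producer is a one-line config change. A minimal sketch (broker address and topic are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ZstdProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "zstd" joins gzip, snappy, and lz4 as a valid codec after this patch.
        props.put("compression.type", "zstd");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
```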
Thanks for your contribution @dongjinleekr! Is https://twitter.com/dongjinleekr your Twitter handle?
PR #2267 introduced support for Zstandard compression. The relevant test expects values for `num_nodes` and `num_producers` based on the (now-incremented) count of compression types. Passed the affected, previously-failing test: `ducker-ak test tests/kafkatest/tests/client/compression_test.py` Reviewers: Jason Gustafson <jason@confluent.io>
Was a param introduced to adjust the compression level?
I hope so. Half the power of Zstd is its huge range of compression levels on the Pareto frontier.
@dongjinleekr where is the compression level specified? I can't seem to locate it, even in the source. Right now I am still getting better compression with gzip, and want to adjust the level. Thanks.
It would only need to exist and be configured on the producer. It's irrelevant to the consumer, broker, or wire protocol. Therefore it isn't in the KIP or specification, and is not related to compatibility.
@scottcarey while it is an anti-pattern, the broker can compress/recompress messages depending on the topic's `compression.type` configuration, which could be set to `zstd`.
Producer configs have to be in the KIP (any config is considered public API). I don't think there's a way to configure the compression level at this point. If someone wants to contribute that, it would make sense to allow it for other compression types too.
IIRC this was disallowed for zstd.
I may be mistaken. At least some of this was disallowed for zstd when converting for clients that don't have zstd support. That is distinct from broker recompression.
A broker could recompress, and the level at which it does so is a major factor in broker CPU use if so.
Setting the topic config to zstd while not compressing in the producer is allowed for all compression algorithms.
Yes, I got automatic broker down-conversion confused with topic compression. I was under the impression both were disabled for zstd, but only one is.
@scottcarey I wouldn't say it's irrelevant on the broker. We have older producers that don't/won't speak zstd, but our (testing) brokers force zstd on the topic for storage/space requirements. The broker's ability to set the compression level is key in our use case. I just can't figure out where in this PR the level is set - though now I am thinking we are just using the default (3).
@davewat @scottcarey @ijuma Sorry for the late reply. In fact, I already investigated this issue (supporting a compression level for zstd), but I concluded that it would be much better to put it into a separate issue and focus on implementing the ZSTD feature only.
Here is why: all compression codecs (i.e., Gzip, Snappy, LZ4, and ZSTD) support some parameters to change the degree of compression; however, only LZ4 and ZSTD support the concept of a 'level' - in the case of GZIP and Snappy, they take a block size parameter, not a 'level'.
To make the compression level feature available, we must modify the API signatures of MemoryRecordsBuilder to support a compression level, and add some validation logic: check whether the given CompressionCodec supports the concept of a 'level', and whether the given compression level is valid for that CompressionCodec (e.g., ZSTD supports 22 levels but LZ4 supports only 4). It also requires additional modifications to read the ProducerConfig value and pass it into MemoryRecordsBuilder. Of course, this work requires a bunch of modifications and some policies on the various codecs. That is why I decided to put off this issue and use the default level for ZSTD, that is, 3.
What is your opinion? Do you really need it? Does it sound reasonable? If the answer is yes, please file it to Jira with the 'needs-kip' tag; then I will take the issue and make the proposal.
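To make the validation idea concrete, here is a hypothetical sketch using the ranges cited in the comment above (ZSTD: 22 levels, LZ4: 4 levels); neither this class nor these exact checks are part of this patch:

```java
import org.apache.kafka.common.record.CompressionType;

// Hypothetical helper, for illustration only: rejects levels for codecs that
// have no level concept and range-checks the ones that do.
public final class CompressionLevelValidator {
    public static void validate(CompressionType type, int level) {
        switch (type) {
            case ZSTD:
                if (level < 1 || level > 22)
                    throw new IllegalArgumentException("zstd level must be in [1, 22]: " + level);
                break;
            case LZ4:
                if (level < 1 || level > 4)
                    throw new IllegalArgumentException("lz4 level must be in [1, 4]: " + level);
                break;
            default:
                throw new IllegalArgumentException(type + " does not take a compression level");
        }
    }
}
```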
@dongjinleekr, zstd also supports negative levels for faster compression; it's the equivalent of `--fast X` in the CLI.
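For illustration, this is what a negative level looks like against zstd-jni directly; Kafka itself does not expose this parameter in this patch, and the buffer and payload here are made up:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import com.github.luben.zstd.ZstdOutputStream;

public class NegativeLevelExample {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Negative levels trade compression ratio for speed, like `zstd --fast=5`.
        try (ZstdOutputStream zstd = new ZstdOutputStream(sink, -5)) {
            zstd.write("hello zstd".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("compressed size: " + sink.size());
    }
}
```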
@dongjinleekr our use case needs it, not sure if we are just a corner case. I have created the issue in Jira. Thanks!
Gzip supports levels 1 to 9. The API in Java hides it a bit; it has a very significant impact on CPU use when compressing. I suspect much of the reason that recompressing broker-side has a bad rap is that it was done with the default level of 6. Level 1 is about 10x as fast. Zstd level 3 is faster than that at higher compression ratios, however.
Being able to tune the compression level is very important. Compression is all about CPU-to-I/O tradeoffs, and which tradeoff is best is use-case dependent. Zstd ranges from snappy-like speeds and compression ratios to lzma-like compression ratios. Setting compression levels will get even more important if dictionary support is added.
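For context on how the Java API "hides" the gzip level: `GZIPOutputStream` takes no level argument, but it inherits the protected `Deflater` field `def` from `DeflaterOutputStream`, so a small subclass can expose it. A sketch (not Kafka code):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Subclass workaround: set the Deflater level right after construction.
public class ConfigurableGzipOutputStream extends GZIPOutputStream {
    public ConfigurableGzipOutputStream(OutputStream out, int level) throws IOException {
        super(out);
        def.setLevel(level); // 1 = fastest, 9 = best ratio; the zlib default is 6
    }
}
```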
When I added zstd support to the Go library, I also added compression level support; it applies to zstd and gzip.
@scottcarey @luben Thank you for the correction. Right, ZSTD now supports negative compression levels (#1 #2), and GZIP is also able to use compression levels, although this is blocked in the official API - we can make use of it with some workaround. These features should be supported in the implementation. @davewat Thank you for filing the issue. I just updated the issue with the comments here and am now working on the KIP. I will let you know when I complete the document and open the discussion thread. @bobrik Great. Sarama always shows us the direction! I will include Sarama's case in the KIP.
@davewat @scottcarey @eliaslevy @ijuma I just opened the discussion thread in the dev mailing list. Let's continue the discussion there.
@dongeforever, just released zstd-jni-1.3.7-2 with support for querying min/max compression levels.
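A quick sketch of querying that range with the new zstd-jni helpers:

```java
import com.github.luben.zstd.Zstd;

public class LevelRangeCheck {
    public static void main(String[] args) {
        // Available as of zstd-jni 1.3.7-2: the supported compression level range.
        System.out.println("min level: " + Zstd.minCompressionLevel());
        System.out.println("max level: " + Zstd.maxCompressionLevel());
    }
}
```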
Hello. This PR resolves KAFKA-4514: Add Codec for ZStandard Compression. Please have a look when you are free. Since I am a total newbie to Apache Kafka, feel free to point out any deficiencies.
In addition to the feature itself, I have a question: should we support an option for the ZStandard compression level?
According to the ZStandard official documentation, it supports compression levels 1 ~ 22. Because of that, Hadoop added a new configuration option named "io.compression.codec.zstd.level", whose default value is 3. In this PR, I configured the compression level to 1 as a temporary measure, but I am wondering about the following problems:
I am looking forward to your advice. Thanks.