ZSTD snapshotting compression #2996

Merged

willyborankin
Contributor

@willyborankin willyborankin commented Apr 20, 2022

Description

This PR adds support for ZSTD compression of snapshot metadata. ZSTD compression for indexes is out of scope, since that kind of compression must be supported by Lucene.
There is a Lucene PR, apache/lucene#439, that adds ZSTD support, but the Lucene maintainers have not wanted to merge it so far.

Issues Resolved

#2192

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 626fa4d19ab09e7a5aa4bbc690e0aa5185a67858
Log 4638

Reports 4638

@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch 2 times, most recently from a8209ef to a8c0b6a Compare May 27, 2022 21:08
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure a8209ef5cd22284df5fccc51951a8d609287b875
Log 5637

Reports 5637

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure a8c0b6aed85759a7dd24362637846d060743ff1b
Log 5638

Reports 5638

@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch from a8c0b6a to 32d2660 Compare May 28, 2022 09:46
@opensearch-ci-bot
Collaborator

✅   Gradle Check success 32d2660
Log 5646

Reports 5646

@willyborankin willyborankin marked this pull request as ready for review May 28, 2022 10:24
@willyborankin willyborankin requested review from a team and reta as code owners May 28, 2022 10:24
@willyborankin willyborankin changed the title from "Zstdsnapshoting compression" to "ZSTD snapshotting compression" May 28, 2022
@dblock dblock requested review from andrross, mch2 and nknize June 13, 2022 23:43
Member

@dblock dblock left a comment

A cursory look at this looks good to me.

How does one use it?

I'd like @nknize to take a look please.

@willyborankin
Contributor Author

willyborankin commented Jun 14, 2022

A cursory look at this looks good to me.

How does one use it?

I'd like @nknize to take a look please.

When creating a repository, set the compression type (the default is deflate, which is how it works now):

{
  ...
   "settings": {
      "compress": true,
      "compression_type": "zstd", // `deflate`
   }
}
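
As a rough illustration of the settings above, the sketch below assembles a repository registration body in Java. The `RepositoryBody` class, the `fs` repository type, and the `/mnt/snapshots` location are assumptions made for the example; only `compress` and `compression_type` come from this thread. It does no JSON escaping, so it is only suitable for simple literal values.

```java
import java.util.Locale;

// Hypothetical helper, not part of this PR: builds the JSON body used to
// register a snapshot repository with the compression settings shown above.
class RepositoryBody {
    static String build(String type, String location, boolean compress, String compressionType) {
        // No escaping is performed; inputs are assumed to be plain literals.
        return String.format(
            Locale.ROOT,
            "{\"type\": \"%s\", \"settings\": {\"location\": \"%s\", \"compress\": %b, \"compression_type\": \"%s\"}}",
            type, location, compress, compressionType
        );
    }

    public static void main(String[] args) {
        // "fs" and the location path are placeholder values chosen for illustration.
        System.out.println(build("fs", "/mnt/snapshots", true, "zstd"));
    }
}
```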

@@ -182,4 +182,6 @@ grant {
permission java.io.FilePermission "/sys/fs/cgroup/memory", "read";
permission java.io.FilePermission "/sys/fs/cgroup/memory/-", "read";

// ZSTD permissions
permission java.lang.RuntimePermission "*";
Collaborator

:( Please fix that to have a narrowed set of permissions

Collaborator

+1; we can't have the security here be wide open

Contributor Author

@willyborankin willyborankin Jul 7, 2022

Yes, this is a bit complex. Which would be better:

  • change it to java.lang.RuntimePermission "loadLibrary.*", or
  • re-pack the zstd library and add the *.so files to the libs folder?
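
To make the difference in scope concrete, here is a small sketch that uses the standard `RuntimePermission.implies` check to compare the wildcard grant with the narrower `loadLibrary.*` grant. The `PermissionScope` class and the library name `zstd-example` are made up for the illustration; the permission name a real zstd binding requests may differ.

```java
import java.security.Permission;

// Illustrative only: compares how much a granted RuntimePermission covers.
class PermissionScope {
    // True if a grant with grantedName covers a request for requestedName.
    static boolean covers(String grantedName, String requestedName) {
        Permission granted = new RuntimePermission(grantedName);
        return granted.implies(new RuntimePermission(requestedName));
    }

    public static void main(String[] args) {
        // "*" covers every runtime permission, including unrelated ones.
        System.out.println(covers("*", "createClassLoader"));                     // true
        // "loadLibrary.*" covers library loading ("zstd-example" is a made-up name)...
        System.out.println(covers("loadLibrary.*", "loadLibrary.zstd-example"));  // true
        // ...but nothing outside the loadLibrary namespace.
        System.out.println(covers("loadLibrary.*", "createClassLoader"));         // false
    }
}
```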

Collaborator

@nknize nknize left a comment

Looking pretty good! Thanks for doing this. I left some comments.

I think we should add a none compression_type option to the DSL API that nulls out the compressor instead of adding a NullCompressor:

{
  ...
   "settings": {
      "compress": true,
      "compression_type": "zstd", // `deflate`, `lz4`, `none`
   }
}

We also need to widen the test coverage to include LZ4 and no compression.
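
One way the suggested `none` option could map onto setting values is sketched below. The `CompressionTypeSketch` class, its enum, and the `parse` helper are hypothetical and are not the names used in this PR; the fallback to deflate mirrors the default mentioned earlier in the thread.

```java
import java.util.Locale;

// Hypothetical mapping from the "compression_type" setting to an enum.
class CompressionTypeSketch {
    enum CompressionType { DEFLATE, LZ4, ZSTD, NONE }

    // Parses the setting value case-insensitively; a missing value falls back
    // to DEFLATE. Unknown values throw IllegalArgumentException.
    static CompressionType parse(String value) {
        if (value == null) {
            return CompressionType.DEFLATE;
        }
        return CompressionType.valueOf(value.toUpperCase(Locale.ROOT));
    }

    // NONE means "no compressor", so callers would write the raw stream.
    static boolean usesCompressor(CompressionType type) {
        return type != CompressionType.NONE;
    }

    public static void main(String[] args) {
        for (String setting : new String[] { "zstd", "deflate", "lz4", "none" }) {
            CompressionType type = parse(setting);
            System.out.println(setting + " -> " + type + ", compressor used: " + usesCompressor(type));
        }
    }
}
```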

*
* @opensearch.internal
*/
public class NullCompressor implements Compressor {
Collaborator

CompressorFactory.compressor is nullable, why do we need an explicit null compressor? Can we just check compressor == null ? indexOutputOutputStream : compressor.threadLocalOutputStream(indexOutputOutputStream) and do away with this empty class?
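
A minimal sketch of that null check follows. The `Compressor` interface here is a simplified stand-in that declares only the method named in the comment (and no checked exceptions), and `DeflaterOutputStream` from the JDK stands in for a real compressor; neither is the actual OpenSearch type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

class NullableCompressorSketch {
    // Simplified, hypothetical stand-in for the real Compressor interface.
    interface Compressor {
        OutputStream threadLocalOutputStream(OutputStream out);
    }

    // A null compressor means "write uncompressed": return the stream as-is.
    static OutputStream maybeCompress(Compressor compressor, OutputStream out) {
        return compressor == null ? out : compressor.threadLocalOutputStream(out);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "snapshot metadata ".repeat(200).getBytes(StandardCharsets.UTF_8);
        Compressor deflate = sink -> new DeflaterOutputStream(sink);

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (OutputStream out = maybeCompress(null, plain)) {
            out.write(payload);
        }

        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (OutputStream out = maybeCompress(deflate, compressed)) {
            out.write(payload);
        }

        System.out.println("uncompressed bytes: " + plain.size());
        System.out.println("deflate bytes: " + compressed.size());
    }
}
```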

Contributor Author

TBH, it is just a habit of mine to use a Null class so that I do not have to check for null everywhere, but I think None would be better.

// It needs to be different from other compressors and to not be specific
// enough so that no stream starting with these bytes could be detected as
// a XContent
private static final byte[] HEADER = new byte[] { 'Z', 'S', 'T', 'D', '\0' };
Collaborator

readability nit:

Suggested change
- private static final byte[] HEADER = new byte[] { 'Z', 'S', 'T', 'D', '\0' };
+ public static final String NAME = "ZSTD";
+ private static final byte[] HEADER = NAME.getBytes();

Contributor Author

Well, I just wanted to do it the same way it is done for DEFLATE, { 'D', 'F', 'L', '\0' }; I'm not sure about the \0.
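
On the `\0` question, the sketch below compares the two header forms. Deriving the header from the name drops the trailing zero byte, so the header shrinks from 5 bytes to 4. The `HeaderSketch` class and its `startsWith` helper are hypothetical, and the example passes an explicit charset because a bare `getBytes()` depends on the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative comparison of the two header definitions discussed above.
class HeaderSketch {
    static final byte[] EXPLICIT = new byte[] { 'Z', 'S', 'T', 'D', '\0' };
    static final byte[] FROM_NAME = "ZSTD".getBytes(StandardCharsets.UTF_8);

    // Hypothetical detection helper: true if data begins with header.
    static boolean startsWith(byte[] data, byte[] header) {
        if (data.length < header.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, header.length), header);
    }

    public static void main(String[] args) {
        System.out.println("explicit header length: " + EXPLICIT.length);   // 5
        System.out.println("name-based header length: " + FROM_NAME.length); // 4
        // Data written with the 5-byte header still matches a 4-byte check...
        System.out.println(startsWith(EXPLICIT, FROM_NAME)); // true
        // ...but data written with the 4-byte header fails a 5-byte check.
        System.out.println(startsWith(FROM_NAME, EXPLICIT)); // false
    }
}
```

The two forms are therefore not interchangeable once data has been written with one of them.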

final boolean compress = randomBoolean();
settingsBuilder.put("compress", compress);
if (compress) {
settingsBuilder.put("compression_type", randomFrom(CompressorType.values()));
Collaborator

This doesn't test no compression or LZ4 since CompressorType doesn't contain those values.

Contributor Author

@willyborankin willyborankin Jul 7, 2022

I'm planning to add LZ4 as a separate PR if you do not mind.

@willyborankin
Contributor Author

willyborankin commented Jul 7, 2022

Looking pretty good! Thanks for doing this. I left some comments.

I think we should add a none compression_type option to the DSL API that nulls out the compressor instead of adding a NullCompressor:

{
  ...
   "settings": {
      "compress": true,
      "compression_type": "zstd", // `deflate`, `lz4`, `none`
   }
}

We also need to widen the test coverage to include LZ4 and no compression.

Yes, none is much better than Null; I will change it.

@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch from 32d2660 to fe6afd7 Compare October 23, 2022 20:39
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@Poojita-Raj
Contributor

@willyborankin Hey, would you like to address the gradle check failure and merge this PR?

@willyborankin
Contributor Author

@willyborankin Hey, would you like to address the gradle check failure and merge this PR?

@Poojita-Raj I would like to. I'm waiting for #3577, since it uses the same native libraries.

@wbeckler

@willyborankin Have you ever benchmarked this to measure the actual impact on snapshotting and restore?

@stephen-crawford
Copy link
Contributor

Hi @willyborankin, I know you are working on a lot of things, but I wanted to see what you need to help move this forward. Let me know.

@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch from fe6afd7 to 1237624 Compare May 18, 2023 16:35
@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch from 54e90ae to 406865b Compare May 23, 2023 20:13
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@dblock
Member

dblock commented May 23, 2023

FYI, there is only one plugin (outside core) that implements repository for OCI [1], it may need minor adjustments (primarily, to use the settings centralized in BlobStoreRepository), otherwise the change should not break anything.
[1] https://github.com/opensearch-project/opensearch-oci-object-storage/

good point should I connect with them?

Open an issue in that repo so we don't forget.

@dblock dblock requested a review from nknize May 23, 2023 21:29
Changes:

- Added ZSTD compressor for snapshotting
- 2 JSON repository settings:
  - readonly
  - compression
were moved into the BlobStoreRepository class
and removed from other repos classes where they
were used.

Signed-off-by: Andrey Pleskach <ples@aiven.io>
@willyborankin willyborankin force-pushed the zstdsnapshoting-compression branch from 406865b to b8efda1 Compare June 1, 2023 09:40
@willyborankin willyborankin requested a review from dbwiddis as a code owner June 1, 2023 09:40
@github-actions
Contributor

github-actions bot commented Jun 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@dblock
Member

dblock commented Jun 1, 2023

@nknize you good with this? needs you to dismiss your review

Collaborator

@nknize nknize left a comment

Haven't had a chance to revisit this, but it seems we have enough recent reviews and we can patch and/or revert hidden dragons. So I'll unblock.

@reta
Collaborator

reta commented Jun 1, 2023

@nknize mind resolving your comments as well (maybe something pops up)? (the merge is still blocked)

@nknize
Collaborator

nknize commented Jun 1, 2023

@nknize mind resolving your comments as well (maybe something pops up)? (the merge is still blocked)

I would love to. But apparently you can't resolve outdated (missing) conversations. And the alleged "workaround" doesn't work on mobile view. I'm happy to merge as admin to override these pearly github gates.

@nknize
Collaborator

nknize commented Jun 1, 2023

Merging as admin due to "outdated" conversation resolution bug.

@nknize nknize merged commit 4df347c into opensearch-project:main Jun 1, 2023
@nknize
Collaborator

nknize commented Jun 1, 2023

Thanks @willyborankin for your contribution and to everyone for your tenacity on this long running PR.

@reta reta added the backport 2.x Backport to 2.x branch label Jun 1, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2996-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4df347c6f904942b073ee4bc76ec87a095cee4c7
# Push it to GitHub
git push --set-upstream origin backport/backport-2996-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2996-to-2.x.

@reta
Collaborator

reta commented Jun 1, 2023

@willyborankin mind sending a manual backport against 2.x? thank you

@willyborankin
Contributor Author

@willyborankin mind sending a manual backport against 2.x? thank you

Sure, will do.

willyborankin added a commit to willyborankin/OpenSearch that referenced this pull request Jun 4, 2023
willyborankin added a commit to willyborankin/OpenSearch that referenced this pull request Jun 4, 2023
reta pushed a commit that referenced this pull request Jun 5, 2023
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this pull request Jun 26, 2023
…search-project#7906)

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
8 participants