Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve serialization speeds #2802

Merged

Conversation

parasjain1
Copy link
Contributor

@parasjain1 parasjain1 commented May 30, 2023

Description

Use custom serialization in security plugin.

Please refer #2780 for details.

Issues Resolved

Testing

  • Unit tests updated for Base64Helper
  • Test version upgrade scenario
  • Test version upgrade with DLS/FLS/Masked Fields with active search traffic

Check List

  • New functionality includes testing
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@parasjain1 parasjain1 force-pushed the parasjaz-custom-serialization branch 3 times, most recently from 107798b to add8f29 Compare June 2, 2023 18:09
@parasjain1 parasjain1 marked this pull request as ready for review June 8, 2023 09:45
Copy link
Member

@cwperks cwperks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @parasjain1! I took a first pass and left comments with a few questions that I have. Can you outline the steps you took to do backwards compatibility testing?

What does this look like during a rolling upgrade?

@parasjain1
Copy link
Contributor Author

Thank you @parasjain1! I took a first pass and left comments with a few questions that I have. Can you outline the steps you took to do backwards compatibility testing?

What does this look like during a rolling upgrade?

I tested this by creating a cluster with OS2.5, kept some traffic flowing via OSB, and then triggered an upgrade to OS2.7 (the code was modified to have a check on OS2.5 instead of OS2.7 for testing). The upgrade succeeded and the cluster continued to operate normally. This signifies that the old and the new nodes were able to communicate properly.

I've planned for a few additional rounds of testing especially with indices having DLS, FLS and Masked Field configurations.

@parasjain1 parasjain1 mentioned this pull request Jun 8, 2023
5 tasks
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on, seems like there are some big wins to make in this space to improve performance.

I would like to push this as far as we can into a change that can be merged into the 2.x line, still have some comments on the source issue #2780

@cwperks
Copy link
Member

cwperks commented Jun 8, 2023

@parasjain1 Not sure if this would be helpful, but there was an issue with rolling upgrades from OS 1 to OS 2.x when there was a change in OS 2 to remove package replacement during serialization for the User object. From ODFE to OS 1 there was a change in the fully qualified package name of the user object from com.amazon.opendistroforelasticsearch.security.User -> org.opensearch.security.User. To keep nodes backwards compatible, OS 1 would replace the package name org.opensearch.security with com.amazon.opendistroforelasticsearch.security when serializing the User object but that logic was removed in OS 2.

In this PR, it then kept track of the cluster state to know the minimum version of a node in the cluster to choose which serialization method to use.

@parasjain1
Copy link
Contributor Author

parasjain1 commented Jun 8, 2023

@parasjain1 Not sure if this would be helpful, but there was an issue with rolling upgrades from OS 1 to OS 2.x when there was a change in OS 2 to remove package replacement during serialization for the User object. From ODFE to OS 1 there was a change in the fully qualified package name of the user object from com.amazon.opendistroforelasticsearch.security.User -> org.opensearch.security.User. To keep nodes backwards compatible, OS 1 would replace the package name org.opensearch.security with com.amazon.opendistroforelasticsearch.security when serializing the User object but that logic was removed in OS 2.

In this PR, it then kept track of the cluster state to know the minimum version of a node in the cluster to choose which serialization method to use.

Correct. I see this is where the minNodeVersion is being used -
https://github.com/opensearch-project/security/pull/2268/files#diff-8294f6476b483fd1ad05a2ec008c6de0923ac61de07d1f1d6ed6c5b5de34657dR143 - and as we see it depends on check whether there are < 1.0.0 nodes in the cluster. (I see this change may have been coming from ES to OS1.0, please correct me if I'm wrong.)

I just see this as another way of achieving the same result - i.e. to be able to decide what serialization protocol to use. Thoughts?

@parasjain1
Copy link
Contributor Author

Thought about it, we should be able to backport this as part of a new minor version in 2.x.x series.

Can we take this PR to closure and have a separate issue for backporting this change?

@peternied
Copy link
Member

Can we take this PR to closure and have a separate issue for backporting this change?

Lets keep iterating till we have a version of this change we feel confident can be backported. If this change is only targeting a 3.0 there is no time pressure, right?

@parasjain1
Copy link
Contributor Author

I was able to gather more understanding on the backporting and release process. Apologies for the confusion earlier. In below points, I'm clarifying and concluding the issue of backporting this PR -

  • Foremost, this change can be backported to 2.x and we intend to get it released with 2.9 which is due for code freeze on 07/11.
  • Backport tag can be added to this PR. Once this PR is merged to main, a backport PR for 2.x will be auto created but not merged. Once that PR is created, we can make a small change for the version check to be made for OS2.9 instead of OS3.0. This is a one off effort. The PR will have to be approved and manually merged.
  • I've been made aware that it's standard practice to only backport critical bugs or security issues to already released versions, and this is not one of those, hence we won't backport it to < OS2.9.
  • We can have a separate PR to remove the JDK serialization along with special handling for mixed cluster state completely in 3.0.0. (This will work because we always allow major version upgrades from tail minor versions of the prior major).

cc @peternied @backslasht

@parasjain1 parasjain1 force-pushed the parasjaz-custom-serialization branch from add8f29 to 63197a1 Compare June 14, 2023 05:22
@stephen-crawford
Copy link
Contributor

Hi @parasjain1, I am reaching out to see what the status of this PR is? It seems like you looked into backporting the changes but may have decided to wait to make a contribution? Are you still planning on driving this forward or should we close this for now?

@parasjain1 parasjain1 force-pushed the parasjaz-custom-serialization branch 3 times, most recently from 3fbd880 to 5352128 Compare June 26, 2023 08:37
Signed-off-by: Paras Jain <parasjaz@amazon.com>
@parasjain1 parasjain1 force-pushed the parasjaz-custom-serialization branch from 45081b5 to 1904949 Compare September 28, 2023 13:44
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some organization around the BWC test I've called out, I'll try to address them with pushes before the end of my day. After those are fixed I'll approve.

Great work with the BWC tests to ensure this code works.

@peternied peternied changed the title Base64Helper uses custom serialization Improve serialization speeds Sep 29, 2023
Copy link
Member

@cwperks cwperks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parasjain1 Thank you for all of the patience and hard work on this PR. This change looks great to me and I don't have any further comments.

If we could refer to core's version of jackson in the bwc tests that would be ideal, but I won't let that hold up this PR.

@peternied
Copy link
Member

I've restarted the CI on that failure and resolved the outstanding conversations after commenting.

Copy link
Member

@DarshitChanpura DarshitChanpura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @parasjain1. Especially around cleaning up BWC tests.

@DarshitChanpura DarshitChanpura merged commit 22f44d3 into opensearch-project:main Sep 29, 2023
58 checks passed
@DarshitChanpura DarshitChanpura added the backport 2.x backport to 2.x branch label Sep 29, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/security/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/security/backport-2.x
# Create a new branch
git switch --create backport/backport-2802-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 22f44d3bccf93a4af2562d26694094ec7151669f
# Push it to GitHub
git push --set-upstream origin backport/backport-2802-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/security/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2802-to-2.x.

parasjain1 pushed a commit to parasjain1/opensearch-security that referenced this pull request Sep 30, 2023
Use custom serialization in security plugin.
- Resolves opensearch-project#2780

Signed-off-by: Paras Jain <parasjaz@amazon.com>
Signed-off-by: Peter Nied <peternied@hotmail.com>
Co-authored-by: Paras Jain <parasjaz@amazon.com>
Co-authored-by: Peter Nied <peternied@hotmail.com>
Signed-off-by: Paras Jain <parasjaz@amazon.com>
@parasjain1 parasjain1 mentioned this pull request Oct 3, 2023
3 tasks
DarshitChanpura pushed a commit that referenced this pull request Oct 3, 2023
Signed-off-by: Paras Jain <parasjaz@amazon.com>
Signed-off-by: Peter Nied <peternied@hotmail.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Paras Jain <parasjaz@amazon.com>
Co-authored-by: Peter Nied <peternied@hotmail.com>
@gauravruhela gauravruhela added v2.10.0 Issues targeting release v2.10.0 v2.11.0 Issues targeting the 2.11 release and removed v2.11.0 Issues targeting the 2.11 release v2.10.0 Issues targeting release v2.10.0 labels Oct 4, 2023
peternied added a commit to peternied/security that referenced this pull request Apr 25, 2024
This change will prevent new clusters from using the 'custom serialization' format that was causing performance impact with customers in OpenSearch 2.11.

Background: the serialization changes from opensearch-project#2802 introduced issues where for certain serialization headers that were previously very small for over the wire transmission become much larger. The root cause of this was that the JDK serialization process was able to detect duplicate entries and then use an encoding format to make it compressible. Adding this logic into the serialization system from OpenSearch is non-trivial and is not being invested in.

Signed-off-by: Peter Nied <petern@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x backport to 2.x branch v2.11.0 Issues targeting the 2.11 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Performance] Use a faster serialization protocol within security plugin
7 participants