Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes to prevent segment fault errors arising due to unexpected SDK b… #11334

Merged

Conversation

vikasvb90
Copy link
Contributor

@vikasvb90 vikasvb90 commented Nov 26, 2023

Description

This PR adds guards against unexpected S3 SDK errors.

  1. Guards against access to stream after callback from SDK. Cases were observed in which SDK was trying to access stream even after callback was invoked by it. This was leading to segment fault errors in segment upload path as MMap files were being removed from memory on close and attempts to read invalid references were happening by SDK. Also, JDK 17 is not handing these errors gracefully which was leading to JVM crash.
  2. Multiple invocations to close was another cause of segment fault errors.

This is a forked PR from : #11327

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 78cfad7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vikasvb90 vikasvb90 force-pushed the segment_fault_fixes_for_s3_sdk branch from 78cfad7 to e8ef33c Compare November 27, 2023 13:05
Copy link
Contributor

❕ Gradle check result for e8ef33c: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testRelocateWhileContinuouslyIndexingAndWaitingForRefresh

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@vikasvb90 vikasvb90 force-pushed the segment_fault_fixes_for_s3_sdk branch 3 times, most recently from d3d01bb to c671791 Compare November 28, 2023 03:17
Copy link
Contributor

❌ Gradle check result for d3d01bb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for e2c5146: SUCCESS

Copy link

codecov bot commented Nov 28, 2023

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (effc9bc) 71.46% compared to head (1d91527) 71.42%.
Report is 2 commits behind head on main.

Files Patch % Lines
...n/annotation/processor/ApiAnnotationProcessor.java 88.18% 1 Missing and 14 partials ⚠️
...e/transfer/stream/OffsetRangeIndexInputStream.java 74.35% 9 Missing and 1 partial ⚠️
...on/blobstore/transfer/RemoteTransferContainer.java 71.42% 5 Missing and 1 partial ⚠️
...bstore/transfer/stream/OffsetRangeInputStream.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11334      +/-   ##
============================================
- Coverage     71.46%   71.42%   -0.05%     
+ Complexity    59176    59170       -6     
============================================
  Files          4903     4904       +1     
  Lines        277987   278147     +160     
  Branches      40382    40419      +37     
============================================
- Hits         198662   198657       -5     
- Misses        62805    62971     +166     
+ Partials      16520    16519       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

❌ Gradle check result for c671791: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Dec 4, 2023

❌ Gradle check result for 02637aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vikasvb90 vikasvb90 force-pushed the segment_fault_fixes_for_s3_sdk branch from 02637aa to 29b8222 Compare December 4, 2023 19:15
Copy link
Contributor

github-actions bot commented Dec 4, 2023

❌ Gradle check result for 29b8222: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vikasvb90 vikasvb90 force-pushed the segment_fault_fixes_for_s3_sdk branch from 29b8222 to d2713b4 Compare December 6, 2023 17:00
Copy link
Contributor

github-actions bot commented Dec 6, 2023

❌ Gradle check result for d2713b4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Member

@andrross andrross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vikasvb90. Please rebase this against the latest in main to fix the bwc test failure.

…ehaviour

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
…remove stream ownership from here

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
@vikasvb90 vikasvb90 force-pushed the segment_fault_fixes_for_s3_sdk branch from d2713b4 to 1d91527 Compare December 7, 2023 04:33
Copy link
Contributor

github-actions bot commented Dec 7, 2023

❕ Gradle check result for 1d91527: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@gbbafna gbbafna merged commit 2a4aafd into opensearch-project:main Dec 7, 2023
29 checks passed
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Dec 7, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 7, 2023
#11334)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
(cherry picked from commit 2a4aafd)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Dec 7, 2023
#11334) (#11509)

(cherry picked from commit 2a4aafd)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Dec 11, 2023
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
opensearch-project#11334)

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog v2.12.0 Issues and PRs related to version 2.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants