[BUG] org.opensearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky #5325

Merged: 1 commit, Nov 22, 2022

@reta (Collaborator) commented Nov 21, 2022

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Description

Fixes the flaky test org.opensearch.repositories.s3.RepositoryS3ClientYamlTestSuiteIT/test. Some background on the possible causes of these failures (I have not been able to reproduce them locally yet): the 20_repository_permanent_credentials test uses MinIO under the hood as an S3 replacement. There are three basic failure patterns:

1. Repository data cannot be read (index blob not found):

RepositoryException[[repository_permanent] could not read repository data from index blob]; nested: NoSuchFileException[Blob object [base_integration_tests/index-30894] not found: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: P15KX1DMD577FQ5Y; S3 Extended Request ID: XEF7MRXUHU4E4v9cl8TQVPTb6VOy7PYoWCEQRXzvarSYg88VulVCiMvZZ1pxSbgDD5+ZD2sl28E=; Proxy: null)];
	at org.opensearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:1842)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.safeRepositoryData(BlobStoreRepository.java:798)
	at org.opensearch.repositories.blobstore.BlobStoreRepository$2.doRun(BlobStoreRepository.java:731)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)

2. Snapshot cannot be finalized (duplicate key):

SnapshotException[[repository_permanent:snapshot-one/tBb-g3U1QsGROobhOKrjGw] failed to update snapshot in repository]; nested: IllegalStateException[Duplicate key docs (attempted merging values [docs/JdXo8HClSeu2GEuqJUgpMQ] and [docs/dB3pskUbQ3S3rsTds_INgw])];
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$39(BlobStoreRepository.java:1395)
	at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88)
	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:82)
	at org.opensearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:81)
	at org.opensearch.action.ActionRunnable$1.doRun(ActionRunnable.java:61)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

3. Concurrent modification of the index-N file (expected generation not found):

RepositoryException[[repository_permanent] concurrent modification of the index-N file, expected current generation [30818] but it was not found in the repository]
	at org.opensearch.repositories.blobstore.BlobStoreRepository.ensureSafeGenerationExists(BlobStoreRepository.java:2205)
	at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$writeIndexGen$57(BlobStoreRepository.java:2086)
	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80)
	at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
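
For readers unfamiliar with the "Duplicate key ... (attempted merging values ... and ...)" message: on JDK 9 and later this wording matches what java.util.stream.Collectors.toMap produces when two elements map to the same key and no merge function is supplied. Whether that is the exact call site inside BlobStoreRepository is an assumption. The sketch below only reproduces the message shape; the IndexEntry class and the ids are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {

    // Hypothetical stand-in for a repository index entry: a name plus a unique id.
    static final class IndexEntry {
        final String name;
        final String id;

        IndexEntry(String name, String id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public String toString() {
            return "[" + name + "/" + id + "]";
        }
    }

    // Collects entries into a map keyed by name. Without a merge function,
    // Collectors.toMap throws IllegalStateException when two entries share a name.
    static Map<String, IndexEntry> byName(List<IndexEntry> entries) {
        return entries.stream().collect(Collectors.toMap(e -> e.name, e -> e));
    }

    // Returns the exception message for a listing that contains the same
    // index name under two different ids, or "no failure" if none is thrown.
    static String duplicateKeyMessage() {
        List<IndexEntry> entries = Arrays.asList(
            new IndexEntry("docs", "id-1"),
            new IndexEntry("docs", "id-2")
        );
        try {
            byName(entries);
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeyMessage());
    }
}
```

If this reading is right, the second failure pattern would mean that the repository data contained the same index name under two different ids, which is consistent with a stale or inconsistent view of the blob store.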

One hypothesis is that the tests fail intermittently because of MinIO's consistency limitations (https://min.io/docs/minio/container/operations/install-deploy-manage/deploy-minio-single-node-single-drive.html#deploy-single-node-single-drive-minio):

MinIO’s strict read-after-write and list-after-write consistency model requires local drive filesystems (xfs, ext4, etc.).
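
To make the two guarantees concrete, here is a minimal sketch of what read-after-write and list-after-write mean, using only java.nio.file against a local temporary directory. It does not talk to MinIO or S3 and is not part of this change; the class name and the blob names index-1 and index-2 are hypothetical. On a local filesystem both checks are expected to succeed, while a store without these guarantees could return a missing blob or a stale listing at the same points.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConsistencyProbe {

    // Read-after-write: write a blob, then immediately read it back and compare.
    static boolean readAfterWrite(Path dir, String name, String content) throws IOException {
        Path blob = dir.resolve(name);
        Files.write(blob, content.getBytes(StandardCharsets.UTF_8));
        String readBack = new String(Files.readAllBytes(blob), StandardCharsets.UTF_8);
        return content.equals(readBack);
    }

    // List-after-write: write a blob, then immediately list the directory
    // and check that the new blob shows up in the listing.
    static boolean listAfterWrite(Path dir, String name) throws IOException {
        Files.write(dir.resolve(name), new byte[0]);
        try (Stream<Path> listing = Files.list(dir)) {
            List<String> names = listing
                .map(p -> p.getFileName().toString())
                .collect(Collectors.toList());
            return names.contains(name);
        }
    }

    // Runs both checks in a fresh temporary directory.
    static boolean probe() {
        try {
            Path dir = Files.createTempDirectory("consistency-probe");
            return readAfterWrite(dir, "index-1", "generation 1")
                && listAfterWrite(dir, "index-2");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + probe());
    }
}
```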

Issues Resolved

Closes #5219

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

@codecov-commenter commented Nov 21, 2022

Codecov Report

Merging #5325 (ecbd52d) into main (66c5448) will increase coverage by 0.92%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##               main    #5325      +/-   ##
============================================
+ Coverage     70.93%   71.85%   +0.92%     
- Complexity    58109    58756     +647     
============================================
  Files          4704     4704              
  Lines        277265   277265              
  Branches      40148    40148              
============================================
+ Hits         196670   199229    +2559     
+ Misses        64485    62302    -2183     
+ Partials      16110    15734     -376     
Impacted Files                                         Coverage Δ
.../index/shard/IndexShardNotRecoveringException.java  0.00% <0.00%> (-50.00%) ⬇️
...ndex/seqno/RetentionLeaseBackgroundSyncAction.java  37.50% <0.00%> (-37.50%) ⬇️
.../java/org/opensearch/client/ResponseException.java  76.19% <0.00%> (-14.29%) ⬇️
...nsearch/index/fieldvisitor/IdOnlyFieldVisitor.java  76.47% <0.00%> (-11.77%) ⬇️
...a/org/opensearch/action/search/ParsedScrollId.java  77.77% <0.00%> (-11.12%) ⬇️
.../node/tasks/cancel/TransportCancelTasksAction.java  83.33% <0.00%> (-8.34%) ⬇️
...pensearch/search/internal/LegacyReaderContext.java  81.81% <0.00%> (-3.04%) ⬇️
.../java/org/opensearch/gateway/GatewayMetaState.java  72.24% <0.00%> (-2.86%) ⬇️
...org/opensearch/common/util/CancellableThreads.java  75.71% <0.00%> (-2.86%) ⬇️
.../src/main/java/org/opensearch/client/Response.java  92.50% <0.00%> (-2.50%) ⬇️
... and 485 more

@dblock (Member) left a comment

🤯

It's a good hunch. I think we should merge this, close the issue with comment, and reopen if we see it again.

…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
@reta added and removed the backport 1.x, backport 2.x, backport 1.3, and backport 2.4 labels Nov 21, 2022
@github-actions (Contributor)

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.index.ShardIndexingPressureConcurrentExecutionTests.testReplicaThreadedUpdateToShardLimitsAndRejections

@reta marked this pull request as ready for review November 22, 2022 01:19
@reta requested a review from a team as a code owner November 22, 2022 01:19
@reta merged commit a1ceff4 into opensearch-project:main Nov 22, 2022
@reta added the backport 1.x, backport 2.x, backport 1.3, and backport 2.4 labels Nov 22, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit a1ceff4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot (Contributor)

The backport to 1.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-1.x 1.x
# Navigate to the new working tree
pushd ../.worktrees/backport-1.x
# Create a new branch
git switch --create backport/backport-5325-to-1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a1ceff41b9ae9508e68751d98999ddc843daf5d1
# Push it to GitHub
git push --set-upstream origin backport/backport-5325-to-1.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-1.x

Then, create a pull request where the base branch is 1.x and the compare/head branch is backport/backport-5325-to-1.x.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit a1ceff4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot (Contributor)

The backport to 1.3 failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-1.3 1.3
# Navigate to the new working tree
pushd ../.worktrees/backport-1.3
# Create a new branch
git switch --create backport/backport-5325-to-1.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a1ceff41b9ae9508e68751d98999ddc843daf5d1
# Push it to GitHub
git push --set-upstream origin backport/backport-5325-to-1.3
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-1.3

Then, create a pull request where the base branch is 1.3 and the compare/head branch is backport/backport-5325-to-1.3.

andrross pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325) (#5335)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit a1ceff4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
reta added a commit to reta/OpenSearch that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (opensearch-project#5325)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
reta added a commit to reta/OpenSearch that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (opensearch-project#5325)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
dblock pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325) (#5336)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
(cherry picked from commit a1ceff4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dblock pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325) (#5339)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
dblock pushed a commit that referenced this pull request Nov 22, 2022
…T/test {yaml=repository_s3/20_repository_permanent_credentials/Snapshot and Restore with repository-s3 using permanent credentials} flaky (#5325) (#5338)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Labels
backport 1.x, backport 1.3, backport 2.x, backport 2.4, skip-changelog
3 participants