Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assertion failure while closing remoteStore #10627

Conversation

sachinpkale
Copy link
Member

@sachinpkale sachinpkale commented Oct 16, 2023

Description

  • When an instance of Store is created, a shardlock is created which is released on closing the instance of store.
  • Currently, we create 2 instances of store for remote store backed indices: store and remoteStore.
  • As there can be only one shardlock acquired for a given shard, the lock is shared between store and remoteStore.
  • This creates an issue when we are deleting the index as it results in closing both store and remoteStore. At the time of closure of second Store instance, we get the assertion error saying shard is not locked.
  • Ideally, we should be closing the remoteStore but until we work on CompositeStore ([Remote Store] Create abstraction CompositeStore which will encapsulate local and remote store #3719), we mitigate the test failures by closing the remoteDirectory.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sachin Kale <kalsac@amazon.com>
@github-actions
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change d4c2011

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

// Ideally, we should be closing the remoteStore but until we work on CompositeStore
// (https://github.com/opensearch-project/OpenSearch/issues/3719), we mitigate the test failures by
// closing the remoteDirectory.
indexShard.getRemoteDirectory().close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point which assertion fails without this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/env/NodeEnvironment.java#L579C15-L579C15

Stacktrace looks like this on test failure:

java.lang.AssertionError: shard [index][1] is not locked
at __randomizedtesting.SeedInfo.seed([892889442ABD4835]:0)
at org.opensearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:579)
at org.opensearch.indices.IndicesService.deleteShardStore(IndicesService.java:1194)
at org.opensearch.index.IndexService.onShardClose(IndexService.java:662)
at org.opensearch.index.IndexService$StoreCloseListener.accept(IndexService.java:785)
at org.opensearch.index.IndexService$StoreCloseListener.accept(IndexService.java:772)
at org.opensearch.index.store.Store.closeInternal(Store.java:550)
at org.opensearch.index.store.Store$1.closeInternal(Store.java:190)
at org.opensearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:78)
at org.opensearch.index.store.Store.decRef(Store.java:523)
at org.opensearch.index.engine.Engine$1.doClose(Engine.java:766)
at org.opensearch.index.engine.Engine$SearcherSupplier.close(Engine.java:1357)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:114)
at org.opensearch.common.lease.Releasables.close(Releasables.java:54)
at org.opensearch.common.lease.Releasables.close(Releasables.java:64)

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Member

@ashking94 ashking94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a test around that certainly fails if this change was not present.

@sachinpkale sachinpkale marked this pull request as ready for review October 27, 2023 05:07
@sachinpkale sachinpkale requested a review from abbashus as a code owner October 27, 2023 05:07
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 26, 2023
@dblock
Copy link
Member

dblock commented Nov 26, 2023

This still needs a test to get merged.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Nov 27, 2023
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jan 7, 2024
@ticheng-aws
Copy link
Contributor

Hi @sachinpkale, the PR is stalled. Is this being worked upon?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 8, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 11, 2024
@sohami
Copy link
Collaborator

sohami commented Feb 14, 2024

@sachinpkale Any update on adding tests around this change ?

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Feb 15, 2024
@sachinpkale
Copy link
Member Author

Closing this for now, will re-visit later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

6 participants