[BUG] [Remote Store] RemoteStoreRefreshListenerTests.testAfterCommit flaky test failure #8947

Closed
dreamer-89 opened this issue Jul 28, 2023 · 4 comments
Labels: bug, flaky-test, Indexing:Replication, Storage:Remote Storage, :test, >test-failure, v2.11.0

Comments

@dreamer-89
Member

This comes from the meta issue tracking Segment Replication flaky test failures.

3 org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testAfterCommit (19629,19844,19937)
@dreamer-89 dreamer-89 added bug Something isn't working :test Adding or fixing a test >test-failure Test failure from CI, local build, etc. distributed framework flaky-test Random test failure that succeeds on second run v2.10.0 Indexing:Replication Issues and PRs related to core replication framework eg segrep Storage Issues and PRs relating to data and metadata storage labels Jul 28, 2023
@sachinpkale sachinpkale self-assigned this Aug 3, 2023
@anasalkouz anasalkouz moved this from Todo to In Progress in Segment Replication Aug 10, 2023
@sachinpkale
Member

Unable to reproduce locally (the test passed in 200+ consecutive runs). Resolving.

@mch2
Member

mch2 commented Oct 12, 2023

@ashking94
Member

When run in isolation, the test did not fail even after 1K iterations. When it runs as part of the entire test class, however, it fails with the exception below:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testAfterCommit" -Dtests.seed=81C8CA41597DE51F -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=uk -Dtests.timezone=America/Belize -Druntime.java=21

org.opensearch.index.shard.RemoteStoreRefreshListenerTests > testAfterCommit FAILED
    java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 1 open files: {_0.cfs=1}
        at __randomizedtesting.SeedInfo.seed([81C8CA41597DE51F:CCBB4D0438019FC1]:0)
        at org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:876)
        at org.apache.lucene.store.FilterDirectory.close(FilterDirectory.java:111)
        at org.apache.lucene.store.FilterDirectory.close(FilterDirectory.java:111)
        at org.opensearch.index.store.Store$StoreDirectory.innerClose(Store.java:952)
        at org.opensearch.index.store.Store.closeInternal(Store.java:571)
        at org.opensearch.index.store.Store$1.closeInternal(Store.java:194)
        at org.opensearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:78)
        at org.opensearch.index.store.Store.decRef(Store.java:546)
        at org.opensearch.index.store.Store.close(Store.java:553)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
        at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
        at org.opensearch.index.shard.IndexShardTestCase.closeShard(IndexShardTestCase.java:965)
        at org.opensearch.index.shard.IndexShardTestCase.closeShards(IndexShardTestCase.java:972)
        at org.opensearch.index.shard.IndexShardTestCase.closeShards(IndexShardTestCase.java:952)
        at org.opensearch.index.shard.RemoteStoreRefreshListenerTests.tearDown(RemoteStoreRefreshListenerTests.java:105)

        Caused by:
        java.lang.RuntimeException: unclosed IndexInput: _0.cfs
            at org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
            at org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:835)
            at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
            at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
            at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.<init>(Lucene90CompoundReader.java:78)
            at org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat.getCompoundReader(Lucene90CompoundFormat.java:87)
            at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:104)
            at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96)
            at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:178)
            at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:220)
            at org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:542)
            at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:138)
            at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:604)
            at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381)
            at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355)
            at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345)
            at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)
            at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
            at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:72)
            at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:52)
            at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)
            at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)
            at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:433)
            at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:413)
            at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)
            at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240)
            at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1771)
            at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1748)
            at org.opensearch.index.shard.IndexShard.refresh(IndexShard.java:1341)
            at org.opensearch.index.shard.RemoteStoreRefreshListenerTests.setup(RemoteStoreRefreshListenerTests.java:81)
            at org.opensearch.index.shard.RemoteStoreRefreshListenerTests.testAfterCommit(RemoteStoreRefreshListenerTests.java:196)
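
A note on reading the trace: the outer exception is thrown where the directory is closed (here `tearDown`), while the `Caused by` exception has `addFileHandle` at the top of its stack, i.e. it was created when `_0.cfs` was opened (here via `setup` calling `IndexShard.refresh`) and only attached at close time. The sketch below imitates that bookkeeping in plain Java to make the two stacks easier to interpret. It is a simplified illustration under my own assumptions, not Lucene's `MockDirectoryWrapper` code; `LeakTrackingSketch`, `TrackingDirectory`, and `Handle` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative only: a tiny stand-in for a leak-checking directory wrapper.
 * None of these names are Lucene or OpenSearch APIs.
 */
public class LeakTrackingSketch {

    /** Hypothetical file handle; closing it deregisters it from its directory. */
    static final class Handle implements AutoCloseable {
        private final TrackingDirectory owner;
        final String name;

        Handle(TrackingDirectory owner, String name) {
            this.owner = owner;
            this.name = name;
        }

        @Override
        public void close() {
            owner.release(this);
        }
    }

    /** Hypothetical directory that remembers where each handle was opened. */
    static final class TrackingDirectory implements AutoCloseable {
        // handle -> exception created at open time; it carries the opener's stack trace
        private final Map<Handle, RuntimeException> openHandles = new LinkedHashMap<>();

        Handle openInput(String name) {
            Handle handle = new Handle(this, name);
            // Capture the call site now, in case the handle is never closed.
            openHandles.put(handle, new RuntimeException("unclosed IndexInput: " + name));
            return handle;
        }

        void release(Handle handle) {
            openHandles.remove(handle);
        }

        int openCount() {
            return openHandles.size();
        }

        @Override
        public void close() {
            if (!openHandles.isEmpty()) {
                // Report the first leaked handle's open-time exception as the cause.
                RuntimeException openedAt = openHandles.values().iterator().next();
                throw new RuntimeException(
                    "cannot close: there are still " + openHandles.size() + " open files",
                    openedAt
                );
            }
        }
    }

    public static void main(String[] args) {
        TrackingDirectory dir = new TrackingDirectory();
        dir.openInput("_0.cfs"); // opened but never closed, as in the failing run
        try {
            dir.close();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
            System.out.println("Caused by: " + e.getCause().getMessage());
        }
    }
}
```

Read this way, the trace says that something opened by the refresh in `setup` was still holding `_0.cfs` when `tearDown` closed the store. It does not by itself say which component failed to release it.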

@github-project-automation github-project-automation bot moved this from In Progress to Done in Segment Replication Feb 9, 2024