CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

ywelsch · 2015-12-08T13:18:44Z

http://build-us-00.elastic.co/job/es_core_master_centos/8910/testReport/junit/org.elasticsearch.action.admin.indices.create/CreateIndexIT/testCreateAndDeleteIndexConcurrently/

Stacktrace:

UnavailableShardsException[[test][1] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: index {[test][test][AVGADS6IhWB4gLTwZw1z], source[{"index_version":1}]}]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.retryBecauseUnavailable(TransportReplicationAction.java:665)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:381)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$3.onTimeout(TransportReplicationAction.java:522)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:628)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Interesting log lines:

[2015-12-08 05:25:16,274][WARN ][org.elasticsearch.indices.cluster] [node_t0] [[test][1]] marking and sending shard failed due to [failed to create shard]
org.apache.lucene.store.LockObtainFailedException: Can't lock shard [test][1], timed out after 5000ms
    at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:556)
    at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:485)
    at org.elasticsearch.index.IndexService.createShard(IndexService.java:236)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:591)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:491)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:185)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:526)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:596)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

...

[2015-12-08 05:25:26,295][DEBUG][org.elasticsearch.cluster.routing.allocation] [node_t0] [test][0] ignoring shard failure, unknown allocation id in [test][0], node[w7ZOpTaqRi6UI-q9EjfG2Q], [P], v[1], s[INITIALIZING], a[id=WmeldM5ORfuNr8rWhbHvYQ], unassigned_info[[reason=INDEX_CREATED], at[2015-12-08T05:25:11.268Z]] ([reason=ALLOCATION_FAILED], at[2015-12-08T05:25:26.295Z], details[master {node_t0}{w7ZOpTaqRi6UI-q9EjfG2Q}{local}{local[182]}[mode=>local] marked shard as initializing, but shard is marked as failed, resend shard failure])

...

[2015-12-08 05:25:26,303][DEBUG][org.elasticsearch.gateway] [node_t0] [test][0] found 0 allocations of [test][0], node[null], [P], v[2], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-12-08T05:25:26.294Z], details[failed to create shard, failure LockObtainFailedException[Can't lock shard [test][0], timed out after 5000ms]]], highest version: [-1]

Analysis:

What's happening is that index creation fails as locks cannot be obtained for allocating shards (I'm not sure why that is). Shards are marked as failed: This changes UnassignedInfo of the ShardRouting objects by setting reason from INDEX_CREATED to ALLOCATION_FAILED. What this entails is that the PrimaryShardAllocator does not treat these unassigned shards as "new index creation" anymore, but requires that shard data is available before assigning them. Relates to #15241.

The text was updated successfully, but these errors were encountered:

bleskes · 2015-12-08T14:02:12Z

What's happening is that index creation fails as locks cannot be obtained for allocating shards (I'm not sure why that is).

That's a good question and probably the source of this trouble

What this entails is that the PrimaryShardAllocator does not treat these unassigned shards as "new index creation" anymore, but requires that shard data is available before assigning them.

I think this will be fixed by #15281 or a follow up where we wouldn't need the allocatedPostIndexCreate() flag for primary allocations as the activeAllocaitonId will tell us.

bleskes · 2015-12-08T14:02:22Z

and good analysis :)

ywelsch · 2015-12-16T09:52:23Z

and once more:

http://build-us-00.elastic.co/job/es_core_master_windows-2012-r2/2196/testReport/junit/org.elasticsearch.action.admin.indices.create/CreateIndexIT/testCreateAndDeleteIndexConcurrently/

The test fails in different ways since we can hit timeouts and shards not being available on retry which we simply didn't expect before but are valid. Closes elastic#15312 Closes elastic#14512

bleskes · 2016-01-08T13:19:30Z

Closing as a duplicate of #14932 , w.r.t shard locking issue. The primary allocation part has been fixed with #15281

ywelsch added >test Issues or PRs that are addressing/adding tests >test-failure Triaged test failures from CI labels Dec 8, 2015

s1monw mentioned this issue Jan 8, 2016

Fix CreateIndexIT#testCreateAndDeleteIndexConcurrently #15853

Closed

bleskes closed this as completed Jan 8, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

ywelsch commented Dec 8, 2015

bleskes commented Dec 8, 2015

bleskes commented Dec 8, 2015

ywelsch commented Dec 16, 2015

bleskes commented Jan 8, 2016

CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

CreateIndexIT.testCreateAndDeleteIndexConcurrently fails #15312

Comments

ywelsch commented Dec 8, 2015

bleskes commented Dec 8, 2015

bleskes commented Dec 8, 2015

ywelsch commented Dec 16, 2015

bleskes commented Jan 8, 2016