[CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' #84869

stu-elastic · 2022-03-10T15:19:33Z

CI Link

https://gradle-enterprise.elastic.co/s/j4ilhrcoxxn64

Repro line

./gradlew ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest'

Does it reproduce?

Yes

Applicable branches

master

Failure history

https://gradle-enterprise.elastic.co/scans/failures?failures.failureClassification=all_failures&failures.failureMessage=Execution%20failed%20for%20task%20*%0A%3E%20process%20was%20found%20dead%20while%20waiting%20for%20cluster%20health%20yellow%2C%20*&search.relativeStartTime=P7D&search.timeZoneId=America/Chicago

Failure excerpt


» [2022-03-10T15:01:58,728][WARN ][o.e.t.RemoteClusterConnection] [v7.5.0-0] fetching nodes from external cluster [foo] failed
»  org.elasticsearch.transport.ConnectTransportException: [][127.0.0.1:9200] connect_exception
»  	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:989) ~[elasticsearch-7.5.0.jar:7.5.0]
»  	at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:162) ~[elasticsearch-7.5.0.jar:7.5.0]
...
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
»  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
»  	at java.lang.Thread.run(Thread.java:833) [?:?]
»  Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:45187
»  Caused by: java.net.ConnectException: Connection refused
»  	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
»  	at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
»  	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
»  	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
»  	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
»  	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
»  	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
»  	... 1 more


stu-elastic · 2022-03-10T16:49:28Z

Another: https://gradle-enterprise.elastic.co/s/ib7biqvjfqney

mark-vieira · 2022-03-10T17:22:56Z

The actual problem here looks to be an assertion error causing the node to exit.

» [2022-03-10T15:04:24,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-0] fatal error in thread [elasticsearch[v7.5.0-0][clusterApplierService#updateTask][T#1]], exiting
»  java.lang.AssertionError: null
»  	at org.elasticsearch.index.IndexService.updateMetadata(IndexService.java:787) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:535) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:228) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
»  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
»  	at java.lang.Thread.run(Thread.java:833) [?:?]

elasticmachine · 2022-03-10T17:38:11Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

droberts195 · 2022-03-10T17:41:19Z

The reason I pinged core/infra is that, based on a quick look I had in #84844 (comment), I think this could be a follow-on problem after #84780 was merged. Apologies if I'm wrong and it's a more generic index metadata problem.

stu-elastic · 2022-03-10T17:55:29Z

[2022-03-10T15:04:24,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-0] fatal error in thread [elasticsearch[v7.5.0-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
        at org.elasticsearch.index.IndexService.updateMetadata(IndexService.java:787) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:535) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:228) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]

IndexService.java

        if (Assertions.ENABLED && currentIndexMetadata != null) {
            final long currentSettingsVersion = currentIndexMetadata.getSettingsVersion();
            final long newSettingsVersion = newIndexMetadata.getSettingsVersion();
            if (currentSettingsVersion == newSettingsVersion) {
                assert updateIndexSettings == false;
            } else {

stu-elastic · 2022-03-10T17:55:48Z

Looks like updateIndexSettings is true even though the current and new settings versions are equal, so the assertion fails.
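To make the failing check concrete, here is a minimal, self-contained sketch of the invariant the assertion appears to enforce: if the settings version is unchanged, the settings themselves must be unchanged. `IndexMetaSketch` and `SettingsVersionInvariant` are hypothetical stand-ins, not Elasticsearch classes, and the sketch assumes `updateIndexSettings` is true exactly when the settings differ. The branch for differing versions is not modeled, because the excerpt above cuts off there.

```java
import java.util.Map;

// Hypothetical stand-in for index metadata: only the settings and their version.
final class IndexMetaSketch {
    final long settingsVersion;
    final Map<String, String> settings;

    IndexMetaSketch(long settingsVersion, Map<String, String> settings) {
        this.settingsVersion = settingsVersion;
        this.settings = Map.copyOf(settings);
    }
}

final class SettingsVersionInvariant {
    // Returns true when the pair respects the invariant:
    // an unchanged settings version implies unchanged settings.
    static boolean holds(IndexMetaSketch current, IndexMetaSketch updated) {
        // Assumption: this mirrors what updateIndexSettings reports.
        boolean settingsChanged = current.settings.equals(updated.settings) == false;
        if (current.settingsVersion == updated.settingsVersion) {
            return settingsChanged == false;
        }
        // Differing versions are not checked in this sketch.
        return true;
    }

    public static void main(String[] args) {
        IndexMetaSketch current =
            new IndexMetaSketch(4, Map.of("index.provided_name", ".watches"));
        // Same version, but one extra setting: this is the shape that violates the invariant.
        IndexMetaSketch changedSameVersion =
            new IndexMetaSketch(4, Map.of("index.provided_name", ".watches", "index.hidden", "true"));
        System.out.println("invariant holds: " + holds(current, changedSameVersion));
    }
}
```

Any code path that changes index settings without bumping the settings version would trip a check of this shape.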

stu-elastic · 2022-03-10T18:02:40Z

I'm going to move this over to distributed based on the settings versions not matching.

elasticmachine · 2022-03-10T18:03:17Z

Pinging @elastic/es-distributed (Team:Distributed)

idegtiarenko · 2022-03-15T12:30:34Z

I have added a message with some details to the failing assertion:

java.lang.AssertionError: 

index=[.watches/kKfyFRCURt69dIoKl9S0Lw]
currentSettingsVersion=4
newSettingsVersion=4
updateIndexSettings=true
currentIndexMetadataSettings=index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,                  index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,
newIndexMetadataSettings    =index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,index.hidden=true,index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,

The new settings contain index.hidden=true, which was not there previously, yet the settings version is unchanged (4 in both cases).

idegtiarenko · 2022-03-15T14:31:48Z

My current assumption is that this setting is added by

elasticsearch/server/src/main/java/org/elasticsearch/cluster/metadata/SystemIndexMetadataUpgradeService.java

Lines 105 to 108 in 87bd2eb

    
        if (isSystem && isHidden == false) {
            builder.settings(Settings.builder().put(indexMetadata.getSettings()).put(IndexMetadata.SETTING_INDEX_HIDDEN, true));
            updated = true;
        }

However, the settings version is not incremented (I am not sure why), and this results in the assertion failure.
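As a rough illustration of the direction a fix could take, the sketch below adds the hidden setting and increments the settings version in the same step, so a later version comparison sees the change. `SystemIndexMeta` and `HiddenSettingUpgradeSketch` are hypothetical types invented for this example, and the literal key `index.hidden` stands in for `IndexMetadata.SETTING_INDEX_HIDDEN`; this is not the code from SystemIndexMetadataUpgradeService.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified view of a system index's metadata.
final class SystemIndexMeta {
    final boolean isSystem;
    final long settingsVersion;
    final Map<String, String> settings;

    SystemIndexMeta(boolean isSystem, long settingsVersion, Map<String, String> settings) {
        this.isSystem = isSystem;
        this.settingsVersion = settingsVersion;
        this.settings = Map.copyOf(settings);
    }
}

final class HiddenSettingUpgradeSketch {
    // Marks a system index as hidden and bumps the settings version together.
    // Returns the same instance when nothing needs to change.
    static SystemIndexMeta upgrade(SystemIndexMeta meta) {
        boolean isHidden = Boolean.parseBoolean(meta.settings.getOrDefault("index.hidden", "false"));
        if (meta.isSystem && isHidden == false) {
            Map<String, String> updated = new HashMap<>(meta.settings);
            updated.put("index.hidden", "true");
            // The settings changed, so the version moves with them.
            return new SystemIndexMeta(true, meta.settingsVersion + 1, updated);
        }
        return meta;
    }

    public static void main(String[] args) {
        SystemIndexMeta before =
            new SystemIndexMeta(true, 4, Map.of("index.provided_name", ".watches"));
        SystemIndexMeta after = upgrade(before);
        System.out.println("version " + before.settingsVersion + " -> " + after.settingsVersion);
        System.out.println("index.hidden=" + after.settings.get("index.hidden"));
    }
}
```

Keeping the version bump next to the settings change means the two cannot drift apart, which is the property the assertion in IndexService checks for.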

stu-elastic added >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label labels Mar 10, 2022

stu-elastic changed the title ~~CI Failure: ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest'~~ [CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' MlHiddenIndicesFullClusterRestartIT Mar 10, 2022

stu-elastic changed the title ~~[CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' MlHiddenIndicesFullClusterRestartIT~~ [CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' Mar 10, 2022

droberts195 mentioned this issue Mar 10, 2022

[CI] :x-pack:qa:full-cluster-restart:v7.6.0#upgradedClusterTest MlHiddenIndicesFullClusterRestartIT #84870

Closed

droberts195 mentioned this issue Mar 10, 2022

ML fails full cluster restart from very old clusters on Arch Linux #84844

Closed

droberts195 added the :Core/Infra/Core Core issues without another label label Mar 10, 2022

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Mar 10, 2022

stu-elastic added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Mar 10, 2022

elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Mar 10, 2022

nik9000 mentioned this issue Mar 14, 2022

REST tests for variable width historam #84836

Merged

idegtiarenko self-assigned this Mar 15, 2022

idegtiarenko mentioned this issue Mar 15, 2022

Increment version on system index settings change #84994

Merged

idegtiarenko closed this as completed in #84994 Mar 16, 2022
