
[CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' #84869

Closed
stu-elastic opened this issue Mar 10, 2022 · 10 comments · Fixed by #84994
Labels
:Distributed Indexing/Distributed - A catch all label for anything in the Distributed Area. Please avoid if you can.
needs:triage - Requires assignment of a team area label
Team:Distributed (Obsolete) - Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>test-failure - Triaged test failures from CI

Comments

@stu-elastic
Contributor

stu-elastic commented Mar 10, 2022

CI Link

https://gradle-enterprise.elastic.co/s/j4ilhrcoxxn64

Repro line

./gradlew ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest'

Does it reproduce?

Yes

Applicable branches

master

Failure history

https://gradle-enterprise.elastic.co/scans/failures?failures.failureClassification=all_failures&failures.failureMessage=Execution%20failed%20for%20task%20*%0A%3E%20process%20was%20found%20dead%20while%20waiting%20for%20cluster%20health%20yellow%2C%20*&search.relativeStartTime=P7D&search.timeZoneId=America/Chicago

Failure excerpt


» [2022-03-10T15:01:58,728][WARN ][o.e.t.RemoteClusterConnection] [v7.5.0-0] fetching nodes from external cluster [foo] failed
»  org.elasticsearch.transport.ConnectTransportException: [][127.0.0.1:9200] connect_exception
»  	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:989) ~[elasticsearch-7.5.0.jar:7.5.0]
»  	at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$3(ActionListener.java:162) ~[elasticsearch-7.5.0.jar:7.5.0]
...
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
»  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
»  	at java.lang.Thread.run(Thread.java:833) [?:?]
»  Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 127.0.0.1/127.0.0.1:45187
»  Caused by: java.net.ConnectException: Connection refused
»  	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
»  	at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
»  	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
»  	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
»  	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) ~[?:?]
»  	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
»  	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
»  	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
»  	... 1 more
@stu-elastic stu-elastic added >test-failure Triaged test failures from CI needs:triage Requires assignment of a team area label labels Mar 10, 2022
@stu-elastic stu-elastic changed the title CI Failure: ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' [CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' MlHiddenIndicesFullClusterRestartIT Mar 10, 2022
@stu-elastic stu-elastic changed the title [CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' MlHiddenIndicesFullClusterRestartIT [CI] ':x-pack:qa:full-cluster-restart:v7.5.0#upgradedClusterTest' Mar 10, 2022

@mark-vieira
Contributor

The actual problem here looks to be an assertion error causing the node to exit.

» [2022-03-10T15:04:24,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-0] fatal error in thread [elasticsearch[v7.5.0-0][clusterApplierService#updateTask][T#1]], exiting
»  java.lang.AssertionError: null
»  	at org.elasticsearch.index.IndexService.updateMetadata(IndexService.java:787) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.indices.cluster.IndicesClusterStateService.updateIndices(IndicesClusterStateService.java:535) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:228) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:428) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:154) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:717) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]
»  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
»  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
»  	at java.lang.Thread.run(Thread.java:833) [?:?]

@droberts195 droberts195 added the :Core/Infra/Core Core issues without another label label Mar 10, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Mar 10, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@droberts195
Contributor

The reason I pinged core/infra is that, based on a quick look I had in #84844 (comment), I think this could be a follow-on problem after #84780 was merged. Apologies if I'm wrong and it's a more generic index metadata problem.

@stu-elastic
Contributor Author

[2022-03-10T15:04:24,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v7.5.0-0] fatal error in thread [elasticsearch[v7.5.0-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
        at org.elasticsearch.index.IndexService.updateMetadata(IndexService.java:787) ~[elasticsearch-8.1.1-SNAPSHOT.jar:8.1.1-SNAPSHOT]

IndexService.java

        if (Assertions.ENABLED && currentIndexMetadata != null) {
            final long currentSettingsVersion = currentIndexMetadata.getSettingsVersion();
            final long newSettingsVersion = newIndexMetadata.getSettingsVersion();
            if (currentSettingsVersion == newSettingsVersion) {
                assert updateIndexSettings == false;
            } else {
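To make the failing check concrete, here is a minimal Java sketch of the invariant the excerpt enforces: if the settings version is unchanged, the settings must be unchanged too. `Meta`, `respectsInvariant`, and treating `updateIndexSettings` as "the settings map changed" are simplifying assumptions, not Elasticsearch code. The excerpt is also truncated at its `else` branch, so the sketch's handling of a changed version is a hypothetical placeholder.

```java
import java.util.Map;

public class SettingsVersionInvariant {

    // Hypothetical stand-in for IndexMetadata, reduced to the two fields the check looks at.
    record Meta(long settingsVersion, Map<String, String> settings) {}

    // Approximates updateIndexSettings as "the settings map changed" (an assumption).
    static boolean respectsInvariant(Meta current, Meta next) {
        boolean settingsChanged = !current.settings().equals(next.settings());
        if (current.settingsVersion() == next.settingsVersion()) {
            // Mirrors `assert updateIndexSettings == false` from the excerpt.
            return !settingsChanged;
        }
        // The excerpt stops at the else branch; requiring the version to move forward
        // here is a hypothetical placeholder, not the real check.
        return current.settingsVersion() < next.settingsVersion();
    }

    public static void main(String[] args) {
        // Modeled on the .watches failure: settings version 4 before and after,
        // but index.hidden=true appears in the new settings.
        Meta current = new Meta(4, Map.of("index.priority", "800"));
        Meta next = new Meta(4, Map.of("index.priority", "800", "index.hidden", "true"));
        System.out.println(respectsInvariant(current, next)); // false
    }
}
```

The sketch returns false for exactly the situation in this failure: the settings changed while the version stayed put.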

@stu-elastic
Contributor Author

Looks like updateIndexSettings is true even though the settings versions are equal.

@stu-elastic
Contributor Author

I'm going to move this over to Distributed, since the settings changed without a matching change in the settings version.

@stu-elastic stu-elastic added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Mar 10, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Mar 10, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@idegtiarenko
Contributor

I have added a message with some details to the failing assertion:

java.lang.AssertionError: 

index=[.watches/kKfyFRCURt69dIoKl9S0Lw]
currentSettingsVersion=4
newSettingsVersion=4
updateIndexSettings=true
currentIndexMetadataSettings=index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,                  index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,
newIndexMetadataSettings    =index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,index.hidden=true,index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,

The new settings contain index.hidden=true, which was not there previously.
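Spotting the one differing key by eye in the two flattened settings dumps is error-prone. The following Java sketch is an illustrative helper (not part of Elasticsearch). It assumes the `key=value,` format printed by the assertion message, parses both dumps, and reports which keys were added, removed, or changed. The input strings are the two dumps from the failure above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SettingsDiff {

    // Parses "k1=v1,k2=v2," into a sorted map; blank segments, surrounding spaces,
    // and segments without '=' are ignored.
    static Map<String, String> parse(String flat) {
        Map<String, String> out = new TreeMap<>();
        for (String part : flat.split(",")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue;
            }
            out.put(p.substring(0, eq), p.substring(eq + 1));
        }
        return out;
    }

    // Maps every key that was added, removed, or changed to "old -> new" (null = absent).
    static Map<String, String> diff(Map<String, String> before, Map<String, String> after) {
        Set<String> keys = new TreeSet<>(before.keySet());
        keys.addAll(after.keySet());
        Map<String, String> changes = new TreeMap<>();
        for (String key : keys) {
            String oldValue = before.get(key);
            String newValue = after.get(key);
            if (!Objects.equals(oldValue, newValue)) {
                changes.put(key, oldValue + " -> " + newValue);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        String current = "index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,"
            + "index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,"
            + "index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,";
        String updated = "index.auto_expand_replicas=0-1,index.creation_date=1647346793805,index.format=6,index.hidden=true,"
            + "index.number_of_replicas=1,index.number_of_shards=1,index.priority=800,"
            + "index.provided_name=.watches,index.uuid=kKfyFRCURt69dIoKl9S0Lw,index.version.created=7050299,";
        System.out.println(diff(parse(current), parse(updated))); // {index.hidden=null -> true}
    }
}
```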

@idegtiarenko
Contributor

idegtiarenko commented Mar 15, 2022

My current assumption is that this setting is added by

        if (isSystem && isHidden == false) {
            builder.settings(Settings.builder().put(indexMetadata.getSettings()).put(IndexMetadata.SETTING_INDEX_HIDDEN, true));
            updated = true;
        }

However, the settings version is not incremented (I am not sure why), and this results in the assertion failure.
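The thread does not show how this was eventually fixed. The following Java sketch only illustrates the direction the analysis above points to, as an assumption: whenever the upgrade step adds index.hidden=true to a system index, it also bumps the settings version in the same step. `Meta`, `markHiddenIfSystem`, and the string-map settings are hypothetical simplifications, not the Elasticsearch `IndexMetadata.Builder` API.

```java
import java.util.HashMap;
import java.util.Map;

public class HideSystemIndex {

    // Hypothetical, simplified index metadata: a settings version plus flat string settings.
    record Meta(long settingsVersion, Map<String, String> settings) {}

    static final String SETTING_INDEX_HIDDEN = "index.hidden";

    // Marks a system index as hidden; if the settings change, the settings version moves
    // forward too, so "same version implies same settings" keeps holding.
    static Meta markHiddenIfSystem(Meta meta, boolean isSystem) {
        boolean isHidden = Boolean.parseBoolean(meta.settings().getOrDefault(SETTING_INDEX_HIDDEN, "false"));
        if (isSystem && isHidden == false) {
            Map<String, String> newSettings = new HashMap<>(meta.settings());
            newSettings.put(SETTING_INDEX_HIDDEN, "true");
            return new Meta(meta.settingsVersion() + 1, Map.copyOf(newSettings));
        }
        return meta; // nothing changed: keep the same instance and version
    }

    public static void main(String[] args) {
        Meta watches = new Meta(4, Map.of("index.priority", "800"));
        Meta upgraded = markHiddenIfSystem(watches, true);
        System.out.println(upgraded.settingsVersion() + " " + upgraded.settings().get(SETTING_INDEX_HIDDEN)); // 5 true
    }
}
```

Returning the untouched instance when nothing changes keeps the version stable for no-op updates, which is the other half of the same invariant.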
