Perform successful Elasticsearch version check before migrations #51311
Conversation
Pinging @elastic/kibana-platform (Team:Platform)

💔 Build Failed
I saw too late that this was only a draft! I've kept only my nits on the current progress.
@rudolf What's the status here?
Force-pushed from 5c91f29 to 07829f8
Force-pushed from 07829f8 to bd49618
Force-pushed from bd49618 to e2d6157
```ts
this.logger.debug(
  'Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...'
);
await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
```
This behaviour isn't 100% the same as it was in legacy. In legacy we would start the status service, so even though migrations wouldn't run, there would be a running server showing that the Elasticsearch plugin was red, along with the reason, which helps surface the underlying problem. Once we have a status service in NP we should aim to recreate similar behaviour.
#49785 states:

> Although the reasons for abandoning the health check still stand, we will have to keep polling to do the version check since new Elasticsearch nodes can join an existing cluster after Kibana has started up.

In the current PR we only wait once for ES to be ready before triggering some actions; further state changes do nothing. Do we know what we are planning to do for a scenario like the following (see the sketch below)?

- red -> green (triggers start of SO + legacy's `waitUntilReady`) -> red (atm does nothing) -> green (atm does nothing)
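A minimal, self-contained sketch of why transitions after the first green are ignored: the `filter` + `take(1)` chain this PR uses completes on the first compatible emission, so the chain never observes the later red -> green cycle. The simplified `NodesCompatibility` shape is an assumption for illustration (RxJS 6 style, matching the PR):

```ts
import { Subject } from 'rxjs';
import { filter, take } from 'rxjs/operators';

// Simplified stand-in for the real nodes-compatibility payload.
interface NodesCompatibility {
  isCompatible: boolean;
}

const esNodesCompatibility$ = new Subject<NodesCompatibility>();

// Resolves on the FIRST compatible emission, then the chain completes;
// later red -> green transitions are never observed through this promise.
const ready = esNodesCompatibility$
  .pipe(
    filter(nodes => nodes.isCompatible),
    take(1)
  )
  .toPromise();

ready.then(() => console.log('start saved objects migrations'));

esNodesCompatibility$.next({ isCompatible: false }); // red: filtered out
esNodesCompatibility$.next({ isCompatible: true }); // green: resolves, chain completes
esNodesCompatibility$.next({ isCompatible: false }); // red again: nothing happens
esNodesCompatibility$.next({ isCompatible: true }); // green again: nothing happens
```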
```ts
esNodesCompatibility$.subscribe(({ isCompatible, message, kibanaVersion, warningNodes }) => {
  if (!isCompatible) {
    esPlugin.status.red(message);
  } else {
    if (message && message.length > 0) {
      logWithMetadata(['warning'], message, {
        kibanaVersion,
        nodes: warningNodes,
      });
    }
    esPlugin.status.green('Ready');
    resolve();
  }
```
Should we unsubscribe after the first `resolve` call to avoid wrongly recalling `resolve` in case of green -> red -> green?
We want to keep updating the status, so we need the subscription. Although `resolve()` should only be called once, calling it multiple times is a no-op so it won't cause any problems.
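For reference, a tiny sketch of the promise semantics this relies on: a promise can only settle once, so extra `resolve` calls are silently ignored.

```ts
const ready = new Promise<void>(resolve => {
  resolve(); // settles the promise
  resolve(); // no-op: a promise can only settle once
});

ready.then(() => console.log('runs exactly once'));
```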
```ts
await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
  filter(nodes => nodes.isCompatible),
  take(1)
).toPromise();
```
- Unlike in the legacy code you adapted, we are not displaying any info message for the user about the fact that we are waiting (maybe indefinitely) for ES to be ready?
- Maybe we should add a timeout and throw a fatal after some time? Or are we expecting Kibana to hang indefinitely waiting for this condition?
- Should this check be done at a higher level (thinking of the `Server`)? It seems to me that waiting for ES to be ready is a higher responsibility than the SOService should handle.
> Unlike in the legacy code you adapted, we are not displaying any info message for the user about the fact that we are waiting (maybe indefinitely) for ES to be ready?

I've changed the log message to an info level to indicate that we're waiting for ES and when we're starting migrations.

> Maybe we should add a timeout and throw a fatal after some time? Or are we expecting Kibana to hang indefinitely waiting for this condition?

The existing behaviour is to wait indefinitely. It could take a day before a faulty cluster is fixed, and in such a case I think it's nice if Kibana just starts working again automatically.

> Should this check be done at a higher level (thinking of the `Server`)? It seems to me that waiting for ES to be ready is a higher responsibility than the SOService should handle.

I don't have a strong opinion, but I think if the SO Service has a dependency on an external condition then the logic to wait for that condition belongs in the SO Service. This is minor, but when it comes to the logging tags it might make it easier to see that these logs are related if they all have the same tags, rather than some being tagged `server` and others `savedobjects-service`.
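If a timeout were ever wanted (this thread decides against one), it could be layered onto the same chain with RxJS's `timeout` operator. This is a sketch only, assumed to run inside the same async method as the quoted code; `ES_COMPATIBILITY_TIMEOUT_MS` is an invented constant:

```ts
import { filter, take, timeout } from 'rxjs/operators';

// Invented for illustration; the PR intentionally waits indefinitely.
const ES_COMPATIBILITY_TIMEOUT_MS = 60_000;

await this.setupDeps!.elasticsearch.esNodesCompatibility$.pipe(
  filter(nodes => nodes.isCompatible),
  take(1),
  // Errors (rejecting the awaited promise) if no compatible emission
  // arrives within the window, so startup can fail fast instead of hanging.
  timeout(ES_COMPATIBILITY_TIMEOUT_MS)
).toPromise();
```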
I would say it might be good to repeat this message on an interval but I wouldn't consider that a blocker to this PR.
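One way the repeated message could look, assuming `esNodesCompatibility$` is the hot (shared) observable from the PR and a `logger` is in scope; the 30-second period is arbitrary:

```ts
import { interval } from 'rxjs';
import { filter, take, takeUntil } from 'rxjs/operators';

const compatible$ = esNodesCompatibility$.pipe(
  filter(nodes => nodes.isCompatible),
  take(1)
);

// Re-log the waiting message every 30s until the first compatible emission.
interval(30_000)
  .pipe(takeUntil(compatible$))
  .subscribe(() => {
    logger.info('Still waiting until all Elasticsearch nodes are compatible with Kibana...');
  });

await compatible$.toPromise();
```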
> I don't have a strong opinion, but I think if the SO Service has a dependency on an external condition then the logic to wait for that condition belongs in the SO Service. This is minor, but when it comes to the logging tags it might make it easier to see that these logs are related if they all have the same tags, rather than some being tagged `server` and others `savedobjects-service`.

I think it's fine to keep this in the SO service until/if there are other Core services that require this as well.
After further testing I realised there were two incorrect behaviours:
💚 Build Succeeded

To update your PR or re-run it, just comment with:
LGTM
```diff
@@ -29,7 +29,7 @@ let mockDefaultRouteSetting: any = '';
 describe('default route provider', () => {
   let root: Root;
   beforeAll(async () => {
-    root = kbnTestServer.createRoot();
+    root = kbnTestServer.createRoot({ migrations: { skip: true } });
```
NIT: It seems you adapted every call to `createRoot` to add this. Should we set `migrations: { skip: true }` as a default in `kbnTestServer.createRoot`?
Some tests run with esArchiver and then need to apply migrations. Ideally we shouldn't disable migrations; instead we should disable the ES version check itself. There is `elasticsearch.ignoreVersionMismatch`, but it's only available in development and our integration tests run against production. We could just make this option available in production, but I think it warrants a bigger discussion, so I created #56505
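For illustration only, the default suggested above could be a thin wrapper. `createRootWithDefaults` is a hypothetical helper, not the actual `kbnTestServer` API; it assumes the caller's `migrations` object simply replaces the default via a shallow spread:

```ts
// Hypothetical helper: skip migrations by default, let tests opt back in.
function createRootWithDefaults(settings: Record<string, unknown> = {}) {
  return kbnTestServer.createRoot({ migrations: { skip: true }, ...settings });
}

// Tests that load data with esArchiver can re-enable migrations explicitly:
const root = createRootWithDefaults({ migrations: { skip: false } });
```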
…stic#51311)
* Convert parts of Elasticsearch version check to ts
* Move ES version check to NP
* Improve types
* Wait for compatible ES nodes before SO migrations
* Don't wait for ES compatibility if skipMigrations=true
* Legacy Elasticsearch plugin integration test
* Make ES compatibility check and migrations logging more visible
* Test for isCompatible=false when ES version check throws
* Start pollEsNodesVersion immediately
* Refactor pollEsNodesVersion
@rudolf I have backported to 7.x, but 7.6 is a bit more challenging as the tests rely on
(#56600) Backport with the same commits as above.
(#56629) Backport with the same commits as above.
Summary
Fixes #49785, #14480
Testing notes:
When starting Kibana against a cluster with an unsupported ES node, Kibana should log:
but should not start saved object migrations until all the nodes are compatible.
If Kibana has successfully started with a compatible ES cluster, but then an incompatible ES node joins the cluster, Kibana's "elasticsearch" plugin should go into a red state and reloading Kibana in a browser should render the status page.
Release note:
This fix addresses a regression where Kibana would not check that all Elasticsearch nodes are compatible before starting Saved Object migrations.
Checklist
Use strikethroughs to remove checklist items you don't feel are applicable to this PR.
For maintainers