-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing the concept of ES capabilities #164850
Introducing the concept of ES capabilities #164850
Conversation
dd441b1
to
ee1f3b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-review
capabilities = getElasticsearchCapabilities({ | ||
clusterInfo: await firstValueFrom(this.clusterInfo$!), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-using the clusterInfo$
observable to retrieve the info required to build the capabilities, which avoids to perform an additional call to ES.
} | ||
|
||
return { | ||
client: this.client!, | ||
createClient: (type, clientConfig) => this.createClusterClient(type, config, clientConfig), | ||
getCapabilities: () => capabilities, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, technically, capabilities are bound to a given cluster, so a given client. However attaching the API to ClusterClient
was way more complex, and made the caching way harder too.
I doubt we're connecting Kibana to multiple clusters, especially mixed serverless/non-serverless ones anytime soon, so I just KISS.
* | ||
* @internal | ||
*/ | ||
export const getCapabilitiesFromClient = async ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is exposed only for integration tests (at least for now)
@@ -28,6 +29,7 @@ export function getClusterInfo$(internalClient: ElasticsearchClient): Observable | |||
cluster_name: info.cluster_name, | |||
cluster_uuid: info.cluster_uuid, | |||
cluster_version: info.version.number, | |||
cluster_build_flavor: info.version.build_flavor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said previously, I reused the client.info
call from this method to build the capabilities, which surfaced the cluster_build_flavor
in a few files / snapshots. Overall, I don't think this is a problem so I didn't bother filtering / excluding it in some places.
@@ -27,6 +27,7 @@ export function registerAnalyticsContextProvider( | |||
cluster_name: { type: 'keyword', _meta: { description: 'The Cluster Name' } }, | |||
cluster_uuid: { type: 'keyword', _meta: { description: 'The Cluster UUID' } }, | |||
cluster_version: { type: 'keyword', _meta: { description: 'The Cluster version' } }, | |||
cluster_build_flavor: { type: 'keyword', _meta: { description: 'The Cluster build flavor' } }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(c.f previous comment)
|
||
const indexSettings = { | ||
// settings not being supported on serverless ES | ||
...(esCapabilities.serverless | ||
? {} | ||
: { | ||
// ES rule of thumb: shards should be several GB to 10's of GB, so | ||
// Kibana is unlikely to cross that limit. | ||
number_of_shards: INDEX_NUMBER_OF_SHARDS, | ||
auto_expand_replicas: INDEX_AUTO_EXPAND_REPLICAS, | ||
// Set an explicit refresh interval so that we don't inherit the | ||
// value from incorrectly configured index templates (not required | ||
// after we adopt system indices) | ||
refresh_interval: '1s', | ||
// Bump priority so that recovery happens before newer indices | ||
priority: 10, | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted the createIndex
and cloneIndex
actions to use these new capabilites.
Note that this specific work is not necessarily tested - it will be done in a follow up - the intent of the PR was to setup the feature and wire it down to the 2 migration algorithm so that we can effectively use it.
// runServerless doesn't wait until the nodes are up | ||
await waitUntilClusterReady(getServerlessESClient()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained, es.runServerless
returns as soon as the docker containers as running, which causes tests to potentially fail if they need to access ES as soon as startES
has been called.
This naive awaiting solves it.
return { | ||
getClient: getServerlessESClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, the serverless ES test harness did not expose a client as the traditional one was doing. I'm just creating a default client to localhost:9200 for now, given that apparently the ports aren't configurable when calling es.runServerless
yet. We will eventually need to adapt this if the ports become options later.
it('returns the correct capabilities', async () => { | ||
const capabilities = await getCapabilitiesFromClient(client); | ||
expect(capabilities).toEqual({ | ||
serverless: false, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing capabilities for traditional ES
// skipped because test serverless ES nodes are currently using static ports | ||
// causing parallel jest runners to fail for obvious port conflicts reasons. | ||
describe.skip('ES capabilities for serverless ES', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment. The test passes locally, but I didn't want to introduce more flakiness right now, so I'm commenting it until the serverless test utils become more mature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I am working on adding similar tests for all migration actions here: #164842
Still good to have the capabilities covered more specifically.
Pinging @elastic/kibana-core (Team:Core) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I guess the capabilities could eventually be used as a type guard to narrow the ES client types to only serverless so that we have more type safety against calling unsupported APIs.
This would also be a good mechanism for ES to communicate the full capabilities that a build flavour supports.
@@ -51,11 +122,17 @@ describe('cloneIndex', () => { | |||
}); | |||
|
|||
it('re-throws non retry-able errors', async () => { | |||
const task = setWriteBlock({ | |||
const task = cloneIndex({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch 😨
.clone({ | ||
index: source, | ||
target, | ||
wait_for_active_shards: WAIT_FOR_ALL_SHARDS_TO_BE_ACTIVE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not an index setting so it's not included in the latest stateless-es PR, but I found other docs stating that this option will be removed from several APIs. But given it's not blocked yet as far as I know, we can do that in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the next step is to take a closer look at the whole migration and make sure it works with the latest serverless ES builds. I didn't want to do too much in a single PR to avoid the risk of never merging
💚 Build Succeeded
Metrics [docs]Public APIs missing comments
Unknown metric groupsAPI count
History
To update your PR or re-run it, just comment with: |
// skipped because test serverless ES nodes are currently using static ports | ||
// causing parallel jest runners to fail for obvious port conflicts reasons. | ||
describe.skip('ES capabilities for serverless ES', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I am working on adding similar tests for all migration actions here: #164842
Still good to have the capabilities covered more specifically.
Summary
We recently got problems because some index creation settings are rejected by stateless ES, causing the whole system to fail and Kibana to terminate.
We can't really use feature flags for this, given:
This was something that was agreed a few weeks back, but never acted upon.
Introducing ES capabilities
This PR introduces the concept of elasticsearch "capabilities".
Those capabilities are built exclusively from info coming from the ES cluster (and not by some config flag).
This first implementation simply exposes a
serverless
flag, that is populated depending on thebuild_flavor
field of theinfo
API (/
endpoint).The end goal would be to expose a real capabilities (e.g "what is supported") list instead. But ideally this would be provided by some ES API and not by us guessing what is supported depending on the build flavor, so for now, just exposing whether we're connected to a default of serverless ES will suffice.
Using it to adapt some API calls during SO migration
This PR also adapts the
createIndex
andcloneIndex
migration action to use this information and change their request against ES accordingly (removing some index creation parameters that are not supported).