Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing the concept of ES capabilities #164850

Merged
merged 18 commits into from
Aug 28, 2023

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented Aug 25, 2023

Summary

We recently got problems because some index creation settings are rejected by stateless ES, causing the whole system to fail and Kibana to terminate.

We can't really use feature flags for this, given:

  1. it doesn't really make sense to use manual flags for something that strictly depend on one of our dependency's capabilities
  2. we're mixing the concept of "serverless" offering and "serverless" build. Atm we sometimes run "serverless" Kibana against traditional ES, meaning that the "serverless" info cannot be used to determine if we're connected against a default or serverless version of ES.

This was something that was agreed a few weeks back, but never acted upon.

Introducing ES capabilities

This PR introduces the concept of elasticsearch "capabilities".

Those capabilities are built exclusively from info coming from the ES cluster (and not by some config flag).

This first implementation simply exposes a serverless flag, that is populated depending on the build_flavor field of the info API (/ endpoint).

The end goal would be to expose a real capabilities (e.g "what is supported") list instead. But ideally this would be provided by some ES API and not by us guessing what is supported depending on the build flavor, so for now, just exposing whether we're connected to a default of serverless ES will suffice.

Using it to adapt some API calls during SO migration

This PR also adapts the createIndex and cloneIndex migration action to use this information and change their request against ES accordingly (removing some index creation parameters that are not supported).

@pgayvallet pgayvallet added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc release_note:skip Skip the PR/issue when compiling release notes v8.11.0 Feature:elasticsearch labels Aug 25, 2023
@pgayvallet pgayvallet force-pushed the kbn-xxx-elasticsearch-capabilities branch from dd441b1 to ee1f3b8 Compare August 25, 2023 15:15
@pgayvallet pgayvallet changed the title Introduce the concept of ES capabilities and use it for SO migration Introducing the concept of ES capabilities Aug 27, 2023
Copy link
Contributor Author

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review

Comment on lines +164 to +166
capabilities = getElasticsearchCapabilities({
clusterInfo: await firstValueFrom(this.clusterInfo$!),
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-using the clusterInfo$ observable to retrieve the info required to build the capabilities, which avoids to perform an additional call to ES.

}

return {
client: this.client!,
createClient: (type, clientConfig) => this.createClusterClient(type, config, clientConfig),
getCapabilities: () => capabilities,
Copy link
Contributor Author

@pgayvallet pgayvallet Aug 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, technically, capabilities are bound to a given cluster, so a given client. However attaching the API to ClusterClient was way more complex, and made the caching way harder too.

I doubt we're connecting Kibana to multiple clusters, especially mixed serverless/non-serverless ones anytime soon, so I just KISS.

*
* @internal
*/
export const getCapabilitiesFromClient = async (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is exposed only for integration tests (at least for now)

@@ -28,6 +29,7 @@ export function getClusterInfo$(internalClient: ElasticsearchClient): Observable
cluster_name: info.cluster_name,
cluster_uuid: info.cluster_uuid,
cluster_version: info.version.number,
cluster_build_flavor: info.version.build_flavor,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said previously, I reused the client.info call from this method to build the capabilities, which surfaced the cluster_build_flavor in a few files / snapshots. Overall, I don't think this is a problem so I didn't bother filtering / excluding it in some places.

@@ -27,6 +27,7 @@ export function registerAnalyticsContextProvider(
cluster_name: { type: 'keyword', _meta: { description: 'The Cluster Name' } },
cluster_uuid: { type: 'keyword', _meta: { description: 'The Cluster UUID' } },
cluster_version: { type: 'keyword', _meta: { description: 'The Cluster version' } },
cluster_build_flavor: { type: 'keyword', _meta: { description: 'The Cluster build flavor' } },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(c.f previous comment)

Comment on lines 82 to +98

const indexSettings = {
// settings not being supported on serverless ES
...(esCapabilities.serverless
? {}
: {
// ES rule of thumb: shards should be several GB to 10's of GB, so
// Kibana is unlikely to cross that limit.
number_of_shards: INDEX_NUMBER_OF_SHARDS,
auto_expand_replicas: INDEX_AUTO_EXPAND_REPLICAS,
// Set an explicit refresh interval so that we don't inherit the
// value from incorrectly configured index templates (not required
// after we adopt system indices)
refresh_interval: '1s',
// Bump priority so that recovery happens before newer indices
priority: 10,
}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adapted the createIndex and cloneIndex actions to use these new capabilites.

Note that this specific work is not necessarily tested - it will be done in a follow up - the intent of the PR was to setup the feature and wire it down to the 2 migration algorithm so that we can effectively use it.

Comment on lines +79 to +80
// runServerless doesn't wait until the nodes are up
await waitUntilClusterReady(getServerlessESClient());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained, es.runServerless returns as soon as the docker containers as running, which causes tests to potentially fail if they need to access ES as soon as startES has been called.

This naive awaiting solves it.

return {
getClient: getServerlessESClient,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, the serverless ES test harness did not expose a client as the traditional one was doing. I'm just creating a default client to localhost:9200 for now, given that apparently the ports aren't configurable when calling es.runServerless yet. We will eventually need to adapt this if the ports become options later.

Comment on lines +37 to +41
it('returns the correct capabilities', async () => {
const capabilities = await getCapabilitiesFromClient(client);
expect(capabilities).toEqual({
serverless: false,
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing capabilities for traditional ES

Comment on lines +16 to +18
// skipped because test serverless ES nodes are currently using static ports
// causing parallel jest runners to fail for obvious port conflicts reasons.
describe.skip('ES capabilities for serverless ES', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment. The test passes locally, but I didn't want to introduce more flakiness right now, so I'm commenting it until the serverless test utils become more mature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I am working on adding similar tests for all migration actions here: #164842

Still good to have the capabilities covered more specifically.

@pgayvallet pgayvallet marked this pull request as ready for review August 27, 2023 15:47
@pgayvallet pgayvallet requested a review from a team as a code owner August 27, 2023 15:47
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

Copy link
Contributor

@rudolf rudolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I guess the capabilities could eventually be used as a type guard to narrow the ES client types to only serverless so that we have more type safety against calling unsupported APIs.

This would also be a good mechanism for ES to communicate the full capabilities that a build flavour supports.

@@ -51,11 +122,17 @@ describe('cloneIndex', () => {
});

it('re-throws non retry-able errors', async () => {
const task = setWriteBlock({
const task = cloneIndex({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch 😨

.clone({
index: source,
target,
wait_for_active_shards: WAIT_FOR_ALL_SHARDS_TO_BE_ACTIVE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not an index setting so it's not included in the latest stateless-es PR, but I found other docs stating that this option will be removed from several APIs. But given it's not blocked yet as far as I know, we can do that in a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the next step is to take a closer look at the whole migration and make sure it works with the latest serverless ES builds. I didn't want to do too much in a single PR to avoid the risk of never merging

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/core-elasticsearch-server-internal 33 34 +1
@kbn/core-elasticsearch-server-mocks 15 13 -2
@kbn/core-saved-objects-migration-server-internal 90 91 +1
total -0
Unknown metric groups

API count

id before after diff
@kbn/core-elasticsearch-server 108 111 +3
@kbn/core-elasticsearch-server-internal 37 38 +1
@kbn/core-elasticsearch-server-mocks 15 13 -2
@kbn/core-saved-objects-migration-server-internal 124 125 +1
total +3

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit 53173f1 into elastic:main Aug 28, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Aug 28, 2023
Comment on lines +16 to +18
// skipped because test serverless ES nodes are currently using static ports
// causing parallel jest runners to fail for obvious port conflicts reasons.
describe.skip('ES capabilities for serverless ES', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I am working on adding similar tests for all migration actions here: #164842

Still good to have the capabilities covered more specifically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:elasticsearch release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants