Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ElasticsearchClient should allow plugins to opt out of error logging #84447

Closed
rudolf opened this issue Nov 27, 2020 · 6 comments · Fixed by #73672
Closed

ElasticsearchClient should allow plugins to opt out of error logging #84447

rudolf opened this issue Nov 27, 2020 · 6 comments · Fixed by #73672
Assignees
Labels
project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@rudolf
Copy link
Contributor

rudolf commented Nov 27, 2020

Whenever Elasticsearch responds with an error, the ElasticsearchClient automatically logs an error-level log message. However there are a lot of domains where an error response from Elasticsearch is an expected and valid response and not an error condition.

const addLogging = (client: Client, logger: Logger, logQueries: boolean) => {
client.on('response', (error, event) => {
if (error) {
const errorMessage =
// error details for response errors provided by elasticsearch, defaults to error name/message
`[${event.body?.error?.type ?? error.name}]: ${event.body?.error?.reason ?? error.message}`;
logger.error(errorMessage);
}
if (event && logQueries) {
const params = event.meta.request.params;
// definition is wrong, `params.querystring` can be either a string or an object
const querystring = convertQueryString(params.querystring);
const url = `${params.path}${querystring ? `?${querystring}` : ''}`;
const body = params.body ? `\n${ensureString(params.body)}` : '';
logger.debug(`${event.statusCode}\n${params.method} ${url}${body}`, {
tags: ['query'],
});
}
});
};

Saved Object migrations

The migration algorithm introduced in v6.5 attempts to create an index which leads to a index_already_exists_exception which is an expected error meaning another Kibana instance already acquired the lock. The new migration algorithm slated for v7.11 ensures that all steps in the migration are idempotent meaning a successful migration can have several exceptions like index_already_exists_exception, version_conflict_engine_exception or index_not_found_exception all of which can safely be ignored. If the logs are filled with many error-level messages users would not be confident that their data was safely upgraded.

For example:

Task manager ensureScheduled

https://github.com/elastic/kibana/blob/master/x-pack/plugins/task_manager/server/task_scheduling.ts#L95-L113

Many plugins use task manager's ensureScheduled at startup to ensure that a usage collection task was registered for their plugin. The task manager will then use the saved objects repository to create this task using the Elasticsearch index API. Because an index operation fails with a version_conflict_engine_exception when an object already exists there are several entries in the Kibana startup logs like:

log   [14:57:04.099] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Actions-actions_telemetry]: version conflict, document already exists (current version [4])
  log   [14:57:04.100] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerting-alerting_health_check]: version conflict, document already exists (current version [4])
  log   [14:57:04.100] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerts-alerts_invalidate_api_keys]: version conflict, document already exists (current version [4])
  log   [14:57:04.101] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerting-alerting_telemetry]: version conflict, document already exists (current version [4])
  log   [14:57:04.107] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:apm-telemetry-task]: version conflict, document already exists (current version [4])
  log   [14:57:04.107] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:endpoint:user-artifact-packager:1.0.0]: version conflict, document already exists (current version [4])

These "error" messages confuse users who rightly think this is something that needs to be investigated. However, this is completely expected behaviour and the task manager already catches and ignores these errors.

@rudolf rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Nov 27, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@rudolf
Copy link
Contributor Author

rudolf commented Nov 27, 2020

@elastic/kibana-core @pgayvallet Since any error will always bubble up to some error handler I wonder if we shouldn't log these errors as debug-level only?

When an exception returned from a route handler is an ElasticsearchClientError we could log a message in the more human readable format instead of logging the entire error object https://github.com/elastic/kibana/blob/master/src/core/server/http/router/router.ts#L271-L275

@pgayvallet
Copy link
Contributor

This was kinda migrated as it was from the legacy client. However I agree that we would ideally add a way to opt-out from this automatic error logging.

Main challenge here is that we are technically always creating clients via the child API

const scopedClient = this.rootScopedClient.child({
headers: scopedHeaders,
});

and that the logging mechanism is done at the root/parent client level, and we don't have any information on the actual child emitting the response event.

export const configureClient = (
config: ElasticsearchClientConfig,
{ logger, scoped = false }: { logger: Logger; scoped?: boolean }
): Client => {
const clientOptions = parseClientOptions(config, scoped);
const client = new Client(clientOptions);
addLogging(client, logger, config.logQueries);

So maybe the proposed solution to lower the log level is the most pragmatic one.

Since any error will always bubble up to some error handler I wonder if we shouldn't log these errors as debug-level only?

The argument that the errors is going to be catched by some handler anyway makes sense. Maybe we could (in addition) also only log them when logQueries is true.

When an exception returned from a route handler is an ElasticsearchClientError we could log a message in the more human readable format instead of logging the entire error object

We kinda were hoping to be able to remove the ES 401 error special handling in the routing layer at some point, so this would not really go in the right direction. What do you think about that @restrry?

@rudolf
Copy link
Contributor Author

rudolf commented Nov 30, 2020

The argument that the errors is going to be catched by some handler anyway makes sense. Maybe we could (in addition) also only log them when logQueries is true.

Yeah I like that idea.

so this would not really go in the right direction

Yeah this does mix up the two domains and encourages plugins to just throw Errors instead of catching it in their route handler and logging a human readable error there. Preferably such an error would then be linked to the plugin's domain like logger.error("Couldn't create dashboard", error) instead of a generic Elasticsearch error without any context that would help users understand the impact of the error.

@pgayvallet
Copy link
Contributor

Yea, I'd say that letting the consumers 'fully' handle how they decide to log / handle the error is probably the way to go (if that's acceptable, I don't think changing that now would be considered a breaking change?). We could also expose a helper to properly format ES errors when logging them.

@mshustov
Copy link
Contributor

This was kinda migrated as it was from the legacy client.

I assumed so, but when I started working on #73672 I found out that it's not true.
Given you example for version_conflict_engine_exception

  • the new ES client logs an error.
  • the legacy ES client doesn't log it by default. The only way to see it is to enable logQueries:true
 server    log   [12:09:46.611] [debug][data][elasticsearch][query] 409
 POST /36496a20-3ad8-11eb-b3a5-bdd3b2e67dba/_create/36496a20-3ad8-11eb-b3a5-bdd3b2e67dba
 {}

I'm going to adjust the new ES client logging to be compatible with the legacy one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants