ElasticsearchClient should allow plugins to opt out of error logging #84447

rudolf · 2020-11-27T14:15:46Z

Whenever Elasticsearch responds with an error, the ElasticsearchClient automatically logs an error-level log message. However there are a lot of domains where an error response from Elasticsearch is an expected and valid response and not an error condition.

kibana/src/core/server/elasticsearch/client/configure_client.ts

Lines 39 to 60 in 274d5de

    
           const addLogging = (client: Client, logger: Logger, logQueries: boolean) => { 
        
             client.on('response', (error, event) => { 
        
               if (error) { 
        
                 const errorMessage = 
        
                   // error details for response errors provided by elasticsearch, defaults to error name/message 
        
                   `[${event.body?.error?.type ?? error.name}]: ${event.body?.error?.reason ?? error.message}`; 
        
                 logger.error(errorMessage); 
        
               } 
        
               if (event && logQueries) { 
        
                 const params = event.meta.request.params; 
        
                 // definition is wrong, `params.querystring` can be either a string or an object 
        
                 const querystring = convertQueryString(params.querystring); 
        
                 const url = `${params.path}${querystring ? `?${querystring}` : ''}`; 
        
                 const body = params.body ? `\n${ensureString(params.body)}` : ''; 
        
                 logger.debug(`${event.statusCode}\n${params.method} ${url}${body}`, { 
        
                   tags: ['query'], 
        
                 }); 
        
               } 
        
             }); 
        
           };

Saved Object migrations

The migration algorithm introduced in v6.5 attempts to create an index which leads to a index_already_exists_exception which is an expected error meaning another Kibana instance already acquired the lock. The new migration algorithm slated for v7.11 ensures that all steps in the migration are idempotent meaning a successful migration can have several exceptions like index_already_exists_exception, version_conflict_engine_exception or index_not_found_exception all of which can safely be ignored. If the logs are filled with many error-level messages users would not be confident that their data was safely upgraded.

For example:

Task manager ensureScheduled

https://github.com/elastic/kibana/blob/master/x-pack/plugins/task_manager/server/task_scheduling.ts#L95-L113

Many plugins use task manager's ensureScheduled at startup to ensure that a usage collection task was registered for their plugin. The task manager will then use the saved objects repository to create this task using the Elasticsearch index API. Because an index operation fails with a version_conflict_engine_exception when an object already exists there are several entries in the Kibana startup logs like:

log   [14:57:04.099] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Actions-actions_telemetry]: version conflict, document already exists (current version [4])
  log   [14:57:04.100] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerting-alerting_health_check]: version conflict, document already exists (current version [4])
  log   [14:57:04.100] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerts-alerts_invalidate_api_keys]: version conflict, document already exists (current version [4])
  log   [14:57:04.101] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:Alerting-alerting_telemetry]: version conflict, document already exists (current version [4])
  log   [14:57:04.107] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:apm-telemetry-task]: version conflict, document already exists (current version [4])
  log   [14:57:04.107] [error][data][elasticsearch] [version_conflict_engine_exception]: [task:endpoint:user-artifact-packager:1.0.0]: version conflict, document already exists (current version [4])

These "error" messages confuse users who rightly think this is something that needs to be investigated. However, this is completely expected behaviour and the task manager already catches and ignores these errors.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2020-11-27T14:15:48Z

Pinging @elastic/kibana-core (Team:Core)

rudolf · 2020-11-27T14:32:53Z

@elastic/kibana-core @pgayvallet Since any error will always bubble up to some error handler I wonder if we shouldn't log these errors as debug-level only?

When an exception returned from a route handler is an ElasticsearchClientError we could log a message in the more human readable format instead of logging the entire error object https://github.com/elastic/kibana/blob/master/src/core/server/http/router/router.ts#L271-L275

pgayvallet · 2020-11-30T11:23:13Z

This was kinda migrated as it was from the legacy client. However I agree that we would ideally add a way to opt-out from this automatic error logging.

Main challenge here is that we are technically always creating clients via the child API

kibana/src/core/server/elasticsearch/client/cluster_client.ts

Lines 82 to 84 in 1ad68fd

    
           const scopedClient = this.rootScopedClient.child({ 
        
             headers: scopedHeaders, 
        
           });

and that the logging mechanism is done at the root/parent client level, and we don't have any information on the actual child emitting the response event.

kibana/src/core/server/elasticsearch/client/configure_client.ts

Lines 27 to 34 in 274d5de

    
           export const configureClient = ( 
        
             config: ElasticsearchClientConfig, 
        
             { logger, scoped = false }: { logger: Logger; scoped?: boolean } 
        
           ): Client => { 
        
             const clientOptions = parseClientOptions(config, scoped); 
        
             const client = new Client(clientOptions); 
        
             addLogging(client, logger, config.logQueries);

So maybe the proposed solution to lower the log level is the most pragmatic one.

Since any error will always bubble up to some error handler I wonder if we shouldn't log these errors as debug-level only?

The argument that the errors is going to be catched by some handler anyway makes sense. Maybe we could (in addition) also only log them when logQueries is true.

When an exception returned from a route handler is an ElasticsearchClientError we could log a message in the more human readable format instead of logging the entire error object

We kinda were hoping to be able to remove the ES 401 error special handling in the routing layer at some point, so this would not really go in the right direction. What do you think about that @restrry?

rudolf · 2020-11-30T11:40:47Z

The argument that the errors is going to be catched by some handler anyway makes sense. Maybe we could (in addition) also only log them when logQueries is true.

Yeah I like that idea.

so this would not really go in the right direction

Yeah this does mix up the two domains and encourages plugins to just throw Errors instead of catching it in their route handler and logging a human readable error there. Preferably such an error would then be linked to the plugin's domain like logger.error("Couldn't create dashboard", error) instead of a generic Elasticsearch error without any context that would help users understand the impact of the error.

pgayvallet · 2020-11-30T12:23:32Z

Yea, I'd say that letting the consumers 'fully' handle how they decide to log / handle the error is probably the way to go (if that's acceptable, I don't think changing that now would be considered a breaking change?). We could also expose a helper to properly format ES errors when logging them.

mshustov · 2020-12-10T11:17:07Z

This was kinda migrated as it was from the legacy client.

I assumed so, but when I started working on #73672 I found out that it's not true.
Given you example for version_conflict_engine_exception

the new ES client logs an error.
the legacy ES client doesn't log it by default. The only way to see it is to enable logQueries:true

 server    log   [12:09:46.611] [debug][data][elasticsearch][query] 409
 POST /36496a20-3ad8-11eb-b3a5-bdd3b2e67dba/_create/36496a20-3ad8-11eb-b3a5-bdd3b2e67dba
 {}

I'm going to adjust the new ES client logging to be compatible with the legacy one.

rudolf added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Nov 27, 2020

rudolf mentioned this issue Nov 27, 2020

Implement SavedObject Migrations v2 #75780

Closed

17 tasks

kobelb mentioned this issue Nov 30, 2020

Handled SavedObjectsClient errors logged at the error level #84545

Closed

joshdover assigned mshustov Dec 1, 2020

mshustov mentioned this issue Dec 10, 2020

[KP] log details of the failed elasticsearch client requests when logQueries:true #73672

Merged

2 tasks

mshustov closed this as completed in #73672 Dec 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ElasticsearchClient should allow plugins to opt out of error logging #84447

ElasticsearchClient should allow plugins to opt out of error logging #84447

rudolf commented Nov 27, 2020

elasticmachine commented Nov 27, 2020

rudolf commented Nov 27, 2020

pgayvallet commented Nov 30, 2020

rudolf commented Nov 30, 2020

pgayvallet commented Nov 30, 2020

mshustov commented Dec 10, 2020

ElasticsearchClient should allow plugins to opt out of error logging #84447

ElasticsearchClient should allow plugins to opt out of error logging #84447

Comments

rudolf commented Nov 27, 2020

Saved Object migrations

For example:

Task manager ensureScheduled

elasticmachine commented Nov 27, 2020

rudolf commented Nov 27, 2020

pgayvallet commented Nov 30, 2020

rudolf commented Nov 30, 2020

pgayvallet commented Nov 30, 2020

mshustov commented Dec 10, 2020