[Stack Monitoring] Add metricbeat errors to Health API response #137288

crespocarlos · 2022-07-27T14:00:12Z

Summary

This PR closes #135692 by adding a metricbeatErrors object to the new Health API response. The object reports error events captured by metricbeat for Elasticsearch, Logstash, Kibana, Beat and EnterpriseSearch.

How to test

Start local ES and Kibana.
Start metricbeat with xpack modules, monitoring a stack component - in this case Logstash - while it is not running.

metricbeat.modules:
  - module: logstash
    xpack.enabled: true
    period: 10s
    hosts: [ "localhost:9600" ]

output.elasticsearch:
  hosts: [ "localhost:9200" ]
  username: "elastic"
  password: "changeme"
  allow_older_versions: true

Hit the api/monitoring/v1/_health endpoint.

crespocarlos · 2022-07-28T12:56:17Z

@elasticmachine merge upstream

elasticmachine · 2022-07-28T14:05:18Z

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

klacabane · 2022-07-29T13:40:58Z

x-pack/test/api_integration/apis/monitoring/es_archives/_health/metricbeat_8/mappings.json

@@ -0,0 +1,374 @@
+{
+  "type": "index",


I would expect metricbeat-8.x.x to be a data stream. Did we generate this archive with the esArchiver?

Honestly, I only realized there was such a tool after your comment. I just ran it and it did generate a mappings.json with "type": "data_stream". I'll update this.

I think I have some documentation stashed about it, I'll try to revive it

klacabane · 2022-07-29T13:48:22Z

x-pack/plugins/monitoring/server/routes/api/v1/_health/metricbeat/fetch_metricbeat_errors.ts

+      execution: { timedOut: false, errors: [] },
+    };
+  } catch (err) {
+    logger.error(`fetchMonitoredClusters: failed to fetch:\n${err.stack}`);


Should the search query be included in the try block?

Also fetchMonitoredClusters -> fetchMetricbeatErrors

Totally! It's the most important part.
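
A minimal sketch of the two changes suggested in this thread: the search call moves inside the try block, and the log prefix is renamed to fetchMetricbeatErrors. The `Logger` and `SearchFn` types, the search parameters, and the shape of the failure result are simplified assumptions for illustration, not the actual Kibana code.

```typescript
// Hypothetical, simplified types; the real plugin uses Kibana's logger and ES client.
interface Logger {
  error(message: string): void;
}

type SearchFn = (params: { index: string }) => Promise<{ hits: unknown[] }>;

interface MetricbeatResponse {
  hits: unknown[];
  execution: { timedOut: boolean; errors: string[] };
}

const fetchMetricbeatErrors = async ({
  search,
  logger,
  metricbeatIndex,
}: {
  search: SearchFn;
  logger: Logger;
  metricbeatIndex: string;
}): Promise<MetricbeatResponse> => {
  try {
    // The search is the call most likely to throw, so it lives inside the try.
    const { hits } = await search({ index: metricbeatIndex });
    return { hits, execution: { timedOut: false, errors: [] } };
  } catch (err) {
    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
    logger.error(`fetchMetricbeatErrors: failed to fetch:\n${detail}`);
    // Assumed behaviour: report the failure in the result instead of rethrowing.
    return { hits: [], execution: { timedOut: false, errors: [String(err)] } };
  }
};
```

With this shape the function always resolves to a MetricbeatResponse, which is also why a `| null` in the return type would no longer be needed.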

klacabane · 2022-07-29T13:49:06Z

x-pack/plugins/monitoring/server/routes/api/v1/_health/metricbeat/fetch_metricbeat_errors.ts

+  logger,
+}: FetchParameters & {
+  metricbeatIndex: string;
+}): Promise<MetricbeatResponse | null> => {


What's the case where we return null?

I forgot to remove it. Thanks.

klacabane · 2022-07-29T13:54:33Z

x-pack/plugins/monitoring/server/routes/api/v1/_health/metricbeat/build_metricbeat_errors.ts

+const buildErrorMessages = (errorDocs: any[]): ErrorDetails[] => {
+  const seenErrorMessages = new Set<string>();
+
+  return errorDocs


nit: a for loop or reduce could be more concise and readable
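
One way the for-loop variant of this nit could look, as a sketch only. The `ErrorDoc` and `ErrorDetails` shapes below are assumptions made for the example (the PR types `errorDocs` as `any[]` and does not show `ErrorDetails` here), and `lastSeen` is only meaningful if the documents arrive sorted newest first.

```typescript
// Assumed shapes for illustration; the real ErrorDetails type may differ.
interface ErrorDoc {
  '@timestamp': string;
  error: { message: string };
}

interface ErrorDetails {
  message: string;
  lastSeen: string;
}

const buildErrorMessages = (errorDocs: ErrorDoc[]): ErrorDetails[] => {
  const seenErrorMessages = new Set<string>();
  const result: ErrorDetails[] = [];

  for (const doc of errorDocs) {
    const message = doc.error.message;
    // Keep only the first document seen for each distinct message.
    if (seenErrorMessages.has(message)) continue;
    seenErrorMessages.add(message);
    result.push({ message, lastSeen: doc['@timestamp'] });
  }

  return result;
};
```

A single loop keeps the deduplication and the mapping in one place, so the Set is checked and updated in the same step.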

crespocarlos · 2022-08-01T07:57:47Z

@elasticmachine merge upstream

klacabane · 2022-08-01T21:36:22Z

x-pack/test/api_integration/apis/monitoring/data_stream.ts

      await esArchiver.load(archive, { useCreate: true });
    },

    async tearDown() {
+      await deleteDataStream('metricbeat-*');


We have to delete the data stream instead of calling esArchiver.unload(archive) because the monitoring mappings are already installed by Elasticsearch and we don't need to set them up. As a result, the archiver doesn't have any reference to the template and can't delete it automatically.

This weirdness makes me think that we should maybe install an archived version of the mappings for .monitoring-{product}-mb to have a standardized usage of the archiver. We can discuss that in #119658.

In the meantime we could still unload the metricbeat archive for a complete cleanup of the assets. I'm wondering: if we add a call to esArchiver.unload(archive) here, does it fail for the monitoring archive, or is it a no-op because the mappings file does not exist?

I think I had the unload here running before the deleteDataStream. I'll put it back and see the behaviour. I don't remember if all the data got removed when I deleted the datastream.

Apparently deleting a data stream also deletes its backing indices (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-data-stream.html), which makes sense, because I've run the test multiple times in a row and it never failed due to duplicate ids.

If we only delete the data stream I think we'll leave the backing index template/component templates of the ds

True! The index template stays there if we don't unload the archive. I'll add that.

klacabane

LGTM

crespocarlos · 2022-08-03T09:35:09Z

@elasticmachine merge upstream

klacabane · 2022-08-03T09:52:09Z

x-pack/test/api_integration/apis/monitoring/data_stream.ts

+      const archivesArray = Array.isArray(archives) ? archives : [archives];
+      await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));
+
+      await deleteDataStream('metricbeat-*');


I think we don't need to delete the metricbeat-* data stream anymore, as the archive unload will take care of that. Can we also leave a small comment describing the .monitoring-* specificity?

Done :)! I finally know how the es-archiver works.
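
A sketch of the teardown this thread converges on, where every loaded archive is unloaded and the manual metricbeat-* deletion is dropped. `EsArchiver` is a minimal stand-in for the functional test runner's esArchiver service, and `tearDownArchives` is a hypothetical helper name; only the `unload` call and the array normalization come from the diff above.

```typescript
// Minimal stand-in for the functional test runner's esArchiver service.
interface EsArchiver {
  unload(archive: string): Promise<unknown>;
}

// Hypothetical helper name; in the PR this logic sits in the tearDown method.
const tearDownArchives = async (
  esArchiver: EsArchiver,
  archives: string | string[]
): Promise<void> => {
  const archivesArray = Array.isArray(archives) ? archives : [archives];

  // Per the review thread: the metricbeat archive ships its own mappings, so
  // unloading it cleans up the data stream and its templates. The .monitoring-*
  // mappings are installed by Elasticsearch itself, so the archiver holds no
  // reference to those templates.
  await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));
};
```

Accepting either a single archive or an array keeps existing single-archive callers working without changes.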

crespocarlos · 2022-08-03T11:06:11Z

buildkite test this

crespocarlos · 2022-08-03T11:07:36Z

@elasticmachine merge upstream

kibana-ci · 2022-08-03T12:16:28Z

💚 Build Succeeded

Buildkite Build
Commit: 441983c

Metrics [docs]

✅ unchanged

History

💔 Build #62717 failed bf248b4
💚 Build #62338 succeeded 81c7a2b
💔 Build #62312 failed fe0a940
💚 Build #61929 succeeded dcb30de
💚 Build #61860 succeeded 86e12e4
💔 Build #61789 failed c768d0f

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

…tic#137288)
* Add metricbeat erros to Health API response
* Fix unit test
* Add integration test scenario
* Small fix
* Small fixes and improving integration test data
* Small refactor of fetchMetricbeatErrors
* Add logging
* Unload metricbeat archive after test finishes up
* Fix data_stream setup function
* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)

kibanamachine · 2022-08-03T12:34:07Z

💚 All backports created successfully

Status	Branch	Result
✅	8.4

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

) (#137975)
* Add metricbeat erros to Health API response
* Fix unit test
* Add integration test scenario
* Small fix
* Small fixes and improving integration test data
* Small refactor of fetchMetricbeatErrors
* Add logging
* Unload metricbeat archive after test finishes up
* Fix data_stream setup function
* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)
Co-authored-by: Carlos Crespo <crespocarlos@users.noreply.github.com>

crespocarlos added 3 commits July 27, 2022 15:50

Add metricbeat erros to Health API response

8f488e4

Fix unit test

0d14b3b

Add integration test scenario

cf4ab39

crespocarlos changed the title ~~Add metricbeat erros to Health API response~~ [Stack Monitoring] Add metricbeat erros to Health API response Jul 28, 2022

Small fix

7080fe5

Merge branch 'main' into 135692-metricbeat-errors-in-health-api

74ea275

crespocarlos marked this pull request as ready for review July 28, 2022 14:05

crespocarlos requested a review from a team as a code owner July 28, 2022 14:05

matschaffer changed the title ~~[Stack Monitoring] Add metricbeat erros to Health API response~~ [Stack Monitoring] Add metricbeat errors to Health API response Jul 29, 2022

klacabane reviewed Jul 29, 2022

crespocarlos added 3 commits July 29, 2022 18:30

Small fixes and improving integration test data

c768d0f

Small refactor of fetchMetricbeatErrors

a2b951c

Add logging

86e12e4

Merge branch 'main' into 135692-metricbeat-errors-in-health-api

dcb30de

klacabane reviewed Aug 1, 2022

klacabane approved these changes Aug 1, 2022

crespocarlos added 2 commits August 2, 2022 12:21

Unload metricbeat archive after test finishes up

fe0a940

Fix data_stream setup function

81c7a2b

Merge branch 'main' into 135692-metricbeat-errors-in-health-api

4cb0189

klacabane reviewed Aug 3, 2022

Remove manual metricbeat data stream deletion in test teardown in fav…

bf248b4

…or of archiver unload

Merge branch 'main' into 135692-metricbeat-errors-in-health-api

441983c

crespocarlos merged commit 62ce378 into elastic:main Aug 3, 2022

crespocarlos deleted the 135692-metricbeat-errors-in-health-api branch August 3, 2022 12:30

kibanamachine mentioned this pull request Aug 3, 2022

[8.4] [Stack Monitoring] Add metricbeat errors to Health API response (#137288) #137975

Merged

kibanamachine added the v8.4.0 label Aug 3, 2022

This was referenced Aug 4, 2022

[Observability][SecuritySolution] Fix to prevent observability style conflict in flyout across plugins #138091

Merged

[SecuritySolution][Bug] Fix to add empty values to timeline #138510

Merged
