[Stack Monitoring] Add metricbeat errors to Health API response #137288

Merged

Conversation

crespocarlos
Contributor

@crespocarlos crespocarlos commented Jul 27, 2022

Summary

This PR closes #135692 by adding a metricbeatErrors object to the new Health API response. The object reports error events captured by metricbeat while monitoring Elasticsearch, Logstash, Kibana, Beat and EnterpriseSearch.

Screenshot

image

How to test

  • Start local ES and kibana
  • Start metricbeat with xpack modules, monitoring a stack component - in this case Logstash - while it is not running.
metricbeat.modules:
  - module: logstash
    xpack.enabled: true
    period: 10s
    hosts: [ "localhost:9600" ]

output.elasticsearch:
  hosts: [ "localhost:9200" ]
  username: "elastic"
  password: "changeme"
  allow_older_versions: true
  • Hit the api/monitoring/v1/_health endpoint

@crespocarlos crespocarlos changed the title Add metricbeat erros to Health API response [Stack Monitoring] Add metricbeat erros to Health API response Jul 28, 2022
@crespocarlos crespocarlos added Feature:Stack Monitoring Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services release_note:skip Skip the PR/issue when compiling release notes v8.5.0 auto-backport Deprecated - use backport:version if exact versions are needed labels Jul 28, 2022
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos marked this pull request as ready for review July 28, 2022 14:05
@crespocarlos crespocarlos requested a review from a team as a code owner July 28, 2022 14:05
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@matschaffer matschaffer changed the title [Stack Monitoring] Add metricbeat erros to Health API response [Stack Monitoring] Add metricbeat errors to Health API response Jul 29, 2022
@@ -0,0 +1,374 @@
{
"type": "index",
Contributor

I would expect metricbeat-8.x.x to be a data stream. Did we generate this archive with the esArchiver?

Contributor Author

@crespocarlos crespocarlos Jul 29, 2022

Honestly, I only realized such a tool existed after your comment. I just ran it and it did generate a mappings.json with "type": "data_stream". I'll update this.

Contributor

I think I have some documentation stashed about it, I'll try to revive it

execution: { timedOut: false, errors: [] },
};
} catch (err) {
logger.error(`fetchMonitoredClusters: failed to fetch:\n${err.stack}`);
Contributor

Should the search query be included in the try block?

Also, fetchMonitoredClusters -> fetchMetricbeatErrors in the log message.

Contributor Author

@crespocarlos crespocarlos Jul 29, 2022

Totally! It's the most important part.
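
For readers following this thread, here is a minimal sketch of what the suggestion amounts to: the search call moves inside the try block and the log prefix names the right function. It is not the code merged in this PR. The Logger interface, the SearchFn type, the docs field, and the result shape are hypothetical stand-ins for the real Kibana types, and returning the error in execution.errors is an assumption.

```typescript
// Hypothetical minimal types; the real Kibana logger and search client are richer.
interface Logger {
  error(message: string): void;
}

// Assumed stand-in for the Elasticsearch search call used by the fetcher.
type SearchFn = () => Promise<{ timedOut: boolean; hits: unknown[] }>;

interface MetricbeatErrorsResult {
  docs: unknown[];
  execution: { timedOut: boolean; errors: string[] };
}

async function fetchMetricbeatErrors(
  search: SearchFn,
  logger: Logger
): Promise<MetricbeatErrorsResult> {
  try {
    // The query runs inside the try block, so a rejected search is caught below.
    const response = await search();
    return {
      docs: response.hits,
      execution: { timedOut: response.timedOut, errors: [] },
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const detail = err instanceof Error ? (err.stack ?? err.message) : String(err);
    // The log prefix matches the function name, as requested in the review.
    logger.error(`fetchMetricbeatErrors: failed to fetch:\n${detail}`);
    // Assumption: failures are reported in the execution block instead of being rethrown.
    return { docs: [], execution: { timedOut: false, errors: [message] } };
  }
}
```

With this structure a failing query produces a logged error and a well-formed result, so the return type does not need a null case.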

logger,
}: FetchParameters & {
metricbeatIndex: string;
}): Promise<MetricbeatResponse | null> => {
Contributor

What's the case where we return null?

Contributor Author

I forgot to remove it. Thanks.

const buildErrorMessages = (errorDocs: any[]): ErrorDetails[] => {
const seenErrorMessages = new Set<string>();

return errorDocs
Contributor

nit: a for loop or reduce could be more concise and readable
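
As an illustration of the nit, the de-duplication could be written as a plain for loop along these lines. This is only a sketch: the ErrorDoc and ErrorDetails shapes are hypothetical (the PR's real types are not shown in this thread), and it assumes the input is sorted newest first so that the first occurrence of a message is also the most recent one.

```typescript
// Hypothetical document shape, loosely modelled on ECS error fields.
interface ErrorDoc {
  '@timestamp': string;
  error: { message: string };
}

// Hypothetical output shape; the real ErrorDetails type may differ.
interface ErrorDetails {
  message: string;
  lastSeen: string;
}

const buildErrorMessages = (errorDocs: ErrorDoc[]): ErrorDetails[] => {
  const seenErrorMessages = new Set<string>();
  const details: ErrorDetails[] = [];

  for (const doc of errorDocs) {
    const message = doc.error.message;
    // Skip messages that were already collected from an earlier document.
    if (seenErrorMessages.has(message)) {
      continue;
    }
    seenErrorMessages.add(message);
    // Assumes newest-first input, so this timestamp is the latest occurrence.
    details.push({ message, lastSeen: doc['@timestamp'] });
  }

  return details;
};
```

The loop keeps the filtering and the mapping in one pass, which is the readability gain the comment is pointing at.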

@crespocarlos
Contributor Author

@elasticmachine merge upstream

await esArchiver.load(archive, { useCreate: true });
},

async tearDown() {
await deleteDataStream('metricbeat-*');
Contributor

@klacabane klacabane Aug 1, 2022

We have to delete the data stream instead of calling esArchiver.unload(archive) because the monitoring mappings are already installed by Elasticsearch and we don't need to set them up. As a result, the archiver doesn't have any reference to the template and can't delete it automatically.

This weirdness makes me think that we should maybe install an archived version of the mappings for .monitoring-{product}-mb to have a standardized usage of the archiver. We can discuss that in #119658.

In the meantime we could still unload the metricbeat archive for a complete cleanup of the assets. I'm wondering: if we add a call to esArchiver.unload(archive) here, does it fail for the monitoring archive, or is it a no-op because the mappings file does not exist?

Contributor Author

I think I had the unload here running before the deleteDataStream. I'll put it back and see the behaviour. I don't remember if all the data got removed when I deleted the datastream.

Contributor Author

Apparently deleting a data stream also deletes its backing indices https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-data-stream.html, which makes sense, because I've run the test multiple times in a row and it never failed due to duplicate ids.

Contributor

If we only delete the data stream I think we'll leave the backing index template/component templates of the ds

Contributor Author

True! The index template stays there if we don't unload the archive. I'll add that.

Contributor

@klacabane klacabane left a comment

LGTM

@crespocarlos
Contributor Author

@elasticmachine merge upstream

const archivesArray = Array.isArray(archives) ? archives : [archives];
await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));

await deleteDataStream('metricbeat-*');
Contributor

I think we no longer need to delete the metricbeat-* data stream, as the archive unload will take care of that. Can we also leave a small comment describing the .monitoring-* specificity?

Contributor Author

Done :)! I finally know how the es-archiver works now.
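
To summarize where this thread landed, a teardown along the following lines would unload every archive and then remove only the monitoring data streams by hand. This is a hedged sketch rather than the merged test helper: the Archiver interface and DeleteDataStream type are hypothetical reductions of the real test services, and the '.monitoring-*' pattern is an assumption based on the discussion above.

```typescript
// Hypothetical reduction of the esArchiver service to the one call used here.
interface Archiver {
  unload(archive: string): Promise<unknown>;
}

// Hypothetical signature for the data stream deletion helper seen in the diff.
type DeleteDataStream = (pattern: string) => Promise<unknown>;

async function tearDown(
  archives: string | string[],
  esArchiver: Archiver,
  deleteDataStream: DeleteDataStream
): Promise<void> {
  const archivesArray = Array.isArray(archives) ? archives : [archives];

  // Unloading removes the assets an archive created, including the metricbeat-*
  // data stream and its templates, since that archive ships its own mappings.
  await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));

  // The .monitoring-* templates are installed by Elasticsearch itself, so the
  // monitoring archives carry no mappings file and unload cannot clean them up.
  // Assumption: these data streams are therefore deleted explicitly.
  await deleteDataStream('.monitoring-*');
}
```

The explicit deletion is the only step that stays manual, which is why it carries the explanatory comment requested in the review.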

@crespocarlos
Contributor Author

buildkite test this

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@crespocarlos crespocarlos added backport-v8.0.0 auto-backport Deprecated - use backport:version if exact versions are needed backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) and removed auto-backport Deprecated - use backport:version if exact versions are needed backport-v8.0.0 labels Aug 3, 2022
@crespocarlos crespocarlos merged commit 62ce378 into elastic:main Aug 3, 2022
@crespocarlos crespocarlos deleted the 135692-metricbeat-errors-in-health-api branch August 3, 2022 12:30
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Aug 3, 2022
…tic#137288)

* Add metricbeat erros to Health API response

* Fix unit test

* Add integration test scenario

* Small fix

* Small fixes and improving integration test data

* Small refactor of fetchMetricbeatErrors

* Add logging

* Unload metricbeat archive after test finishes up

* Fix data_stream setup function

* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
8.4

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Aug 3, 2022
) (#137975)

* Add metricbeat erros to Health API response

* Fix unit test

* Add integration test scenario

* Small fix

* Small fixes and improving integration test data

* Small refactor of fetchMetricbeatErrors

* Add logging

* Unload metricbeat archive after test finishes up

* Fix data_stream setup function

* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)

Co-authored-by: Carlos Crespo <crespocarlos@users.noreply.github.com>
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Feature:Stack Monitoring release_note:skip Skip the PR/issue when compiling release notes Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v8.4.0 v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Stack Monitoring] Query metricbeat errors in the Health api
5 participants