[Stack Monitoring] Add metricbeat errors to Health API response #137288
Conversation
@elasticmachine merge upstream
Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)
@@ -0,0 +1,374 @@
{
  "type": "index",
I would expect metricbeat-8.x.x to be a datastream. Did we generate this archive with the esArchiver?
Honestly, I just realized there was such a tool after your comment. I just ran it and it did generate a mappings.json with "type": "data_stream". I'll update this.
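For context, a rough sketch of the difference in the generated archive; only the type field is the point here, and the surrounding fields are illustrative rather than the exact esArchiver schema:

```ts
// Rough sketch only: not the authoritative esArchiver mappings.json schema.
// The relevant change is the record `type`, which esArchiver sets to "data_stream"
// when the source is a data stream rather than a plain index.
const indexRecord = {
  type: 'index', // what the hand-built archive contained
  value: { index: 'metricbeat-8.x.x', mappings: {}, settings: {} }, // assumed fields
};

const dataStreamRecord = {
  type: 'data_stream', // what esArchiver generates for metricbeat-8.x.x
  value: { data_stream: 'metricbeat-8.x.x' }, // assumed fields
};
```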
I think I have some documentation stashed about it, I'll try to revive it
      execution: { timedOut: false, errors: [] },
    };
  } catch (err) {
    logger.error(`fetchMonitoredClusters: failed to fetch:\n${err.stack}`);
Should the search query be included in the try block? Also fetchMonitoredClusters -> fetchMetricbeatErrors
totally! It's the most important part
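For illustration, a minimal sketch of what the suggestion could look like, with the function renamed and the search call moved inside the try block; the parameter shapes and the query below are assumptions based on the snippets in this PR, not the final implementation:

```ts
// Minimal local stand-ins for the Kibana types used in the real file (assumptions).
interface Logger {
  error(message: string): void;
}

interface EsClient {
  search(params: { index: string; size: number; query: object }): Promise<unknown>;
}

const fetchMetricbeatErrors = async ({
  esClient,
  metricbeatIndex,
  logger,
}: {
  esClient: EsClient;
  metricbeatIndex: string;
  logger: Logger;
}) => {
  try {
    // With the query inside the try block, a malformed request or connectivity error
    // is caught and logged in the same place as any other fetch failure.
    const response = await esClient.search({
      index: metricbeatIndex,
      size: 0,
      query: { exists: { field: 'error.message' } }, // assumed filter for error events
    });

    return {
      execution: { timedOut: false, errors: [] },
      response, // placeholder for whatever processing the real code applies
    };
  } catch (err) {
    logger.error(`fetchMetricbeatErrors: failed to fetch:\n${(err as Error).stack}`);
    return { execution: { timedOut: false, errors: [(err as Error).message] } }; // assumed error shape
  }
};
```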
  logger,
}: FetchParameters & {
  metricbeatIndex: string;
}): Promise<MetricbeatResponse | null> => {
What's the case where we return null?
I forgot to remove it. thanks
const buildErrorMessages = (errorDocs: any[]): ErrorDetails[] => {
  const seenErrorMessages = new Set<string>();

  return errorDocs
nit: a for loop or reduce could be more concise and readable
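A rough sketch of the nit as a single-pass for loop; the document fields (error.message, @timestamp) and the ErrorDetails shape are assumed for illustration:

```ts
// Sketch only: dedupe error messages in one pass instead of filtering against a
// separately maintained Set. Field names are assumptions, not the final code.
interface ErrorDetails {
  message: string;
  lastSeen: string;
}

interface ErrorDoc {
  '@timestamp': string;
  error?: { message?: string };
}

const buildErrorMessages = (errorDocs: ErrorDoc[]): ErrorDetails[] => {
  const seenErrorMessages = new Set<string>();
  const details: ErrorDetails[] = [];

  for (const doc of errorDocs) {
    const message = doc.error?.message;
    // Skip documents without an error message and messages we have already reported.
    if (!message || seenErrorMessages.has(message)) continue;
    seenErrorMessages.add(message);
    details.push({ message, lastSeen: doc['@timestamp'] });
  }

  return details;
};
```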
@elasticmachine merge upstream
  await esArchiver.load(archive, { useCreate: true });
},

async tearDown() {
  await deleteDataStream('metricbeat-*');
The fact that we have to delete the datastream instead of calling esArchiver.unload(archive) is because the monitoring mappings are already installed by Elasticsearch and we don't need to set them up, so the archiver doesn't have any reference to the template and can't automatically delete it.
This weirdness makes me think that we should maybe install an archived version of the mappings for .monitoring-{product}-mb to have a standardized usage of the archiver. We can discuss that in #119658.
In the meantime we could still unload the metricbeat archive for a complete cleanup of the assets. I'm wondering: if we add a call to esArchiver.unload(archive) here, does it fail for the monitoring archive, or is it a noop because the mappings file does not exist?
I think I had the unload here running before the deleteDataStream. I'll put it back and see the behaviour. I don't remember if all the data got removed when I deleted the datastream.
Apparently deleting a datastream also deletes its indices (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-data-stream.html), which makes sense, because I've run the test multiple times in a row and it never failed due to duplicate IDs.
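For reference, a minimal sketch of what such a deleteDataStream test helper could look like, assuming the @elastic/elasticsearch v8 client:

```ts
// Sketch only. Deleting a data stream also removes its backing indices (see the linked
// docs), which is why repeated test runs don't fail with duplicate IDs, but it does not
// remove the index/component templates that back the data stream.
import { Client } from '@elastic/elasticsearch';

const deleteDataStream = async (es: Client, pattern: string): Promise<void> => {
  await es.indices.deleteDataStream({ name: pattern });
};

// Usage in a teardown: await deleteDataStream(es, 'metricbeat-*');
```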
If we only delete the data stream, I think we'll leave behind the backing index template/component templates of the ds.
True! The index template stays there if we don't unload the archive. I'll add that.
LGTM
@elasticmachine merge upstream
const archivesArray = Array.isArray(archives) ? archives : [archives];
await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));

await deleteDataStream('metricbeat-*');
I think we don't need to delete the metricbeat-* ds anymore, as the archive unload will take care of that. Can we also leave a small comment describing the .monitoring-* specificity?
done :)! I finally know now how the es-archiver works
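A hedged sketch of the resulting teardown after this change, with the .monitoring-* note as a comment; the names are illustrative rather than the exact FTR service types:

```ts
// Sketch only: unloading the archives is enough to remove the metricbeat data stream
// and its index template, so no manual deleteDataStream call is needed.
interface EsArchiver {
  unload(archivePath: string): Promise<void>;
}

const tearDown = async (esArchiver: EsArchiver, archives: string | string[]): Promise<void> => {
  const archivesArray = Array.isArray(archives) ? archives : [archives];

  // Note: the .monitoring-* mappings are installed by Elasticsearch itself rather than by
  // the archive, so there is no template for esArchiver to clean up for those indices.
  await Promise.all(archivesArray.map((archive) => esArchiver.unload(archive)));
};
```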
…or of archiver unload
buildkite test this
@elasticmachine merge upstream
💚 Build Succeeded
…tic#137288)
* Add metricbeat errors to Health API response
* Fix unit test
* Add integration test scenario
* Small fix
* Small fixes and improving integration test data
* Small refactor of fetchMetricbeatErrors
* Add logging
* Unload metricbeat archive after test finishes up
* Fix data_stream setup function
* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
(#137975)
* Add metricbeat errors to Health API response
* Fix unit test
* Add integration test scenario
* Small fix
* Small fixes and improving integration test data
* Small refactor of fetchMetricbeatErrors
* Add logging
* Unload metricbeat archive after test finishes up
* Fix data_stream setup function
* Remove manual metricbeat data stream deletion in test teardown in favor of archiver unload
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit 62ce378)
Co-authored-by: Carlos Crespo <crespocarlos@users.noreply.github.com>
Summary
This PR closes #135692, adding a metricbeatErrors object to the new Health API response, covering error events captured by Metricbeat for Elasticsearch, Logstash, Kibana, Beats and Enterprise Search.
Screenshot
How to test
Query the api/monitoring/v1/_health endpoint
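A minimal sketch for checking the endpoint against a local dev setup, assuming Kibana at http://localhost:5601 with elastic/changeme credentials; everything beyond the endpoint path and the metricbeatErrors field name introduced by this PR is an assumption:

```ts
// Sketch only: prints the new metricbeatErrors section of the Health API response.
const checkHealthEndpoint = async (): Promise<void> => {
  const response = await fetch('http://localhost:5601/api/monitoring/v1/_health', {
    headers: {
      // Assumed local dev credentials; adjust for your setup.
      Authorization: 'Basic ' + Buffer.from('elastic:changeme').toString('base64'),
    },
  });
  const body = await response.json();
  console.log(JSON.stringify(body.metricbeatErrors, null, 2));
};

checkHealthEndpoint().catch(console.error);
```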