Calculate unhealthy reason (input/output/other) on agent documents #3338
internal/pkg/api/handleCheckin.go
Outdated
hasUnhealthyInput := false
hasUnhealthyOutput := false
hasUnhealthyComponent := false
reqComponentsArray, ok := reqComponents.([]interface{})
I'll take a look at adding a schema on `agent.components` instead of all this parsing.
Added the schema, and it simplifies the code a lot.
It needs more testing, though: I'm not sure what happens if there are other (unmapped) properties in components. We should probably keep them for debugging purposes.
Tested with a healthy endpoint input: extra properties are not removed after adding the schema:
"components": [
{
"id": "endpoint-default",
"type": "endpoint",
"status": "HEALTHY",
"message": "Healthy: communicating with endpoint service",
"units": [
{
"id": "endpoint-default",
"type": "output",
"status": "HEALTHY",
"message": "Applied policy {e14510ab-83b8-4c40-af40-519ea5203adf}",
"payload": {
"error": {
"code": 0,
"message": "Success"
}
}
},
buildkite run perf-tests
}

var outComponents []byte

// Compare the deserialized meta structures and return the bytes to update if different
- if !reflect.DeepEqual(reqComponents, agentComponents) {
+ if !reflect.DeepEqual(reqComponents, agent.Components) {
Do we already have a test that verifies we do not update when the components are not different?
Probably not, I'll take a look
Added tests to verify the behavior when the components are equal and when they are not.
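The equal / not-equal behavior discussed in this thread can be sketched in Go roughly as follows. The `Component` type and the `componentsUpdate` helper are hypothetical names used for illustration and are not taken from `handleCheckin.go`; only the use of `reflect.DeepEqual` comes from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Component is a hypothetical, reduced stand-in for an agent component.
type Component struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// componentsUpdate returns the serialized request components when they
// differ from the stored ones, and nil when no update is needed.
func componentsUpdate(req, stored []Component) ([]byte, error) {
	if reflect.DeepEqual(req, stored) {
		return nil, nil
	}
	return json.Marshal(req)
}

func main() {
	stored := []Component{{ID: "endpoint-default", Status: "HEALTHY"}}

	// Same content: nothing to write.
	out, err := componentsUpdate([]Component{{ID: "endpoint-default", Status: "HEALTHY"}}, stored)
	fmt.Println(out == nil, err)

	// Different status: the new components are returned as bytes.
	out, err = componentsUpdate([]Component{{ID: "endpoint-default", Status: "FAILED"}}, stored)
	fmt.Println(string(out), err)
}
```

One detail worth covering in such tests is that `reflect.DeepEqual` treats a nil slice and an empty, non-nil slice as different, so an agent reporting no components could still trigger an update depending on how both sides are decoded.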
LGTM 🚀
Quality Gate passed
The SonarQube Quality Gate passed, but some issues were introduced.
2 New issues
…178605)

## Summary

Closes elastic/ingest-dev#2522

Added `unhealthy_reason` aggregation when querying agent metrics.

The [mapping change](elastic/elasticsearch#106246) and [fleet-server change](elastic/fleet-server#3338) need to be merged first to verify end to end.

Steps to verify:
- enroll an agent with docker
- add endpoint integration, expect an input and output unit error status on the agent doc
- wait a few seconds so that the agent metrics are published
- verify that the agent metrics include `unhealthy_reason`, using the query below

```
GET metrics-fleet_server.agent_status-default/_search
{
  "_source": ["fleet.agents"]
}

"hits": [
  {
    "_index": ".ds-metrics-fleet_server.agent_status-default-2024.03.11-000001",
    "_id": "3JdPioUh-9j8DxQrAAABjjclRhU",
    "_score": 1,
    "_source": {
      "fleet": {
        "agents": {
          "enrolled": 12,
          "healthy": 0,
          "inactive": 0,
          "offline": 11,
          "total": 13,
          "unenrolled": 1,
          "unhealthy": 1,
          "updating": 0,
          "upgrading_step": {
            "downloading": 0,
            "extracting": 0,
            "failed": 0,
            "replacing": 0,
            "requested": 0,
            "restarting": 0,
            "rollback": 0,
            "scheduled": 0,
            "watching": 0
          },
          "unhealthy_reason": {
            "input": 1,
            "output": 1
          }
        }
      }
    }
  },
```

### Checklist
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Elasticsearch PR to add the `unhealthy_reason` keyword mapping: elastic/elasticsearch#106246

What is the problem this PR solves?
Add `unhealthy_reason` to fleet server metrics published regularly.

How does this PR solve the problem?
Calculate `unhealthy_reason` from `agent.components` on checkin and save it in the agent doc.

How to test this PR locally
`unhealthy_reason` is added to the agent doc

Design Checklist
Checklist
./changelog/fragments using the changelog tool

Related issues
Relates https://github.com/elastic/ingest-dev/issues/2522
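
The calculation named in the PR title (input/output/other from `agent.components`) can be sketched in Go as below. This is a minimal illustration under stated assumptions, not the code merged in this PR: the `Unit` and `Component` types, the `isUnhealthy` helper, and the choice of `FAILED` and `DEGRADED` as the unhealthy statuses are all assumptions, since the thread only shows the `HEALTHY` status and the three `hasUnhealthy...` flags.

```go
package main

import "fmt"

// Unit and Component are hypothetical, reduced views of agent.components.
type Unit struct {
	Type   string
	Status string
}

type Component struct {
	Status string
	Units  []Unit
}

// isUnhealthy is an assumption: FAILED and DEGRADED count as unhealthy.
func isUnhealthy(status string) bool {
	return status == "FAILED" || status == "DEGRADED"
}

// unhealthyReason returns "input" and/or "output" when an unhealthy unit of
// that type exists, and "other" when a component is unhealthy without any
// unhealthy input or output unit.
func unhealthyReason(components []Component) []string {
	var hasInput, hasOutput, hasComponent bool
	for _, c := range components {
		if !isUnhealthy(c.Status) {
			continue
		}
		hasComponent = true
		for _, u := range c.Units {
			if !isUnhealthy(u.Status) {
				continue
			}
			switch u.Type {
			case "input":
				hasInput = true
			case "output":
				hasOutput = true
			}
		}
	}

	var reasons []string
	if hasInput {
		reasons = append(reasons, "input")
	}
	if hasOutput {
		reasons = append(reasons, "output")
	}
	if !hasInput && !hasOutput && hasComponent {
		reasons = append(reasons, "other")
	}
	return reasons
}

func main() {
	comps := []Component{{
		Status: "FAILED",
		Units: []Unit{
			{Type: "input", Status: "FAILED"},
			{Type: "output", Status: "DEGRADED"},
		},
	}}
	fmt.Println(unhealthyReason(comps)) // prints "[input output]"
}
```

A healthy agent yields an empty result in this sketch, which matches the idea that `unhealthy_reason` is only meaningful for agents counted as unhealthy in the metrics.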