Skip to content

Commit

Permalink
[8.6] Fleet Usage telemetry extension (#145353) (#146105)
Browse files Browse the repository at this point in the history
# Backport

This will backport the following commits from `main` to `8.6`:
- [Fleet Usage telemetry extension
(#145353)](#145353)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Julia
Bardi","email":"90178898+juliaElastic@users.noreply.github.com"},"sourceCommit":{"committedDate":"2022-11-23T09:22:20Z","message":"Fleet
Usage telemetry extension (#145353)\n\n## Summary\r\n\r\nCloses
https://github.com/elastic/ingest-dev/issues/1261\r\n\r\nAdded a snippet
to the telemetry that I added for each requirement.\r\nPlease review and
let me know if any changes are needed.\r\nAlso asked a few questions
below. @jlind23 @kpollich \r\n\r\n6. is blocked by
[elasticsearch\r\nchange](elastic/elasticsearch#91701)
to give\r\nkibana_system the missing privilege to read
logs-elastic_agent* indices.\r\n\r\nTook inspiration for task versioning
from\r\nhttps://github.com//pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186\r\n\r\n-
[x] 1. Elastic Agent versions\r\nVersions of all the Elastic Agent
running: `agent.version` field on\r\n`.fleet-agents`
documents\r\n\r\n```\r\n\"agent_versions\": [\r\n \"8.6.0\"\r\n
],\r\n```\r\n\r\n- [x] 2. Fleet server configuration\r\nThink we can
query for `.fleet-policies` where some `input` has
`type:\r\n'fleet-server'` for this, as well as use the `Fleet Server
Hosts`\r\nsettings that we define via saved objects in
Fleet\r\n\r\n\r\n```\r\n \"fleet_server_config\": {\r\n \"policies\":
[\r\n {\r\n \"input_config\": {\r\n \"server\": {\r\n
\"limits.max_agents\": 10000\r\n },\r\n \"server.runtime\":
\"gc_percent:20\"\r\n }\r\n }\r\n ]\r\n }\r\n```\r\n\r\n- [x] 3. Number
of policies\r\nCount of `.fleet-policies` index \r\n\r\nTo confirm, did
we mean agent policies here?\r\n\r\n```\r\n \"agent_policies\": {\r\n
\"count\": 7,\r\n```\r\n\r\n- [x] 4. Output type contained in those
policies\r\nCollecting this from ts logic, querying from
`.fleet-policies` index.\r\nThe alternative would be to write a painless
script (because the\r\n`outputs` are an object with dynamic keys, we
can't do an aggregation\r\ndirectly).\r\n\r\n```\r\n\"agent_policies\":
{\r\n \"output_types\": [\r\n \"elasticsearch\"\r\n ]\r\n
}\r\n```\r\n\r\nDid we mean to just collect the types here, or any other
info? e.g.\r\noutput urls\r\n\r\n- [x] 5. Average number of checkin
failures\r\nWe only have the most recent checkin status and timestamp
on\r\n`.fleet-agents`.\r\n\r\nDo we mean here to publish the total last
checkin failure count? E.g. 3\r\nif 3 agents are in failure checkin
status currently.\r\nOr do we mean to publish specific info for all
agents\r\n(`last_checkin_status`, `last_checkin` time,
`last_checkin_message`)?\r\nAre the only statuses `error` and `degraded`
that we want to send?\r\n\r\n```\r\n \"agent_last_checkin_status\":
{\r\n \"error\": 0,\r\n \"degraded\": 0\r\n },\r\n```\r\n\r\n- [ ] 6.
Top 3 most common errors in the Elastic Agent logs\r\n\r\nDo we mean
here elastic-agent logs only, or fleet-server logs as well\r\n(maybe
separately)?\r\n\r\nI found an alternative way to query the message
field using sampler and\r\ncategorize text aggregation:\r\n```\r\nGET
logs-elastic_agent*/_search\r\n{\r\n \"size\": 0,\r\n \"query\": {\r\n
\"bool\": {\r\n \"must\": [\r\n {\r\n \"term\": {\r\n \"log.level\":
\"error\"\r\n }\r\n },\r\n {\r\n \"range\": {\r\n \"@timestamp\": {\r\n
\"gte\": \"now-1h\"\r\n }\r\n }\r\n }\r\n ]\r\n }\r\n },\r\n
\"aggregations\": {\r\n \"message_sample\": {\r\n \"sampler\": {\r\n
\"shard_size\": 200\r\n },\r\n \"aggs\": {\r\n \"categories\": {\r\n
\"categorize_text\": {\r\n \"field\": \"message\",\r\n \"size\": 10\r\n
}\r\n }\r\n }\r\n }\r\n }\r\n}\r\n```\r\nExample
response:\r\n```\r\n\"aggregations\": {\r\n \"message_sample\": {\r\n
\"doc_count\": 112,\r\n \"categories\": {\r\n \"buckets\": [\r\n {\r\n
\"doc_count\": 73,\r\n \"key\": \"failed to unenroll offline
agents\",\r\n \"regex\":
\".*?failed.+?to.+?unenroll.+?offline.+?agents.*?\",\r\n
\"max_matching_length\": 36\r\n },\r\n {\r\n \"doc_count\": 7,\r\n
\"key\": \"\"\"stderr panic close of closed channel n ngoroutine running
Stop ngh.neting.cc/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5
\\n\\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go
n\r\n```\r\n\r\n\r\n- [x] 7. Number of checkin failure over the past
period of time\r\n\r\nI think this is almost the same as #5. The
difference would be to report\r\nnew failures happened only in the last
hour, or report all agents in\r\nfailure state. (which would be an
increasing number if the agent stays\r\nin failed state).\r\nDo we want
these 2 separate telemetry fields?\r\n\r\nEDIT: removed the last1hr
query, instead added a new field to report\r\nagents enrolled per policy
(top 10). See comments below.\r\n\r\n```\r\n \"agent_checkin_status\":
{\r\n \"error\": 3,\r\n \"degraded\": 0\r\n },\r\n
\"agents_per_policy\": [2, 1000],\r\n```\r\n\r\n- [x] 8. Number of
Elastic Agent and number of fleet server\r\n\r\nThis is already there in
the existing telemetry:\r\n```\r\n \"agents\": {\r\n \"total_enrolled\":
0,\r\n \"healthy\": 0,\r\n \"unhealthy\": 0,\r\n \"offline\": 0,\r\n
\"total_all_statuses\": 1,\r\n \"updating\": 0\r\n },\r\n
\"fleet_server\": {\r\n \"total_enrolled\": 0,\r\n \"healthy\": 0,\r\n
\"unhealthy\": 0,\r\n \"offline\": 0,\r\n \"updating\": 0,\r\n
\"total_all_statuses\": 0,\r\n \"num_host_urls\": 1\r\n
},\r\n```\r\n\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [ ] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"e00e26e86854bdbde7c14f88453b717505fed4d9","branchLabelMapping":{"^v8.7.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Fleet","v8.6.0","v8.7.0"],"number":145353,"url":"https://github.com/elastic/kibana/pull/145353","mergeCommit":{"message":"Fleet
Usage telemetry extension (#145353)\n\n## Summary\r\n\r\nCloses
https://github.com/elastic/ingest-dev/issues/1261\r\n\r\nAdded a snippet
to the telemetry that I added for each requirement.\r\nPlease review and
let me know if any changes are needed.\r\nAlso asked a few questions
below. @jlind23 @kpollich \r\n\r\n6. is blocked by
[elasticsearch\r\nchange](elastic/elasticsearch#91701)
to give\r\nkibana_system the missing privilege to read
logs-elastic_agent* indices.\r\n\r\nTook inspiration for task versioning
from\r\nhttps://github.com//pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186\r\n\r\n-
[x] 1. Elastic Agent versions\r\nVersions of all the Elastic Agent
running: `agent.version` field on\r\n`.fleet-agents`
documents\r\n\r\n```\r\n\"agent_versions\": [\r\n \"8.6.0\"\r\n
],\r\n```\r\n\r\n- [x] 2. Fleet server configuration\r\nThink we can
query for `.fleet-policies` where some `input` has
`type:\r\n'fleet-server'` for this, as well as use the `Fleet Server
Hosts`\r\nsettings that we define via saved objects in
Fleet\r\n\r\n\r\n```\r\n \"fleet_server_config\": {\r\n \"policies\":
[\r\n {\r\n \"input_config\": {\r\n \"server\": {\r\n
\"limits.max_agents\": 10000\r\n },\r\n \"server.runtime\":
\"gc_percent:20\"\r\n }\r\n }\r\n ]\r\n }\r\n```\r\n\r\n- [x] 3. Number
of policies\r\nCount of `.fleet-policies` index \r\n\r\nTo confirm, did
we mean agent policies here?\r\n\r\n```\r\n \"agent_policies\": {\r\n
\"count\": 7,\r\n```\r\n\r\n- [x] 4. Output type contained in those
policies\r\nCollecting this from ts logic, querying from
`.fleet-policies` index.\r\nThe alternative would be to write a painless
script (because the\r\n`outputs` are an object with dynamic keys, we
can't do an aggregation\r\ndirectly).\r\n\r\n```\r\n\"agent_policies\":
{\r\n \"output_types\": [\r\n \"elasticsearch\"\r\n ]\r\n
}\r\n```\r\n\r\nDid we mean to just collect the types here, or any other
info? e.g.\r\noutput urls\r\n\r\n- [x] 5. Average number of checkin
failures\r\nWe only have the most recent checkin status and timestamp
on\r\n`.fleet-agents`.\r\n\r\nDo we mean here to publish the total last
checkin failure count? E.g. 3\r\nif 3 agents are in failure checkin
status currently.\r\nOr do we mean to publish specific info for all
agents\r\n(`last_checkin_status`, `last_checkin` time,
`last_checkin_message`)?\r\nAre the only statuses `error` and `degraded`
that we want to send?\r\n\r\n```\r\n \"agent_last_checkin_status\":
{\r\n \"error\": 0,\r\n \"degraded\": 0\r\n },\r\n```\r\n\r\n- [ ] 6.
Top 3 most common errors in the Elastic Agent logs\r\n\r\nDo we mean
here elastic-agent logs only, or fleet-server logs as well\r\n(maybe
separately)?\r\n\r\nI found an alternative way to query the message
field using sampler and\r\ncategorize text aggregation:\r\n```\r\nGET
logs-elastic_agent*/_search\r\n{\r\n \"size\": 0,\r\n \"query\": {\r\n
\"bool\": {\r\n \"must\": [\r\n {\r\n \"term\": {\r\n \"log.level\":
\"error\"\r\n }\r\n },\r\n {\r\n \"range\": {\r\n \"@timestamp\": {\r\n
\"gte\": \"now-1h\"\r\n }\r\n }\r\n }\r\n ]\r\n }\r\n },\r\n
\"aggregations\": {\r\n \"message_sample\": {\r\n \"sampler\": {\r\n
\"shard_size\": 200\r\n },\r\n \"aggs\": {\r\n \"categories\": {\r\n
\"categorize_text\": {\r\n \"field\": \"message\",\r\n \"size\": 10\r\n
}\r\n }\r\n }\r\n }\r\n }\r\n}\r\n```\r\nExample
response:\r\n```\r\n\"aggregations\": {\r\n \"message_sample\": {\r\n
\"doc_count\": 112,\r\n \"categories\": {\r\n \"buckets\": [\r\n {\r\n
\"doc_count\": 73,\r\n \"key\": \"failed to unenroll offline
agents\",\r\n \"regex\":
\".*?failed.+?to.+?unenroll.+?offline.+?agents.*?\",\r\n
\"max_matching_length\": 36\r\n },\r\n {\r\n \"doc_count\": 7,\r\n
\"key\": \"\"\"stderr panic close of closed channel n ngoroutine running
Stop ngh.neting.cc/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5
\\n\\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go
n\r\n```\r\n\r\n\r\n- [x] 7. Number of checkin failure over the past
period of time\r\n\r\nI think this is almost the same as #5. The
difference would be to report\r\nnew failures happened only in the last
hour, or report all agents in\r\nfailure state. (which would be an
increasing number if the agent stays\r\nin failed state).\r\nDo we want
these 2 separate telemetry fields?\r\n\r\nEDIT: removed the last1hr
query, instead added a new field to report\r\nagents enrolled per policy
(top 10). See comments below.\r\n\r\n```\r\n \"agent_checkin_status\":
{\r\n \"error\": 3,\r\n \"degraded\": 0\r\n },\r\n
\"agents_per_policy\": [2, 1000],\r\n```\r\n\r\n- [x] 8. Number of
Elastic Agent and number of fleet server\r\n\r\nThis is already there in
the existing telemetry:\r\n```\r\n \"agents\": {\r\n \"total_enrolled\":
0,\r\n \"healthy\": 0,\r\n \"unhealthy\": 0,\r\n \"offline\": 0,\r\n
\"total_all_statuses\": 1,\r\n \"updating\": 0\r\n },\r\n
\"fleet_server\": {\r\n \"total_enrolled\": 0,\r\n \"healthy\": 0,\r\n
\"unhealthy\": 0,\r\n \"offline\": 0,\r\n \"updating\": 0,\r\n
\"total_all_statuses\": 0,\r\n \"num_host_urls\": 1\r\n
},\r\n```\r\n\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [ ] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"e00e26e86854bdbde7c14f88453b717505fed4d9"}},"sourceBranch":"main","suggestedTargetBranches":["8.6"],"targetPullRequestStates":[{"branch":"8.6","label":"v8.6.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.7.0","labelRegex":"^v8.7.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/145353","number":145353,"mergeCommit":{"message":"Fleet
Usage telemetry extension (#145353)\n\n## Summary\r\n\r\nCloses
https://github.com/elastic/ingest-dev/issues/1261\r\n\r\nAdded a snippet
to the telemetry that I added for each requirement.\r\nPlease review and
let me know if any changes are needed.\r\nAlso asked a few questions
below. @jlind23 @kpollich \r\n\r\n6. is blocked by
[elasticsearch\r\nchange](elastic/elasticsearch#91701)
to give\r\nkibana_system the missing privilege to read
logs-elastic_agent* indices.\r\n\r\nTook inspiration for task versioning
from\r\nhttps://github.com//pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186\r\n\r\n-
[x] 1. Elastic Agent versions\r\nVersions of all the Elastic Agent
running: `agent.version` field on\r\n`.fleet-agents`
documents\r\n\r\n```\r\n\"agent_versions\": [\r\n \"8.6.0\"\r\n
],\r\n```\r\n\r\n- [x] 2. Fleet server configuration\r\nThink we can
query for `.fleet-policies` where some `input` has
`type:\r\n'fleet-server'` for this, as well as use the `Fleet Server
Hosts`\r\nsettings that we define via saved objects in
Fleet\r\n\r\n\r\n```\r\n \"fleet_server_config\": {\r\n \"policies\":
[\r\n {\r\n \"input_config\": {\r\n \"server\": {\r\n
\"limits.max_agents\": 10000\r\n },\r\n \"server.runtime\":
\"gc_percent:20\"\r\n }\r\n }\r\n ]\r\n }\r\n```\r\n\r\n- [x] 3. Number
of policies\r\nCount of `.fleet-policies` index \r\n\r\nTo confirm, did
we mean agent policies here?\r\n\r\n```\r\n \"agent_policies\": {\r\n
\"count\": 7,\r\n```\r\n\r\n- [x] 4. Output type contained in those
policies\r\nCollecting this from ts logic, querying from
`.fleet-policies` index.\r\nThe alternative would be to write a painless
script (because the\r\n`outputs` are an object with dynamic keys, we
can't do an aggregation\r\ndirectly).\r\n\r\n```\r\n\"agent_policies\":
{\r\n \"output_types\": [\r\n \"elasticsearch\"\r\n ]\r\n
}\r\n```\r\n\r\nDid we mean to just collect the types here, or any other
info? e.g.\r\noutput urls\r\n\r\n- [x] 5. Average number of checkin
failures\r\nWe only have the most recent checkin status and timestamp
on\r\n`.fleet-agents`.\r\n\r\nDo we mean here to publish the total last
checkin failure count? E.g. 3\r\nif 3 agents are in failure checkin
status currently.\r\nOr do we mean to publish specific info for all
agents\r\n(`last_checkin_status`, `last_checkin` time,
`last_checkin_message`)?\r\nAre the only statuses `error` and `degraded`
that we want to send?\r\n\r\n```\r\n \"agent_last_checkin_status\":
{\r\n \"error\": 0,\r\n \"degraded\": 0\r\n },\r\n```\r\n\r\n- [ ] 6.
Top 3 most common errors in the Elastic Agent logs\r\n\r\nDo we mean
here elastic-agent logs only, or fleet-server logs as well\r\n(maybe
separately)?\r\n\r\nI found an alternative way to query the message
field using sampler and\r\ncategorize text aggregation:\r\n```\r\nGET
logs-elastic_agent*/_search\r\n{\r\n \"size\": 0,\r\n \"query\": {\r\n
\"bool\": {\r\n \"must\": [\r\n {\r\n \"term\": {\r\n \"log.level\":
\"error\"\r\n }\r\n },\r\n {\r\n \"range\": {\r\n \"@timestamp\": {\r\n
\"gte\": \"now-1h\"\r\n }\r\n }\r\n }\r\n ]\r\n }\r\n },\r\n
\"aggregations\": {\r\n \"message_sample\": {\r\n \"sampler\": {\r\n
\"shard_size\": 200\r\n },\r\n \"aggs\": {\r\n \"categories\": {\r\n
\"categorize_text\": {\r\n \"field\": \"message\",\r\n \"size\": 10\r\n
}\r\n }\r\n }\r\n }\r\n }\r\n}\r\n```\r\nExample
response:\r\n```\r\n\"aggregations\": {\r\n \"message_sample\": {\r\n
\"doc_count\": 112,\r\n \"categories\": {\r\n \"buckets\": [\r\n {\r\n
\"doc_count\": 73,\r\n \"key\": \"failed to unenroll offline
agents\",\r\n \"regex\":
\".*?failed.+?to.+?unenroll.+?offline.+?agents.*?\",\r\n
\"max_matching_length\": 36\r\n },\r\n {\r\n \"doc_count\": 7,\r\n
\"key\": \"\"\"stderr panic close of closed channel n ngoroutine running
Stop ngh.neting.cc/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5
\\n\\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go
n\r\n```\r\n\r\n\r\n- [x] 7. Number of checkin failure over the past
period of time\r\n\r\nI think this is almost the same as #5. The
difference would be to report\r\nnew failures happened only in the last
hour, or report all agents in\r\nfailure state. (which would be an
increasing number if the agent stays\r\nin failed state).\r\nDo we want
these 2 separate telemetry fields?\r\n\r\nEDIT: removed the last1hr
query, instead added a new field to report\r\nagents enrolled per policy
(top 10). See comments below.\r\n\r\n```\r\n \"agent_checkin_status\":
{\r\n \"error\": 3,\r\n \"degraded\": 0\r\n },\r\n
\"agents_per_policy\": [2, 1000],\r\n```\r\n\r\n- [x] 8. Number of
Elastic Agent and number of fleet server\r\n\r\nThis is already there in
the existing telemetry:\r\n```\r\n \"agents\": {\r\n \"total_enrolled\":
0,\r\n \"healthy\": 0,\r\n \"unhealthy\": 0,\r\n \"offline\": 0,\r\n
\"total_all_statuses\": 1,\r\n \"updating\": 0\r\n },\r\n
\"fleet_server\": {\r\n \"total_enrolled\": 0,\r\n \"healthy\": 0,\r\n
\"unhealthy\": 0,\r\n \"offline\": 0,\r\n \"updating\": 0,\r\n
\"total_all_statuses\": 0,\r\n \"num_host_urls\": 1\r\n
},\r\n```\r\n\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [ ] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common
scenarios\r\n\r\nCo-authored-by: Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"e00e26e86854bdbde7c14f88453b717505fed4d9"}}]}]
BACKPORT-->

Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
  • Loading branch information
kibanamachine and juliaElastic authored Nov 23, 2022
1 parent b6907b8 commit 7b99f4c
Show file tree
Hide file tree
Showing 10 changed files with 783 additions and 204 deletions.
85 changes: 83 additions & 2 deletions x-pack/plugins/fleet/server/collectors/agent_collectors.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,9 @@

import type { SavedObjectsClient, ElasticsearchClient } from '@kbn/core/server';

import type { FleetConfigType } from '../../common/types';
import { AGENTS_INDEX } from '../../common';
import * as AgentService from '../services/agents';
import { appContextService } from '../services';

export interface AgentUsage {
total_enrolled: number;
Expand All @@ -20,7 +21,6 @@ export interface AgentUsage {
}

export const getAgentUsage = async (
config: FleetConfigType,
soClient?: SavedObjectsClient,
esClient?: ElasticsearchClient
): Promise<AgentUsage> => {
Expand All @@ -47,3 +47,84 @@ export const getAgentUsage = async (
updating,
};
};

export interface AgentData {
agent_versions: string[];
agent_checkin_status: {
error: number;
degraded: number;
};
agents_per_policy: number[];
}

const DEFAULT_AGENT_DATA = {
agent_versions: [],
agent_checkin_status: { error: 0, degraded: 0 },
agents_per_policy: [],
};

export const getAgentData = async (
esClient: ElasticsearchClient,
abortController: AbortController
): Promise<AgentData> => {
try {
const transformLastCheckinStatusBuckets = (resp: any) =>
((resp?.aggregations?.last_checkin_status as any).buckets ?? []).reduce(
(acc: any, bucket: any) => {
if (acc[bucket.key] !== undefined) acc[bucket.key] = bucket.doc_count;
return acc;
},
{ error: 0, degraded: 0 }
);
const response = await esClient.search(
{
index: AGENTS_INDEX,
query: {
bool: {
filter: [
{
term: {
active: 'true',
},
},
],
},
},
size: 0,
aggs: {
versions: {
terms: { field: 'agent.version' },
},
last_checkin_status: {
terms: { field: 'last_checkin_status' },
},
policies: {
terms: { field: 'policy_id' },
},
},
},
{ signal: abortController.signal }
);
const versions = ((response?.aggregations?.versions as any).buckets ?? []).map(
(bucket: any) => bucket.key
);
const statuses = transformLastCheckinStatusBuckets(response);

const agentsPerPolicy = ((response?.aggregations?.policies as any).buckets ?? []).map(
(bucket: any) => bucket.doc_count
);

return {
agent_versions: versions,
agent_checkin_status: statuses,
agents_per_policy: agentsPerPolicy,
};
} catch (error) {
if (error.statusCode === 404) {
appContextService.getLogger().debug('Index .fleet-agents does not exist yet.');
} else {
throw error;
}
return DEFAULT_AGENT_DATA;
}
};
61 changes: 61 additions & 0 deletions x-pack/plugins/fleet/server/collectors/agent_policies.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

import type { ElasticsearchClient } from '@kbn/core/server';

import { AGENT_POLICY_INDEX } from '../../common';
import { ES_SEARCH_LIMIT } from '../../common/constants';
import { appContextService } from '../services';

export interface AgentPoliciesUsage {
count: number;
output_types: string[];
}

const DEFAULT_AGENT_POLICIES_USAGE = {
count: 0,
output_types: [],
};

export const getAgentPoliciesUsage = async (
esClient: ElasticsearchClient,
abortController: AbortController
): Promise<AgentPoliciesUsage> => {
try {
const res = await esClient.search(
{
index: AGENT_POLICY_INDEX,
size: ES_SEARCH_LIMIT,
track_total_hits: true,
rest_total_hits_as_int: true,
},
{ signal: abortController.signal }
);

const agentPolicies = res.hits.hits;

const outputTypes = new Set<string>();
agentPolicies.forEach((item) => {
const source = (item._source as any) ?? {};
Object.keys(source.data.outputs).forEach((output) => {
outputTypes.add(source.data.outputs[output].type);
});
});

return {
count: res.hits.total as number,
output_types: Array.from(outputTypes),
};
} catch (error) {
if (error.statusCode === 404) {
appContextService.getLogger().debug('Index .fleet-policies does not exist yet.');
} else {
throw error;
}
return DEFAULT_AGENT_POLICIES_USAGE;
}
};
46 changes: 46 additions & 0 deletions x-pack/plugins/fleet/server/collectors/fleet_server_collector.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@

import type { SavedObjectsClient, ElasticsearchClient } from '@kbn/core/server';

import { PACKAGE_POLICY_SAVED_OBJECT_TYPE, SO_SEARCH_LIMIT } from '../constants';

import { packagePolicyService } from '../services';
import { getAgentStatusForAgentPolicy } from '../services/agents';
import { listFleetServerHosts } from '../services/fleet_server_host';
Expand Down Expand Up @@ -84,3 +86,47 @@ export const getFleetServerUsage = async (
num_host_urls: numHostsUrls,
};
};

export const getFleetServerConfig = async (soClient: SavedObjectsClient): Promise<any> => {
const res = await packagePolicyService.list(soClient, {
page: 1,
perPage: SO_SEARCH_LIMIT,
kuery: `${PACKAGE_POLICY_SAVED_OBJECT_TYPE}.package.name:fleet_server`,
});
const getInputConfig = (item: any) => {
const config = (item.inputs[0] ?? {}).compiled_input;
if (config?.server) {
// whitelist only server limits, timeouts and runtime, sometimes fields are coming in "server.limits" format instead of nested object
const newConfig = Object.keys(config)
.filter((key) => key.startsWith('server'))
.reduce((acc: any, curr: string) => {
if (curr === 'server') {
acc.server = {};
Object.keys(config.server)
.filter(
(key) =>
key.startsWith('limits') ||
key.startsWith('timeouts') ||
key.startsWith('runtime')
)
.forEach((serverKey: string) => {
acc.server[serverKey] = config.server[serverKey];
return acc;
});
} else {
acc[curr] = config[curr];
}
return acc;
}, {});

return newConfig;
} else {
return {};
}
};
const policies = res.items.map((item) => ({
input_config: getInputConfig(item),
}));

return { policies };
};
33 changes: 28 additions & 5 deletions x-pack/plugins/fleet/server/collectors/register.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,14 @@ import type { CoreSetup } from '@kbn/core/server';
import type { FleetConfigType } from '..';

import { getIsAgentsEnabled } from './config_collectors';
import { getAgentUsage } from './agent_collectors';
import { getAgentUsage, getAgentData } from './agent_collectors';
import type { AgentUsage } from './agent_collectors';
import { getInternalClients } from './helpers';
import { getPackageUsage } from './package_collectors';
import type { PackageUsage } from './package_collectors';
import { getFleetServerUsage } from './fleet_server_collector';
import { getFleetServerUsage, getFleetServerConfig } from './fleet_server_collector';
import type { FleetServerUsage } from './fleet_server_collector';
import { getAgentPoliciesUsage } from './agent_policies';

export interface Usage {
agents_enabled: boolean;
Expand All @@ -26,11 +27,33 @@ export interface Usage {
fleet_server: FleetServerUsage;
}

export const fetchUsage = async (core: CoreSetup, config: FleetConfigType) => {
export const fetchFleetUsage = async (
core: CoreSetup,
config: FleetConfigType,
abortController: AbortController
) => {
const [soClient, esClient] = await getInternalClients(core);
if (!soClient || !esClient) {
return;
}
const usage = {
agents_enabled: getIsAgentsEnabled(config),
agents: await getAgentUsage(soClient, esClient),
fleet_server: await getFleetServerUsage(soClient, esClient),
packages: await getPackageUsage(soClient),
...(await getAgentData(esClient, abortController)),
fleet_server_config: await getFleetServerConfig(soClient),
agent_policies: await getAgentPoliciesUsage(esClient, abortController),
};
return usage;
};

// used by kibana daily collector
const fetchUsage = async (core: CoreSetup, config: FleetConfigType) => {
const [soClient, esClient] = await getInternalClients(core);
const usage = {
agents_enabled: getIsAgentsEnabled(config),
agents: await getAgentUsage(config, soClient, esClient),
agents: await getAgentUsage(soClient, esClient),
fleet_server: await getFleetServerUsage(soClient, esClient),
packages: await getPackageUsage(soClient),
};
Expand All @@ -41,7 +64,7 @@ export const fetchAgentsUsage = async (core: CoreSetup, config: FleetConfigType)
const [soClient, esClient] = await getInternalClients(core);
const usage = {
agents_enabled: getIsAgentsEnabled(config),
agents: await getAgentUsage(config, soClient, esClient),
agents: await getAgentUsage(soClient, esClient),
fleet_server: await getFleetServerUsage(soClient, esClient),
};
return usage;
Expand Down
Loading

0 comments on commit 7b99f4c

Please sign in to comment.