-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APM] Additional data telemetry changes #71112
Conversation
Pinging @elastic/apm-ui (Team:apm) |
@smith I'm a bit worried about 10s. I expect some of the tasks to take significantly longer for our bigger users. We are also storing |
There were a few that were over 350s, and you were right, many in the 30s range. I'm going to set it to 5m. |
* Add a date range of `now-1d` to the cloud query * Add a timeout of 5m to all queries (we'll investigate using async queries to improve this in the future.) * Factor out the date range filter into a variable * Fix a bug with the `indices_stats` tasks when it doesn't return data * Update the merge mapping script to create a migration file
x-pack/plugins/apm/server/lib/apm_telemetry/collect_data_telemetry/tasks.ts
Show resolved
Hide resolved
x-pack/plugins/apm/server/lib/apm_telemetry/collect_data_telemetry/tasks.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
@smith can you add telemetry for aggregated transactions? I'd like to record the number of metric documents we can expect with the current implementation. We can simulate this by requesting data for 1m, adding a composite aggregation on a number of fields and counting the number of buckets. The resulting count will be an approximation of the amount of metric documents that will be created. We should record both the expected metric document count plus the transaction count for that time range. I'd like to record several combinations:
I'd also like to record the cardinality of For an example of how to create the composite aggregation and paginate through the buckets, see |
@dgieselaar I should be able to add all of that stuff on Monday but I'll do it in another PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled the code and ran the telemetry checker on the collector schema
. Unfortunately, I'm getting type errors from the typescript parser that I can't seem to work around. I've asked the team to pitch in and help get this over the finish line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some NITs re the schema
property 👍
@@ -55,15 +55,9 @@ The mapping for the telemetry data is here under `stack_stats.kibana.plugins.apm | |||
|
|||
The mapping used there can be generated with the output of the [`getTelemetryMapping`](../common/apm_telemetry.ts) function. | |||
|
|||
To make a change to the mapping, edit this function, run the tests to update the snapshots, then use the `merge_telemetry_mapping` script to merge the data into the telemetry repository. | |||
The `schema` property of the `makeUsageCollector` call in the [`createApmTelemetry` function](../server/lib/apm_telemetry/index.ts) contains the output of `getTelemetryMapping`. Pull requests with changes to the schema should automatically notify the Telemetry team so they can update the mapping in the telemetry clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: In order for the Kibana Telemetry team to be notified, devs need to run node scripts/telemetry_check.js --fix
. It will update the file containing all the schemas (and owned by the Telemetry team).
At the moment, APM is excluded so it doesn't fail (
kibana/x-pack/.telemetryrc.json
Line 7 in 637a0d9
"plugins/apm/server/lib/apm_telemetry/index.ts", |
I tried with removing the exclusion but it fails.
For progress, I'd say we merge this PR and tackle this issue separately 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the reason it's failing is because file with the usage collector call is parsed directly so the whole schema has to be there as a literal, whereas we're calling a function schema: getApmTelemetryMapping()
, which it fails to parse as a valid schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smith there are two other issues I've found thus far while trying to resolve the telemetry-check
failure:
- The schema needs to be what's displayed in the doc (as types, not actual values) and that the usage collector needs to be explicitly typed.
Here's the schema I got from your mapping
export const apmTelemetrySchema = {
agents: {
dotnet: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
go: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
java: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
'js-base': {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
nodejs: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
python: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
ruby: {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
'rum-js': {
agent: {
version: { type: 'keyword' },
},
service: {
framework: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
language: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
runtime: {
composite: { type: 'keyword' },
name: { type: 'keyword' },
version: { type: 'keyword' },
},
},
},
},
cloud: {
availability_zone: { type: 'keyword' },
provider: { type: 'keyword' },
region: { type: 'keyword' },
},
counts: {
agent_configuration: {
all: { type: 'long' },
},
error: {
'1d': { type: 'long' },
all: { type: 'long' },
},
max_error_groups_per_service: {
'1d': { type: 'long' },
},
max_transaction_groups_per_service: {
'1d': { type: 'long' },
},
metric: {
'1d': { type: 'long' },
all: { type: 'long' },
},
onboarding: {
'1d': { type: 'long' },
all: { type: 'long' },
},
services: {
'1d': { type: 'long' },
},
sourcemap: {
'1d': { type: 'long' },
all: { type: 'long' },
},
span: {
'1d': { type: 'long' },
all: { type: 'long' },
},
traces: {
'1d': { type: 'long' },
},
transaction: {
'1d': { type: 'long' },
all: { type: 'long' },
},
},
cardinality: {
user_agent: {
original: {
all_agents: {
'1d': { type: 'long' },
},
rum: {
'1d': { type: 'long' },
},
},
},
transaction: {
name: {
all_agents: {
'1d': { type: 'long' },
},
rum: {
'1d': { type: 'long' },
},
},
},
},
has_any_services: {
type: 'boolean',
},
indices: {
all: {
total: {
docs: { count: { type: 'long' } },
store: { size_in_bytes: { type: 'long' } },
},
},
shards: { total: { type: 'long' } },
},
integrations: {
ml: { all_jobs_count: { type: 'long' } },
},
retainment: {
error: { ms: { type: 'long' } },
metric: { ms: { type: 'long' } },
onboarding: { ms: { type: 'long' } },
span: { ms: { type: 'long' } },
transaction: { ms: { type: 'long' } },
},
services_per_agent: {
dotnet: { type: 'long' },
go: { type: 'long' },
java: { type: 'long' },
'js-base': { type: 'long' },
nodejs: { type: 'long' },
python: { type: 'long' },
ruby: { type: 'long' },
'rum-js': { type: 'long' },
},
tasks: {
agent_configuration: { took: { ms: { type: 'long' } } },
agents: { took: { ms: { type: 'long' } } },
cardinality: { took: { ms: { type: 'long' } } },
cloud: { took: { ms: { type: 'long' } } },
groupings: { took: { ms: { type: 'long' } } },
indices_stats: { took: { ms: { type: 'long' } } },
integrations: { took: { ms: { type: 'long' } } },
processor_events: { took: { ms: { type: 'long' } } },
services: { took: { ms: { type: 'long' } } },
versions: { took: { ms: { type: 'long' } } },
},
version: {
apm_server: {
major: { type: 'long' },
minor: { type: 'long' },
patch: { type: 'long' },
},
},
};
(note: there are a few improvements we also need to make to the allowed schema types, including allowing ignore_above
for keyword
fields)
2. The usageCollector is complaining of a missing type
, even though type: "apm"
is there. I think it's because the collector isn't typed explicitly but we'll figure all these issues out in #70180
@dgieselaar yes, but opened #71593 to do it in a separate PR. |
💚 Build SucceededBuild metrics
History
To update your PR or re-run it, just comment with: |
* Add a date range of `now-1d` to the cloud query * Add a timeout of 5m to all queries (we'll investigate using async queries to improve this in the future.) * Factor out the date range filter into a variable * Fix a bug with the `indices_stats` tasks when it doesn't return data * Update the merge mapping script to create a migration file
* Add a date range of `now-1d` to the cloud query * Add a timeout of 5m to all queries (we'll investigate using async queries to improve this in the future.) * Factor out the date range filter into a variable * Fix a bug with the `indices_stats` tasks when it doesn't return data * Update the merge mapping script to create a migration file
* master: (21 commits) [Maps] 7.9 design improvements (elastic#71563) [ML] Changing all calls to ML endpoints to use internal user (elastic#70487) [eventLog] prevent log writing when initialization fails (elastic#71339) [Observability] landing page always being displayed (elastic#71494) [IM] Address data stream copy feedback (elastic#71615) [Logs UI] Anomalies page dataset filtering (elastic#71110) [data.search.aggs] Remove `use_field_mapping` from top hits agg (elastic#71168) [ML] Anomaly swim lane embeddable navigation and filter actions (elastic#71082) Fixes typo in siem_cloudtrail job description (elastic#71569) Require granted API Keys to have a name (elastic#71623) Update getUsageForCollection (elastic#71609) Only fetch saved elements once (elastic#71310) [SecuritySolution][Resolver] Adding siem index and guarding process ancestry (elastic#71570) [APM] Additional data telemetry changes (elastic#71112) [Visualize] Fix export table for table export links (elastic#71249) [Search] Server side search API (elastic#70446) use inclusive language (elastic#71607) [Security Solution] Hide timeline footer when Resolver is open (elastic#71516) [Index template wizard] Remove shadow and use border for components panels (elastic#71606) [ML] Kibana API endpoint for histogram chart data (elastic#70976) ...
now-1d
to the cloud queryindices_stats
tasks when it doesn't return dataschema
property to the usage collectorclient.geo.country_iso_code