[Infra] infra services api #173875
Conversation
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
```
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import type {
```
This adds route validation that can validate against excess props. I think this is useful for rejecting unsupported query filters instead of silently ignoring them, which is what io-ts runtime type validation does. A lot of this was copied from a couple of other plugins, which seem to have copied it from each other. I had to adjust it to handle the ExactType. Eventually I might add this to the kbn utils package.
```
const result = await client<{}, ServicesAPIQueryAggregation>({
  body,
  index: [transaction, error, metric],
});
```
Afaict this is querying both transaction samples and transaction metrics. There could very well be billions of transaction samples for the past day, even on clusters of modest size, so this will quickly run into scaling issues.
I suggest using service transaction metrics (and the appropriate interval) where possible, and only using transaction samples as a fallback.
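The "appropriate interval" part can be sketched as a pure function: wider time ranges should hit coarser pre-aggregated rollups so the query scans fewer documents. The interval names mirror the common APM rollups (1m/10m/60m), but the thresholds below are illustrative assumptions, not Kibana's actual selection logic:

```typescript
// Illustrative sketch: pick a metrics rollup interval from the query range so
// wide ranges read coarser (cheaper) pre-aggregated documents.
// Thresholds are assumptions for illustration only.
type RollupInterval = '1m' | '10m' | '60m';

function selectRollupInterval(fromMs: number, toMs: number): RollupInterval {
  const rangeHours = (toMs - fromMs) / (1000 * 60 * 60);
  if (rangeHours <= 6) return '1m'; // short ranges: fine-grained rollup
  if (rangeHours <= 72) return '10m'; // up to a few days: medium rollup
  return '60m'; // anything longer: coarse rollup
}
```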
Another idea: the fastest way to get all service names would be the terms enum API. That comes with some big limitations compared to the normal Elasticsearch DSL; for instance, you won't be able to get the agent.name per service. It might still be faster to get the service names via the terms enum API and then fetch the agent names using a combination of the bulk api and terminate_after: 1 ... but at the end of the day, service transaction metrics probably provide a better balance between perf and DX.
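The two-step idea above can be sketched as request builders: one terms enum request to enumerate service names quickly, then one search per service with `terminate_after: 1` to fetch its agent name. Index and field names are assumptions; nothing is executed here, only the request bodies are built:

```typescript
// Illustrative sketch of the two-step approach. Index patterns and the
// exact request shapes are assumptions for illustration.
function buildTermsEnumRequest(index: string, field: string, searchAfter?: string) {
  return {
    index,
    field,
    size: 1000, // terms enum pages through values; size caps one page
    ...(searchAfter ? { search_after: searchAfter } : {}),
  };
}

function buildAgentNameSearch(index: string, serviceName: string) {
  return {
    index,
    size: 1,
    terminate_after: 1, // stop each shard after the first matching document
    _source: ['agent.name'],
    query: { term: { 'service.name': serviceName } },
  };
}
```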
@sqren good point on the scaling issues. But don't you think that APM should be responsible for determining the appropriate interval?
> @sqren good point on the scaling issues. But don't you think that APM should be responsible for determining the appropriate interval?
Maybe I'm missing something but this doesn't call any APM api's, does it? If this indeed did call the APM services API, then yes.
@sqren Thanks, oversight on my part. Service transaction metrics don't collect the host name. Ideally, as Dario suggested, I'd like to only have to query the service_summary metrics, but they aren't collecting host.name either. If they could, I think that would really simplify things and we could avoid querying anything else. Is this something I could request from the APM Server team? In lieu of that, I'll avoid querying the transaction samples and focus on transaction metrics and logs. I've separated the queries out to target the transaction metricset. What do you think?
Querying transaction metrics sounds good for now. Just note that the plan is to remove host information from the transaction metrics, and instead have instance specific metrics. This will probably not happen anytime soon but when it does, this needs to be changed.
Related:
```
      validate: {
        query: (q, res) => {
          const [invalidResponse, parsedFilters] = validateStringAssetFilters(q, res);
          if (invalidResponse) {
            return invalidResponse;
          }
          if (parsedFilters) {
            q.validatedFilters = parsedFilters;
          }
          return validate(q, res);
        },
      },
    },
    async (requestContext, request, response) => {
      const [{ savedObjects }] = await libs.getStartServices();
      const { from, to, size = 10, validatedFilters } = request.query;

      try {
        if (!validatedFilters) {
          throw Boom.badRequest('Invalid filters');
        }
        const client = createSearchClient(requestContext, framework, request);
        const soClient = savedObjects.getScopedClient(request);
        const apmIndices = await libs.getApmIndices(soClient);
        const services = await getServices(client, apmIndices, {
          from,
          to,
          size,
          filters: validatedFilters,
        });
        return response.ok({
          body: ServicesAPIResponseRT.encode(services),
        });
      } catch (err) {
        if (Boom.isBoom(err)) {
          return response.customError({
            statusCode: err.output.statusCode,
            body: { message: err.output.payload.message },
          });
        }

        return response.customError({
          statusCode: err.statusCode ?? 500,
          body: {
            message: err.message ?? 'An unexpected error occurred',
          },
        });
      }
    }
```
There is a lot of boilerplate here, and it's hard to see what it has to do with this route. Most of the validation and error handling looks very generic. Shouldn't this be handled by the framework?
I cleaned this up. Trying to validate a strict type of allowed filters made things a bit more complicated. TS can't infer that validatedFilters exists, and it is a different type than the filters param, which is a string. Since it definitely does exist, or the request would have failed in validateStringAssetFilters, I've used a type assertion.
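The narrowing problem described here can be reproduced in isolation: a prior validation step attaches the parsed object to the query at runtime, but TypeScript only sees the declared type, so after the runtime check a type assertion bridges the gap. All names below are hypothetical stand-ins for the route's types:

```typescript
// Illustrative sketch of the narrowing issue. `filters` arrives as a raw JSON
// string; validation attaches the parsed object as `validatedFilters`.
interface ParsedFilters {
  'host.name': string;
}

interface ServicesQuery {
  from: string;
  to: string;
  filters: string; // raw JSON string from the URL
  validatedFilters?: unknown; // attached by the validation step at runtime
}

function getValidatedFilters(query: ServicesQuery): ParsedFilters {
  if (!query.validatedFilters) {
    // In the real route this would be a Boom.badRequest.
    throw new Error('Invalid filters');
  }
  // Safe by construction: the validation step already checked the shape,
  // but TS cannot infer that, hence the assertion.
  return query.validatedFilters as ParsedFilters;
}
```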
```
  })
    .expect(200);

const { services } = decodeOrThrow(ServicesAPIResponseRT)(response.body);
```
General suggestion: it's VERY useful to have a typed API client. For APM we have this, which makes it possible to call REST APIs and get typed responses back, with no custom parsing or explicit type annotations needed.
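The core of such a client is a route map that ties each endpoint string to its response type, so the caller gets a typed result without per-call parsing. This is a minimal sketch of the pattern under invented names, not the actual APM API client:

```typescript
// Illustrative sketch of a typed API client. The route and response types
// here are invented for the example.
interface ServicesAPIResponse {
  services: Array<{ 'service.name': string; 'agent.name'?: string }>;
}

interface RouteMap {
  'GET /api/infra/services': ServicesAPIResponse;
}

type Fetcher = (path: string) => Promise<unknown>;

function createTypedClient(fetcher: Fetcher) {
  // The route key selects the response type; callers never annotate it.
  return async function call<R extends keyof RouteMap>(route: R): Promise<RouteMap[R]> {
    const [, path] = route.split(' ');
    return (await fetcher(path)) as RouteMap[R];
  };
}
```

In tests, a call like `client('GET /api/infra/services')` would then yield a `ServicesAPIResponse` directly, replacing the `decodeOrThrow` step above.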
```
    | rt.InterfaceType<rt.Props>
    | GenericIntersectionC
    | rt.PartialType<rt.Props>
    | rt.ExactC<any>,
```
Support the Exact type directly.
```
import { RouteValidationError, RouteValidationResultFactory } from '@kbn/core/server';

type ValidateStringAssetFiltersReturn = [{ error: RouteValidationError }] | [null, any];
```
This validation function makes sure the filters exist on the request and parses them; then we can continue validating the filter object's shape in the type validation.
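The two-phase idea can be sketched as follows: first confirm the `filters` query parameter exists and is parseable JSON, then hand the parsed object on for shape validation. This mirrors the `[{ error }] | [null, parsed]` return convention above, with simplified, illustrative types in place of the Kibana ones:

```typescript
// Illustrative sketch of validateStringAssetFilters-style pre-validation.
// Returns [{ error }] on failure, or [null, parsedObject] on success.
type StringFiltersResult =
  | [{ error: string }]
  | [null, Record<string, unknown>];

function validateStringFilters(query: { filters?: string }): StringFiltersResult {
  if (!query.filters) {
    return [{ error: 'filters is required' }];
  }
  try {
    const parsed = JSON.parse(query.filters);
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      return [{ error: 'filters must be a JSON object' }];
    }
    // Shape validation (which keys are allowed) happens in the next phase.
    return [null, parsed as Record<string, unknown>];
  } catch {
    return [{ error: 'filters must be valid JSON' }];
  }
}
```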
```
  apmSynthtraceEsClient: (context: InheritedFtrProviderContext) => Promise<ApmSynthtraceEsClient>;
};
}
export default async function createTestConfig({
```
Add the synthtrace client as a service to our test config
LGTM
## Summary

Creation of a new endpoint within Infra to get services from APM indices that are related to a given host through `host.name`. These services will be listed in the Host Detail view in another PR. This endpoint queries APM transaction metrics and APM logs to get services.

Closes elastic#171661

### Test

The easiest way to test this API is to visit it directly, using a host on our test cluster that has some services attached to it.

URL: http://localhost:5601/api/infra/services

Example usage: `http://localhost:5601/api/infra/services?from=now-15m&to=now&filters={"host.name":"gke-edge-oblt-edge-oblt-pool-5fbec7a6-nfy0"}&size=5`

Response:

```
{
  "services": [
    {
      "service.name": "productcatalogservice",
      "agent.name": "opentelemetry/go"
    },
    {
      "service.name": "frontend",
      "agent.name": "opentelemetry/nodejs"
    }
  ]
}
```

### Follow up

- Have APM Server collect `host.name` as part of `service_summary` metrics and query that instead. Service summary aggregates transaction, error, log, and metric events into service-summary metrics. This would simplify the query.
- Added `apm-synthtrace` to the `metrics_ui` API tests and created a follow-up PR (elastic#175064) for removing the code I needed to duplicate.

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>