[Infra] infra services api #173875

Merged: 37 commits merged on Feb 5, 2024


neptunian
Contributor

@neptunian neptunian commented Dec 21, 2023

Summary

Creation of a new endpoint within Infra to get services from APM indices that are related to a given host through `host.name`. These services will be listed in the Host Detail view in another PR. This endpoint queries APM transaction metrics and APM logs to get services.

Closes #171661

Test

The easiest way to test this API is to visit it directly, using a host from our test cluster that has some services attached to it.

URL: http://localhost:5601/api/infra/services
Example usage: `http://localhost:5601/api/infra/services?from=now-15m&to=now&filters={"host.name":"gke-edge-oblt-edge-oblt-pool-5fbec7a6-nfy0"}&size=5`

Response:

```
{
    "services": [
        {
            "service.name": "productcatalogservice",
            "agent.name": "opentelemetry/go"
        },
        {
            "service.name": "frontend",
            "agent.name": "opentelemetry/nodejs"
        }
    ]
}
```
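Because `filters` is a JSON object passed as a single query parameter, it needs to be serialized and URL-encoded when the endpoint is called from code rather than typed into a browser. The sketch below builds the example URL from the description; `buildServicesUrl` and `ServicesQuery` are hypothetical names, and only the parameter names (`from`, `to`, `filters`, `size`) come from the PR.

```typescript
// Hypothetical helper for calling GET /api/infra/services from code.
// Parameter names mirror the example URL in the PR description.
interface ServicesQuery {
  from: string; // date math, e.g. "now-15m"
  to: string; // date math, e.g. "now"
  filters: { 'host.name': string };
  size?: number;
}

function buildServicesUrl(baseUrl: string, query: ServicesQuery): string {
  const params = new URLSearchParams({
    from: query.from,
    to: query.to,
    // The filter object travels as one JSON string; URLSearchParams percent-encodes it.
    filters: JSON.stringify(query.filters),
  });
  if (query.size !== undefined) {
    params.set('size', String(query.size));
  }
  return `${baseUrl}/api/infra/services?${params.toString()}`;
}

const url = buildServicesUrl('http://localhost:5601', {
  from: 'now-15m',
  to: 'now',
  filters: { 'host.name': 'gke-edge-oblt-edge-oblt-pool-5fbec7a6-nfy0' },
  size: 5,
});
console.log(url);
```

Encoding the braces and quotes avoids relying on the browser to fix up the raw URL shown above.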

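Follow up

  • Have APM Server collect host.name as part of service_summary metrics and query that instead. Service summary aggregates transaction, error, log, and metric events into service-summary metrics. This would simplify the query.
  • Added apm-synthtrace to metrics_ui API tests and created a follow-up PR for removing the code I needed to duplicate: #175064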

@apmmachine
Contributor

🤖 GitHub comments


Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@neptunian neptunian force-pushed the 171661-infra-services-endpoint branch from 4279082 to d7fd655 January 2, 2024 20:23
@neptunian neptunian marked this pull request as ready for review January 18, 2024 20:59
@neptunian neptunian requested review from a team as code owners January 18, 2024 20:59
@neptunian neptunian added the Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team label Jan 18, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@neptunian neptunian added the release_note:skip Skip the PR/issue when compiling release notes label Jan 18, 2024
@neptunian neptunian changed the title from [Obs UX] infra services api to [Infra] infra services api Jan 18, 2024
```typescript
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import type {
```
Contributor Author

@neptunian neptunian Jan 18, 2024


This adds route validation that can validate against excess props. I think this is useful when unsupported query filters are passed in: they get rejected instead of silently ignored, which is what io-ts runtime type validation does. A lot of this was copied from a couple of other plugins, which seem to have copied from each other. I had to adjust it to handle the `ExactType`. Eventually I might add this to the kbn utils package.
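The idea of rejecting excess props, rather than ignoring them, can be shown without io-ts. This is a dependency-free sketch of the concept, not the io-ts API and not the code in this PR; `validateExactFilters` and the `ALLOWED_FILTER_KEYS` allow-list are hypothetical, with `host.name` chosen because it is the only filter the PR's examples use.

```typescript
// Hypothetical allow-list; host.name is the only filter used in the PR's examples.
const ALLOWED_FILTER_KEYS: readonly string[] = ['host.name'];

interface HostFilters {
  'host.name': string;
}

type FilterValidation = { ok: true; value: HostFilters } | { ok: false; error: string };

function validateExactFilters(input: unknown): FilterValidation {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    return { ok: false, error: 'filters must be a JSON object' };
  }
  const record = input as Record<string, unknown>;
  // Excess-property check: unknown keys are rejected rather than silently dropped.
  const excess = Object.keys(record).filter((key) => ALLOWED_FILTER_KEYS.indexOf(key) === -1);
  if (excess.length > 0) {
    return { ok: false, error: `unsupported filters: ${excess.join(', ')}` };
  }
  // Shape check for the allowed key.
  const hostName = record['host.name'];
  if (typeof hostName !== 'string') {
    return { ok: false, error: 'host.name must be a string' };
  }
  return { ok: true, value: { 'host.name': hostName } };
}
```

With this behavior, a typo such as `host_name` produces a 400-style error instead of an unfiltered query.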

Comment on lines 76 to 79
```typescript
    const result = await client<{}, ServicesAPIQueryAggregation>({
      body,
      index: [transaction, error, metric],
    });
```
Member

@sorenlouv sorenlouv Jan 19, 2024


Afaict this is querying both transaction samples and transaction metrics. There could very well be billions of transaction samples for the past day even on clusters of modest size, so this will quickly run into scaling issues.

I suggest using service transaction metrics (and the appropriate interval) where possible, and only using transaction samples as a fallback.
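The "metrics first, samples as fallback" suggestion boils down to picking a document filter based on whether pre-aggregated data exists for the requested range. In the sketch below the function names and the document-count input are hypothetical. The filter fields are the ones APM data uses to tell service transaction metrics (`metricset.name: service_transaction`) apart from sampled transaction events (`processor.event: transaction`). Choosing the metrics interval is left out.

```typescript
// Hypothetical sketch: prefer service transaction metrics, fall back to samples.
type TransactionSource = 'serviceTransactionMetrics' | 'transactionSamples';

function sourceFilter(source: TransactionSource): { term: Record<string, string> } {
  return source === 'serviceTransactionMetrics'
    ? { term: { 'metricset.name': 'service_transaction' } } // pre-aggregated metric docs
    : { term: { 'processor.event': 'transaction' } }; // one doc per sampled transaction
}

// metricDocCount would come from a cheap size: 0 search over the requested time range.
function pickTransactionSource(metricDocCount: number): TransactionSource {
  return metricDocCount > 0 ? 'serviceTransactionMetrics' : 'transactionSamples';
}
```

The fallback keeps older data (produced before the metrics existed) queryable, while the common path touches far fewer documents.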

Member

@sorenlouv sorenlouv Jan 19, 2024


Another idea: the fastest way to get all service names would be the terms enum API. It comes with some big limitations compared to the normal Elasticsearch DSL; for instance, you won't be able to get the `agent.name` per service. It might still be faster to get the service names via the terms enum API, then fetch the agent names using a combination of the bulk API and `terminate_after: 1`.

... but at the end of the day, service transaction metrics probably provide a better balance between perf and DX.
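The two-step alternative can be sketched as request bodies. This assumes the "bulk API" in the comment means the multi search API (`_msearch`), which takes newline-delimited header/body pairs; the helper names and the index pattern used in the usage note are hypothetical. One caveat worth checking: as far as the terms enum documentation describes it, its `index_filter` only skips whole shards, so on its own it cannot restrict service names to a single `host.name`.

```typescript
// Step 1: body for POST /<index>/_terms_enum. Omitting the `string` prefix
// means all terms in the field are considered.
interface TermsEnumRequest {
  field: string;
  size: number;
}

function buildTermsEnumRequest(size: number): TermsEnumRequest {
  return { field: 'service.name', size };
}

// Step 2: one tiny search per service name, sent together via _msearch.
// terminate_after: 1 lets each shard stop collecting after its first match.
function buildAgentNameMsearch(index: string, serviceNames: string[]): string {
  const lines: string[] = [];
  for (const name of serviceNames) {
    lines.push(JSON.stringify({ index })); // header line
    lines.push(
      JSON.stringify({
        size: 1,
        terminate_after: 1,
        _source: false,
        fields: ['agent.name'],
        query: { term: { 'service.name': name } },
      })
    ); // body line
  }
  // The NDJSON payload must end with a newline.
  return lines.join('\n') + '\n';
}
```

For example, `buildAgentNameMsearch('metrics-apm*', ['frontend'])` yields one header line and one body line; each response's first hit carries that service's `agent.name`.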

Contributor


@sqren good point on the scaling issues. But don't you think that APM should be responsible for determining the appropriate interval?

Member

@sorenlouv sorenlouv Jan 22, 2024


> @sqren good point on the scaling issues. But don't you think that APM should be responsible for determining the appropriate interval?

Maybe I'm missing something, but this doesn't call any APM APIs, does it? If it did call the APM services API, then yes.

Contributor Author


@sqren Thanks, that was an oversight on my part. Service transaction metrics don't collect the host name. Ideally, as Dario suggested, I'd like to query only the service_summary metrics, but they don't collect `host.name` either. If they did, I think that would really simplify things and we could avoid querying anything else. Is this something I could request from the APM Server team? In lieu of that, I'll avoid querying the transaction samples and focus on transaction metrics and logs. I've separated the queries out to target the transaction metricset. What do you think?
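A query against the transaction metricset, filtered by host, could look roughly like the body below. It is a sketch under assumptions: `buildServicesQuery` and the aggregation names are hypothetical and not the code merged in this PR, the logs half of the query is omitted, and only the standard APM/ECS fields (`host.name`, `metricset.name`, `service.name`, `agent.name`, `@timestamp`) are taken as given.

```typescript
// Hypothetical services query over APM transaction metrics for one host.
function buildServicesQuery(params: { hostName: string; from: string; to: string; size: number }) {
  return {
    size: 0, // only aggregations are needed, no hits
    query: {
      bool: {
        filter: [
          { term: { 'host.name': params.hostName } },
          // Transaction metrics still carry host.name, unlike service transaction metrics.
          { term: { 'metricset.name': 'transaction' } },
          // Range queries accept date math such as "now-15m".
          { range: { '@timestamp': { gte: params.from, lte: params.to } } },
        ],
      },
    },
    aggs: {
      services: {
        terms: { field: 'service.name', size: params.size },
        aggs: {
          // Keep the most frequent agent.name seen for each service.
          agent: { terms: { field: 'agent.name', size: 1 } },
        },
      },
    },
  };
}
```

Putting every condition in `bool.filter` skips scoring and lets Elasticsearch cache the clauses, which suits a pure aggregation query.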

Member


Querying transaction metrics sounds good for now. Just note that the plan is to remove host information from the transaction metrics and instead have instance-specific metrics. This will probably not happen anytime soon, but when it does, this will need to be changed.

Related:

Comment on lines 28 to 76
```typescript
    validate: {
      query: (q, res) => {
        const [invalidResponse, parsedFilters] = validateStringAssetFilters(q, res);
        if (invalidResponse) {
          return invalidResponse;
        }
        if (parsedFilters) {
          q.validatedFilters = parsedFilters;
        }
        return validate(q, res);
      },
    },
  },
  async (requestContext, request, response) => {
    const [{ savedObjects }] = await libs.getStartServices();
    const { from, to, size = 10, validatedFilters } = request.query;

    try {
      if (!validatedFilters) {
        throw Boom.badRequest('Invalid filters');
      }
      const client = createSearchClient(requestContext, framework, request);
      const soClient = savedObjects.getScopedClient(request);
      const apmIndices = await libs.getApmIndices(soClient);
      const services = await getServices(client, apmIndices, {
        from,
        to,
        size,
        filters: validatedFilters,
      });
      return response.ok({
        body: ServicesAPIResponseRT.encode(services),
      });
    } catch (err) {
      if (Boom.isBoom(err)) {
        return response.customError({
          statusCode: err.output.statusCode,
          body: { message: err.output.payload.message },
        });
      }

      return response.customError({
        statusCode: err.statusCode ?? 500,
        body: {
          message: err.message ?? 'An unexpected error occurred',
        },
      });
    }
  }
```
Member


There is a lot of boilerplate here and it's hard to see what it has to do with this route. Mostly the validation and error handling looks very generic. Shouldn't this be handled by the framework?
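One way to shrink the handler's boilerplate is to move its generic `catch` logic into a single reusable helper. This is a sketch under assumptions: `toErrorResponse`, `BoomLike`, and `isBoomLike` are hypothetical names, and the duck-typed check on `isBoom` stands in for `Boom.isBoom` so the example runs without `@hapi/boom`. The returned `{ statusCode, body }` shape is the one the handler passes to `response.customError`.

```typescript
// Hypothetical helper mirroring the catch block of the route handler.
interface BoomLike {
  isBoom: true;
  output: { statusCode: number; payload: { message: string } };
}

interface ErrorResponse {
  statusCode: number;
  body: { message: string };
}

// Duck-typed stand-in for Boom.isBoom, so the sketch has no dependencies.
function isBoomLike(err: unknown): err is BoomLike {
  return typeof err === 'object' && err !== null && (err as { isBoom?: unknown }).isBoom === true;
}

function toErrorResponse(err: unknown): ErrorResponse {
  if (isBoomLike(err)) {
    return { statusCode: err.output.statusCode, body: { message: err.output.payload.message } };
  }
  // Same fallbacks as the handler: 500 and a generic message.
  const e = (err ?? {}) as { statusCode?: number; message?: string };
  return {
    statusCode: e.statusCode ?? 500,
    body: { message: e.message ?? 'An unexpected error occurred' },
  };
}
```

Each route's `catch` then reduces to `return response.customError(toErrorResponse(err));`, leaving only route-specific logic in the handler.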

Contributor Author


I cleaned this up. Validating against a strict type of allowed filters made things a bit more complicated: TS can't infer that `validatedFilters` exists, and it has a different type from the `filters` param, which is a string. Since it definitely does exist (otherwise `validateStringAssetFilters` would have failed), I've used a type assertion.
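The typing problem can be reproduced in isolation. In this sketch all type and function names are hypothetical: `filters` arrives as a JSON string, while `validatedFilters` is attached by the custom validator, so the static query type does not know about it. Option A is the type assertion the comment describes; Option B is a different technique, a user-defined type guard, shown only for comparison because it checks at runtime instead of trusting the validator.

```typescript
interface HostFilters {
  'host.name': string;
}

interface RawServicesQuery {
  from: string;
  to: string;
  size?: number;
  filters: string; // JSON string as sent over the wire
}

// What the query looks like after the custom validator has run.
type ValidatedServicesQuery = RawServicesQuery & { validatedFilters: HostFilters };

// Option A (as described above): assert, trusting that validation already ran.
function filtersByAssertion(query: RawServicesQuery): HostFilters {
  return (query as ValidatedServicesQuery).validatedFilters;
}

// Option B (alternative, not what the PR does): narrow with a runtime type guard.
function hasValidatedFilters(query: RawServicesQuery): query is ValidatedServicesQuery {
  const candidate = (query as Partial<ValidatedServicesQuery>).validatedFilters;
  return typeof candidate === 'object' && candidate !== null;
}
```

The assertion costs nothing at runtime; the guard adds a check but turns a validator bug into an explicit branch rather than an `undefined` access.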

```typescript
      })
      .expect(200);

    const { services } = decodeOrThrow(ServicesAPIResponseRT)(response.body);
```
Member

@sorenlouv sorenlouv Jan 19, 2024


General suggestion: it's VERY useful to have a typed API client. For APM we have this, which makes it possible to call REST APIs and get typed responses back, with no custom parsing or explicit type annotations needed.
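The core of a typed API client is a route map that ties each endpoint to its params and response type, so callers get typed results without annotations. This minimal sketch is inspired by the description above and is not APM's actual client: `InfraRoutes`, `createTypedClient`, and the injected `fetchJson` transport are hypothetical, and only the route path and response fields come from this PR.

```typescript
interface ServicesResponse {
  services: Array<{ 'service.name': string; 'agent.name': string }>;
}

// Route map: each key is "METHOD path", each value pairs query params with the response.
interface InfraRoutes {
  'GET /api/infra/services': {
    params: { from: string; to: string; filters: string; size?: number };
    response: ServicesResponse;
  };
}

type QueryValue = string | number | undefined;

// Injected transport (e.g. a wrapper around fetch); it returns untyped JSON.
type FetchJson = (path: string, query: Record<string, string>) => Promise<unknown>;

function createTypedClient(fetchJson: FetchJson) {
  return function callApi<E extends keyof InfraRoutes>(
    endpoint: E,
    params: InfraRoutes[E]['params']
  ): Promise<InfraRoutes[E]['response']> {
    // Strip the "GET " prefix; this sketch only models GET routes.
    const path = endpoint.slice(endpoint.indexOf(' ') + 1);
    const raw = params as unknown as Record<string, QueryValue>;
    const query: Record<string, string> = {};
    for (const key of Object.keys(raw)) {
      const value = raw[key];
      if (value !== undefined) {
        query[key] = String(value);
      }
    }
    // The single cast: the server is trusted to honor the route's response type.
    return fetchJson(path, query) as Promise<InfraRoutes[E]['response']>;
  };
}
```

A call such as `client('GET /api/infra/services', { from: 'now-15m', to: 'now', filters: '{}' })` then resolves to a `ServicesResponse`, and a misspelled param or endpoint fails at compile time.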


Contributor Author


```typescript
  | rt.InterfaceType<rt.Props>
  | GenericIntersectionC
  | rt.PartialType<rt.Props>
  | rt.ExactC<any>,
```
Contributor Author


Support the `Exact` type directly.

@neptunian neptunian requested a review from a team as a code owner January 31, 2024 18:10
@neptunian neptunian requested review from pzl and tomsonpl January 31, 2024 18:10
@neptunian
Contributor Author

@pzl @tomsonpl I haven't seen these autocommits before. Is this a new thing, and something to ignore? I didn't modify any files in the osquery plugin, which seems to be what triggered it.

```typescript
import { RouteValidationError, RouteValidationResultFactory } from '@kbn/core/server';

type ValidateStringAssetFiltersReturn = [{ error: RouteValidationError }] | [null, any];
```

Contributor Author


This validation function makes sure the filters exist on the request and parses them; then we can continue validating the shape of the filter object in the type validation.
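The first stage described here (check that `filters` exists, then parse it) can be sketched without Kibana's route types. `parseFiltersParam` and `ParseResult` are hypothetical names; the `[error] | [null, parsed]` tuple mirrors the `ValidateStringAssetFiltersReturn` type shown above, with a plain string standing in for `RouteValidationError`.

```typescript
// Tuple result: [error] on failure, [null, parsed] on success.
type ParseResult = [{ error: string }] | [null, unknown];

function parseFiltersParam(query: { filters?: unknown }): ParseResult {
  // Stage 1a: the parameter must be present and be a non-empty string.
  if (typeof query.filters !== 'string' || query.filters.length === 0) {
    return [{ error: 'Missing required query parameter: filters' }];
  }
  // Stage 1b: it must be valid JSON; shape validation happens in a later step.
  try {
    return [null, JSON.parse(query.filters)];
  } catch {
    return [{ error: 'filters must be valid JSON' }];
  }
}
```

Keeping parsing separate from shape validation means the type validator only ever sees a parsed object, never a raw string.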

```typescript
    apmSynthtraceEsClient: (context: InheritedFtrProviderContext) => Promise<ApmSynthtraceEsClient>;
  };
}
export default async function createTestConfig({
```
Contributor Author


Add the synthtrace client as a service to our test config

Contributor

@crespocarlos crespocarlos left a comment


LGTM

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
|---|---|---|---|
| infra | 1419 | 1420 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
|---|---|---|---|
| infra | 1.3MB | 1.3MB | +2.1KB |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
|---|---|---|---|
| infra | 99.9KB | 99.9KB | +60.0B |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@neptunian neptunian merged commit 6fc6950 into elastic:main Feb 5, 2024
17 checks passed
@kibanamachine kibanamachine added v8.13.0 backport:skip This commit does not require backporting labels Feb 5, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Feb 7, 2024
## Summary
Creation of a new endpoint within Infra to get services from APM indices
that are related to a given host through `host.name`. These services will
be listed in the Host Detail view in another PR. This endpoint queries
APM transaction metrics and APM logs to get services.

Closes elastic#171661

### Test
The easiest way to test this API is to visit it directly, using a host
from our test cluster that has some services attached to it.

URL: http://localhost:5601/api/infra/services
Example usage:
`http://localhost:5601/api/infra/services?from=now-15m&to=now&filters={"host.name":"gke-edge-oblt-edge-oblt-pool-5fbec7a6-nfy0"}&size=5`

response:

```
{
    "services": [
        {
            "service.name": "productcatalogservice",
            "agent.name": "opentelemetry/go"
        },
        {
            "service.name": "frontend",
            "agent.name": "opentelemetry/nodejs"
        }
    ]
}
```



### Follow up
- Have APM Server collect `host.name` as part of service_summary metrics
and query that instead. Service summary aggregates transaction, error,
log, and metric events into service-summary metrics. This would simplify
the query.

- Added `apm-synthtrace` to `metrics_ui` API tests and created a follow-up
PR for removing the code I needed to duplicate:
elastic#175064

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
fkanout pushed a commit to fkanout/kibana that referenced this pull request Mar 4, 2024
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:obs-ux-infra_services Observability Infrastructure & Services User Experience Team v8.13.0
Development

Successfully merging this pull request may close these issues.

[Obs UX] Create query and endpoint for accessing services