Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface new labels for uninstrumented services and systems #3543

Merged
merged 20 commits into from
Apr 30, 2024

Conversation

t00mas
Copy link
Contributor

@t00mas t00mas commented Apr 5, 2024

What this PR does:

For the servicegraphs processor:

  • Gates this feature behind the EnableExtraUninstrumentedServicesLabels flag in the Config
  • If enabled, adds new labels to traces_service_graph_request_*:
    • virtual_node with a value of client or server, so the un-instrumented service's node can be identified.
    • client|server_db|messaging_system with the corresponding resource attribute value, to surface and identify the service and the node.

Which issue(s) this PR fixes:
Fixes #3461

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@CLAassistant
Copy link

CLAassistant commented Apr 5, 2024

CLA assistant check
All committers have signed the CLA.

@@ -190,6 +190,17 @@ func (p *Processor) consume(resourceSpans []*v1_trace.ResourceSpans) (err error)
e.ServerService = dbName
e.ServerLatencySec = spanDurationSec(span)
}

if p.Cfg.EnableExtraUninstrumentedServicesLabels {
switch e.ConnectionType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mutually exclusive, but is it possible that a service can be both database and messaging system? eg Redis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is - because at this point it has been decided what the service is acting as, depending on the presence/absence of the db.name resource attribute

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few initial questions. other thoughts:

  • what occurs if there are conflicts between these new values and configured span dimensions?
  • what do you think about pooling edges in the store?

modules/generator/processor/servicegraphs/store/edge.go Outdated Show resolved Hide resolved
modules/generator/processor/servicegraphs/config.go Outdated Show resolved Hide resolved
@t00mas
Copy link
Contributor Author

t00mas commented Apr 12, 2024

a few initial questions. other thoughts:

  • what occurs if there are conflicts between these new values and configured span dimensions?
  • what do you think about pooling edges in the store?

Thanks for reviewing Joe.

I've added some testing to cover "conflicts" with manually added Dimensions.

Could you be more specific about the edge pooling? Happy to address that if you think there's an issue there. (Sorry, I'm just new to Tempo in general)

Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make this a per-tenant override instead. If this feature is behind a global config parameter, it makes me think that we should just add it to the default generator without a flag. What's the use case for enabling or disabling it globally?

@t00mas
Copy link
Contributor Author

t00mas commented Apr 18, 2024

I wonder if we should make this a per-tenant override instead. If this feature is behind a global config parameter, it makes me think that we should just add it to the default generator without a flag. What's the use case for enabling or disabling it globally?

It actually makes more sense as a per-tenant override instead, yes.

I've also been discussing with Joe and I will:

  • Scope-down the PR just to add the new virtual_node label (the others can be added with a combination of Dimensions and Prefixes)
  • Pool edges in the store for performance reasons

Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc changes look good. Thank you for adding docs!

@t00mas
Copy link
Contributor Author

t00mas commented Apr 22, 2024

These are some benchmarks about Edge reuse with pooling:

goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/modules/generator/processor/servicegraphs
20_runs_bench.txt 20_runs_bench_with_pooling.txt
sec/op sec/op vs base
ServiceGraphs-11 21.33µ ± 0% 21.63µ ± 2% +1.41% (p=0.006 n=20)
B/op B/op vs base
ServiceGraphs-11 24.89Ki ± 0% 24.06Ki ± 0% -3.35% (p=0.000 n=20)
allocs/op allocs/op vs base
ServiceGraphs-11 450.0 ± 0% 443.0 ± 0% -1.56% (p=0.000 n=20)

There's probably better numbers to be had at scale, since these are limited to the testdata size.

@@ -86,8 +86,20 @@ type Processor struct {

func New(cfg Config, tenant string, registry registry.Registry, logger log.Logger) gen.Processor {
labels := []string{"client", "server", "connection_type"}

if cfg.EnableVirtualNodeLabel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we add this to the list of labels above? this will prevent us from having to add lines 96-102 which adds special handling for this value.

then we could add a new param to onComplete where we pass the string from onExpire

Copy link
Contributor Author

@t00mas t00mas Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely doable - however what you save in these lines to avoid the prefixes, is added back to the onComplete logic for slice allocation and label value assignment.

also, it'll make this new label behave as half-dimension half-edge-property. do you think that'd be ok?

can you also explain further what's the modification of the onComplete signature for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, i see. i was trying to remove some of the state wrangling in the consume() method but you've already completely removed it.

agree that the tradeoffs mentioned:

what you save in this lines to avoid the prefixes, is added back to the onComplete logic for slice allocation and label value assignment.

are not worth it. let's keep it as implemented now.

also, it'll make this new label behave as half-dimension half-edge-property. do you think that'd be ok?

i'm not seeing an issue with that. do you have some concerns in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok agree, leaving it behaving as before, as a dimension.

I think it's better to "treat it" as if it's fully one thing or the other. What I mean by that, from a future reader standpoint, it'll be easier to follow logic if it's treated as a dimension or as a struct property (first-class label?)

modules/generator/processor/servicegraphs/servicegraphs.go Outdated Show resolved Hide resolved
modules/generator/processor/servicegraphs/store/edge.go Outdated Show resolved Hide resolved
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs update, feature addition, a perf improvement. Love it. Thank you!

modules/generator/processor/servicegraphs/config.go Outdated Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Outdated Show resolved Hide resolved
@@ -52,6 +52,8 @@ const (
metricRequestClientSeconds = "traces_service_graph_request_client_seconds"
)

const virtualNodeLabel = "__virtual_node"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the __ prefix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked for this. Should we drop it? Just trying to avoid conflicts with actual span attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be OK with and without the __ prefix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In spanmetrics we do the opposite, if there's a collision with an instrinsic dimension we add __ (code). I don't feel too strongly about this. I prefer it that way, butI'm OK with what you decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - then to not raise any questions to future readers, and because it seems a collision is unlikely, I'll leave it as without the __ prefix.

It'd be good in the future to identify and sanitize in the same way as in spanmetrics, happy to work on that small improvement in a future PR if you need me.

modules/generator/processor/servicegraphs/store/edge.go Outdated Show resolved Hide resolved
Copy link
Member

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the PR, except that one comment. Nice work!

modules/generator/processor/servicegraphs/store/store.go Outdated Show resolved Hide resolved
t00mas and others added 2 commits April 25, 2024 10:11
The edge is not expired here, so it shouldn't be returned to the pool.

Co-authored-by: Mario <mariorvinas@gmail.com>
@joe-elliott joe-elliott merged commit 251bf5a into grafana:main Apr 30, 2024
15 checks passed
@t00mas t00mas deleted the service-labels branch May 15, 2024 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Label improvements for Service identification
7 participants