Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define HTTP attributes that MUST be provided at span creation time #1916

Merged

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Sep 9, 2021

Changes

This change clarifies which HTTP attributes should be provided before making sampling decision: all attributes that are required and available before span starts.

Optional attributes are not provided at start time to avoid the overhead of calculating and adding them

Note it actually goes against current spec recommendation (see #620 for discussion):

Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-creation

Related issues #1747

@lmolkova lmolkova requested review from a team September 9, 2021 16:56
@Oberon00
Copy link
Member

Note it actually goes against current spec recommendation:

Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-creation

And I would like to change that as well because it seems to be not very efficient

Please don't open new, mostly unrelated issues in the PR description 😃
I agree that there is a fundamental performance problem with the current sampling API, but this should be discussed in #620.

@Oberon00
Copy link
Member

Oberon00 commented Sep 10, 2021

Your change currently fails the semantic convention check. I see two possibilities:

  1. Update the markdown generation part of the semantic convention generator would need to be adapted to describe sampling_relevant see https://github.com/open-telemetry/build-tools/blob/main/semantic-conventions/src/opentelemetry/semconv/templating/markdown/__init__.py. Documentation that currently says "do not use" also needs to be updated (please don't add an additional table column though. we already have many tables that need horizontal scrolling).
  2. Manually write the "at creation time" restrictions outside the <!--semconv--> tags for now.

@Oberon00
Copy link
Member

I agree to the semantic content of this PR, but it needs to be changed to pass the semantic convention check.

@lmolkova lmolkova force-pushed the define-http-sampling-attributes branch from bde6c05 to 2671561 Compare September 13, 2021 18:22
@Oberon00 Oberon00 added area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory labels Sep 14, 2021
Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the checks, everything else LGTM

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 23, 2021
@lmolkova lmolkova closed this Sep 23, 2021
@lmolkova lmolkova reopened this Sep 23, 2021
@arminru arminru requested review from a team September 23, 2021 15:55
@arminru arminru removed the Stale label Sep 23, 2021
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change sneaks MUST in the text

@jmacd
Copy link
Contributor

jmacd commented Sep 23, 2021

This doesn't look like "sneaking" to me, it's just a grammar question.

@yurishkuro are you suggesting to change

"following attributes MUST be provided at span creation time (when provided at all)"

to something like:

"If they will be provided during the lifetime of a span, the following attributes MUST be provided at span creation time."

?

@yurishkuro
Copy link
Member

@jmacd the PR title says "should be provided", but the text says MUST, that's why I called it "sneaking in".

I would not object to SHOULD, I do object to MUST, I think it is an unreasonable requirement.

@lmolkova lmolkova changed the title Define HTTP attributes that should be provided at span creation time Define HTTP attributes that MUST be provided at span creation time Sep 23, 2021
@jmacd
Copy link
Contributor

jmacd commented Sep 23, 2021

"If condition is met, something MUST be." sounds equivalent to SHOULD, it's just more specific. Right?

@jmacd
Copy link
Contributor

jmacd commented Sep 23, 2021

If we want to optimize the sampling API, we should take it up in #620 and let this PR proceed. A lazy-attribute mechanism is how I'd do this, e.g.,

    sp:= tracer.start("name", ..., tracing.WithLazyAttribute("http.path", func() attribute.Value { return ... }))
    // ...
    sp.SetAttribute(attribute.LazyString("http.thing", func() attribute.Value { return ... }))
    if sp.IsRecording() {
        expensiveVar := expensiveMethod()
        calculated := doLogic(expensiveVar)
        sp.SetAttribute(attribute.String("http.expensive", calculated))
    }

#620 is saying we should be able to refactor the Sampler API to allow the Sampler to include this "shouldSample(name)" logic and selectively evaluate any of the lazy span attributes in order to make its decision. Most call-sites would be able to use the lazy attribute and avoid a call to sp.IsRecording(), if we had such a thing, like the example above.

@yurishkuro
Copy link
Member

The problem I have with SHOULD is that instrumentations just do not follow it.

So either they don't care about following the spec (MUST wouldn't change that), or they have reasons (MUST would break them without a path forward).

@lmolkova
Copy link
Contributor Author

So either they don't care about following the spec (MUST wouldn't change that), or they have reasons (MUST would break them without a path forward).

my assumptions about reasons (and I own Azure SDK http instrumentation, so know a bit about it):

  • historic implementations - follow old versions and owners not aware of sampling and creation time attribute requirements
  • not-optimal (optional attributes)
  • not-optimal (no early check)

Let me give you another reason for MUST (a bit beyond sampling): calculating metrics without a new instrumentation code. Is it worth a MUST assuming #620 solves perf concerns (however it does it)?

@yurishkuro
Copy link
Member

Is it worth a MUST assuming #620 solves perf concerns (however it does it)?

Are we prepared to park this PR until #620 (which is ~18mo old) is resolved?

@lmolkova
Copy link
Contributor Author

Is it worth a MUST assuming #620 solves perf concerns (however it does it)?

Are we prepared to park this PR until #620 (which is ~18mo old) is resolved?

I'm happy to change it to SHOULD (please, it will be a MUST soon) until #620 is resolved. Do you agree that it's the only blocker for MUST?

@yurishkuro
Copy link
Member

Sorry, but I still haven't seen a convincing argument why this needs to be a requirement. I cannot overcome the fact that the only reason you want to require these fields is to do sampling, and not every conceivable form of sampling, but a very specific form that most people do not actually use today. And even if we start talking about attribute-based sampling, there can be plenty of other attributes could be way more valuable in sampling decisions than these http attributes, so why require these specific ones? This simply does not pass the bar for me for a narrow use case to be elevated to a requirement. I think you'd get as much mileage from SHOULD as from MUST.

calculating metrics without a new instrumentation code

Of the three RED metrics, error and duration cannot be calculated until span is finished, so it doesn't matter when span attributes are set.

@lmolkova
Copy link
Contributor Author

lmolkova commented Sep 24, 2021

I don't care much about this change, I want to have clear rules for implementations to follow.

And even if we start talking about attribute-based sampling, there can be plenty of other attributes could be way more valuable in sampling decisions than these http attributes, so why require these specific ones?

I'm trying to move HTTP spec forward, not sampling story. I want to have a good, consistent e2e story for users. Lack of clarity in this spec causes inconsistencies in instrumentations and broken experience for users - MUST is a way to provide clarity.

Of the three RED metrics, error and duration cannot be calculated until span is finished, so it doesn't matter when span attributes are set.

For metrics, we don't even have to start a real span if it's sampled out - we can have a lightweight one that only supports a few required attributes, provided at start time and measures duration. But you're right that all required attributes indeed can be provided after start for it to work.

@iNikem
Copy link
Contributor

iNikem commented Sep 24, 2021

a very specific form that most people do not actually use today.

Please be careful with such sweeping statements. In my pond the situation is the opposite: nobody cares about probabilistic sampling but a lot of people want attribute-based sampling (or Views, as you called them in another issue).

@anuraaga
Copy link
Contributor

anuraaga commented Sep 24, 2021

For better or worse, the sample method accepts Attributes. I agree with @iNikem that this is an important use case - but without a consistent base then that use case basically gets blocked. If some library, or some language even, does not populate an attribute while others do, setting up any sort of attribute-based sampling becomes a nightmare for users, if it's even possible.

It seems to be the same as the required field for conventions - presumably they are defined so that backends have a consistent base to work with, even though not all backends will actually use them all. But that base needs to be defined. I don't think I see a definition for what required actually means in RFC-speak - if required is a MUST, I'd expect this definition to also be MUST, or otherwise both could be SHOULD.

There is performance overhead in providing attributes on-start since in most cases it requires allocation of a dictionary, instead of doing this:

This comment specifically seems to point out the dangers - we can't provide a consistent experience to users if one language SIG happens to deem the performance implication too high and doesn't implement these at sampling time while other languages do.

@Oberon00
Copy link
Member

Oberon00 commented Oct 6, 2021

Do you think we can merge this? Then we could also finish up the build-tools stuff (merging open-telemetry/build-tools#70 without changing to SHOULD and cutting a new release that also contains the event name stuff from open-telemetry/build-tools#67)

@lmolkova
Copy link
Contributor Author

lmolkova commented Oct 6, 2021

Do you think we can merge this? Then we could also finish up the build-tools stuff (merging open-telemetry/build-tools#70 without changing to SHOULD and cutting a new release that also contains the event name stuff from open-telemetry/build-tools#67)

@Oberon00 yes, I believe so, thanks! and then I'll follow up on open-telemetry/build-tools#70

CHANGELOG.md Outdated Show resolved Hide resolved
@lmolkova
Copy link
Contributor Author

Discussed MUST vs SHOULD over Tuesday Instrumentation SIG meeting (10/5/21) , and came to the agreement to start with MUST, the summary:

  • Feasibility
    • [out of scope of this PR] we should probably separate server and client specs more and make attributes sets more consistent (e.g. url for client and components for server)
    • there is no reason to believe that it's not feasible, i.e.
      • clients have url before the request starts (Java's netty client has only components though)
      • and servers have components ready before they start span (have to start after reading context from headers)
  • SHOULD vs MUST:
    • SHOULD does not have enough power, especially for people that are far from tracing UX
    • Consistency is important (for sampling and common required attributes); instrumentations tend to be inconsistent, staying within current spec
    • Start with a MUST and check if 90%+ of instrumentation can comply. If not, switch to SHOULD
  • There is no punishment for not complying, except some broken experience
    • eventually, we may have a certification test for instrumentations, but no strong enforcement

@lmolkova lmolkova force-pushed the define-http-sampling-attributes branch from 6cace67 to f34abb0 Compare October 12, 2021 15:08
@bogdandrutu
Copy link
Member

@yurishkuro @open-telemetry/technical-committee we discussed this in the spec meeting, and decide to file an issue (to track if this is the right decision or not) that needs to be resolved before the stability of this document, and proceed with the current proposal.

@carlosalberto
Copy link
Contributor

Merging as further details will be decided as part of #2011, as discussed in the previous Spec SIG call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.