Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discusses trace propagation concerns with trace reporting #42

Merged
merged 1 commit into from
Sep 13, 2020

Conversation

codefromthecrypt
Copy link
Member

@codefromthecrypt codefromthecrypt commented Sep 12, 2020

This formalizes a growing problem where tracing systems are used in a mesh
environment, instrumented with B3, and proxying the trace data endpoint.

This was noticed before in Zipkin, but the problem was rare. Now that it is
a routine concern, it is worth informing others using B3 format.

Thanks @basvanbeek for the analysis and @anuraaga for input leading to
this.

This formalizes a growing problem where tracing systems are used in a mesh
environment, instrumented with B3, and proxying the trace data endpoint.

This was noticed before in Zipkin, but the problem was rare. Now that it is
a routine concern, it is worth informing others using B3 format.

Thanks @basvanbeek for the analysis and @anuraaga for input leading to
this.
We recommend having your trace reporting library send `b3: 0`. This cements a
cost to everyone, only to help those proxying their trace data. However, it is
becoming common in mesh environments to proxy everything, and in reality there is
limited ability for end users to change the sampling policy of proxies.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly.

spring-cloud-sleuth has similar except the policy can merge from different sources, ex a deny list for well known paths that shouldn't be traced, and user-supplied policy.

however you compose the list, it would look like amazon's format in abstract, especially subject to a request rate (per second) is better than a percentage (0.1). amazon I think mixes these words a little but in general the format is good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an interesting more comprehensive format for trace config is this, mentioning for research, not necessarily direct input to how proxy config works https://github.com/kamon-io/Kamon/blob/master/core/kamon-core/src/main/resources/reference.conf#L218-L226

hats off to @ivantopo and @dpsoft for the design

data sent to a tracing system, there are generally 2 ways to address this proxy
setup:

1. Configure the proxy to not trace your trace reporting endpont (Ex in Zipkin, `POST /api/v2/spans`)
Copy link
Member Author

@codefromthecrypt codefromthecrypt Sep 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside to anyone who can control the code of proxies:

It would be a much better solution for them to add the endpoint they have configured for their tracer to a "deny list" for HTTP sampling policy. In other words, the sampling policy dictated by users would collaborate with the system concern like this. Users don't need to do any config maintenance this way.

Such an approach would be safe as intentionally tracing the tracing system is edge case, and tracing the tracing system through a proxy, is even more so.

Understanding some changes to proxies aren't accepted or can take many months to merge, then even longer to roll out, this is more mentioned for long term vs an expectation of this changing anything today.

Copy link

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants