[Bug]: Sampling strategies not working as expected #3925

Closed
Sucharitha95 opened this issue Sep 26, 2022 · 31 comments
@Sucharitha95

Sucharitha95 commented Sep 26, 2022

What happened?

Configured the Sampling strategies as per the Jaeger documentation: https://www.jaegertracing.io/docs/1.34/sampling/

The sampling strategy does not seem to take effect even after it is configured through a .json file. The collector logs show that the input JSON file is read, but the spans are not sampled as expected.
This is the configuration .json file we use by default:
{
  "service_strategies": [],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.1
  }
}

Even with the sampling configuration above (type probabilistic, param 0.1), when 100 spans are generated, all of them appear in the UI and none are filtered out.
We also tested with param set to 0, and all spans were still stored and shown in the UI.

Steps to reproduce

  1. Select a sampling strategy and parameter (as in the case above, probabilistic and 0.1).
  2. Configure the sampling_strategies.json file using the method selected.
  3. Check if the spans are sampled as per the strategy provided.

Expected behavior

Sampling has to be performed as per the input configuration file.

Relevant log output

Log showing that the sampling strategies file is being read:
{"level":"info","ts":1663994011.7913685,"caller":"static/strategy_store.go:138","msg":"Loading sampling strategies","filename":"/etc/sampling/samplingstrategies.json"}

Jaeger backend version

v1.34.0

Storage backend

The issue is seen with both Cassandra and Elasticsearch as backends.

Operating system

Ubuntu 18.04.6

@yurishkuro
Member

> Expected behavior
> Sampling has to be performed as per the input configuration file.

How are you verifying that?

@Sucharitha95
Author

Sucharitha95 commented Sep 28, 2022

Hi @yurishkuro,
We checked it in the Query UI. When 1000 traces are generated, all 1000 can be queried and are shown in the Query UI.
We also verified it in the collector logs, where the following is logged for all 1000 traces:
{"level":"debug","ts":1664348103.0998297,"caller":"app/span_processor.go:162","msg":"Span written to the storage by the collector","trace-id":"0dc7d9051509e0c0","span-id":"0dc7d9051509e0c0"}
{"level":"debug","ts":1664348103.09983,"caller":"app/span_processor.go:162","msg":"Span written to the storage by the collector","trace-id":"3ce311d349f53fb2","span-id":"3ce311d349f53fb2"}
{"level":"debug","ts":1664348103.099846,"caller":"app/span_processor.go:162","msg":"Span written to the storage by the collector","trace-id":"70958f6754f35127","span-id":"70958f6754f35127"}

@yurishkuro
Member

Jaeger backend does not perform sampling, sampling happens in the SDK. Which SDK are you using? Which sampler is the SDK configured to use?

@Sucharitha95
Author

Sucharitha95 commented Sep 30, 2022

Hi @yurishkuro,
Thanks for your reply.
We are using the Go SDK and configuring sampling in the SDK as described here: https://pkg.go.dev/go.opentelemetry.io/otel/sdk/trace
The sampler we use from the SDK is AlwaysSample(), set in our client-side configuration [ https://github.com/open-telemetry/opentelemetry-go/blob/main/example/otel-collector/main.go ].

In the collector we are configuring file sampling as described here: https://www.jaegertracing.io/docs/1.36/sampling/#file-sampling
However, the collector does not apply the sampling provided in the strategies.json file.
Could you please let us know what else needs to be configured for the collector to apply the .json file sampling?
Also, since you mentioned that sampling happens in the SDK, what exactly is the purpose of the sampling_strategies.json file that is given as input in the collector configuration? Could you please describe the complete sampling flow from the client side to the collector?

@Sucharitha95
Author

Hi @yurishkuro,
Could you please help us with the issue described in the comment above?

@yurishkuro
Member

AlwaysSample means it will sample all requests unconditionally. It's no surprise that the sampling configuration in Jaeger would have no effect on that. Backend sampling configuration only works with so-called "remote" samplers. All Jaeger SDKs supported those, but not all OTEL SDKs do, even though I believe it is called out as a requirement in the OTEL spec (the jaeger_remote sampler type). Looking at OTEL Go SDK, I don't think it supports Jaeger remote sampler out of the box, but there is a contrib module that does: open-telemetry/opentelemetry-go-contrib#936
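
To make the distinction concrete, here is a minimal, stdlib-only Go sketch of head-based sampling decisions. It is a simplified model and not the OTEL SDK's actual code: the `sampler` interface, `alwaysSample`, `ratioSampler`, and `countSampled` are hypothetical names, and the ratio check only loosely mirrors how a trace-ID-ratio sampler behaves. It illustrates that an always-on sampler never consults any ratio, so a 0.1 setting held elsewhere cannot change its decisions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampler is a hypothetical stand-in for an SDK head sampler:
// it decides, per trace ID, whether the trace is recorded.
type sampler interface {
	shouldSample(traceID uint64) bool
}

// alwaysSample models an always-on sampler: every trace is kept.
type alwaysSample struct{}

func (alwaysSample) shouldSample(uint64) bool { return true }

// ratioSampler models a probabilistic sampler that keeps roughly
// the configured fraction of traces, based only on the trace ID.
type ratioSampler struct{ upperBound uint64 }

func newRatioSampler(ratio float64) ratioSampler {
	if ratio < 0 {
		ratio = 0
	}
	if ratio > 1 {
		ratio = 1
	}
	// Map the ratio onto the 63-bit range used in shouldSample.
	return ratioSampler{upperBound: uint64(ratio * float64(1<<63))}
}

func (r ratioSampler) shouldSample(traceID uint64) bool {
	return traceID>>1 < r.upperBound
}

// countSampled feeds n pseudo-random trace IDs to s and counts how many are kept.
func countSampled(s sampler, n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	kept := 0
	for i := 0; i < n; i++ {
		if s.shouldSample(rng.Uint64()) {
			kept++
		}
	}
	return kept
}

func main() {
	const n = 100000
	fmt.Printf("always-on: kept %d of %d\n", countSampled(alwaysSample{}, n, 1), n)
	fmt.Printf("ratio 0.1: kept %d of %d\n", countSampled(newRatioSampler(0.1), n, 1), n)
}
```

In this model the always-on sampler keeps every trace, while the ratio sampler keeps roughly one in ten. A remote sampler would behave like the second case, with the ratio supplied by the backend instead of being hard-coded.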

@Sucharitha95
Author

Sucharitha95 commented Oct 19, 2022

Hi @yurishkuro,
We have configured our trace generator with the Go SDK remote sampling configuration, similar to the example at https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/samplers/jaegerremote.

jaegerRemoteSampler := jaegerremote.New(
	"your-service-name",
	jaegerremote.WithSamplingServerURL("http://{sampling_service_host_name}:5778"),
	jaegerremote.WithSamplingRefreshInterval(10*time.Second),
	jaegerremote.WithInitialSampler(trace.TraceIDRatioBased(0.5)),
)
tp := trace.NewTracerProvider(
	trace.WithSampler(jaegerRemoteSampler),
	...
)
otel.SetTracerProvider(tp)

// Dump the sampler state once per second to see what was fetched.
ticker := time.Tick(time.Second)
for {
	<-ticker
	fmt.Printf("\n* Jaeger Remote Sampler %v\n\n", time.Now())
	spewCfg := spew.ConfigState{
		Indent:                  " ",
		DisablePointerAddresses: true,
	}
	spewCfg.Dump(jaegerRemoteSampler)
}

After configuring the SDK as above, the samplingstrategies.json configuration is fetched successfully.
But when we generate traces, the collector's sampling_strategies.json file is still not applied and has no effect on trace sampling.
Why is the fetched sampling configuration not applied? Are we missing anything? Could you please help us?

@yurishkuro
Member

What is printed when you do spewCfg.Dump(jaegerRemoteSampler)?

@Sucharitha95
Author

Hi @yurishkuro,
logs.txt
Please find the output of spewCfg.Dump(jaegerRemoteSampler) in the file above.
Apart from this, I have another question regarding sampling.
We are using port 5778 for sampling. Is there a way to configure sampling without port 5778, directly through the Jaeger collector?

@yurishkuro
Member

What tags do you get in the root span? The original Jaeger samplers would record the sampling method in the tags, not sure if OTEL SDK does that.

The output of the sampler state shows default probability of 0.7

@Sucharitha95
Author

Hi @yurishkuro,
We checked for the tags in the Query UI but could not find any such tags.
Also, is there any collector port that serves sampling, other than the agent's port 5778? We are trying to remove the agent and connect directly to the collector, so we want to know whether sampling can be done through the Jaeger collector.

@yurishkuro
Member

We have an open ticket for it, but nobody stepped up to implement it. #1420

@yurishkuro
Member

My bad, it's been already implemented: #1971

@Sucharitha95
Author

Hi @yurishkuro,
Thanks for the support. We went through the implementation in #1990. According to it, the sampling endpoint in the collector was originally implemented using the ConfigManager in pkg/clientcfg/clientcfghttp/cfgmgr.go, and it was wired into cmd/collector/main.go. However, the implementation in cmd/collector/main.go has since changed and no longer appears to use pkg/clientcfg/clientcfghttp. Can the latest code (cmd/collector/main.go) still be used to add a sampling endpoint to the collector? If so, how can it be implemented?
In #1971 we saw a discussion of the JAEGER_SAMPLING_ENDPOINT environment variable, but the Jaeger documentation does not mention this variable or its usage anywhere. Could you please let us know how to set and use JAEGER_SAMPLING_ENDPOINT?

@yurishkuro
Member

Sorry, I don't follow what you mean by "implementation changed", the collector is using the same shared http handler for sampling

clientcfgHandler "github.com/jaegertracing/jaeger/pkg/clientcfg/clientcfghttp"

The discussion on JAEGER_SAMPLING_ENDPOINT is irrelevant because it applied to Jaeger SDKs (now deprecated), but you're using OTEL SDK.

@Sucharitha95
Author

Sucharitha95 commented Nov 10, 2022

Hi @yurishkuro,
Thanks. From the implementation in #1990, we understand that the collector supports remote sampling.
Is there sample code for the OTEL SDK, like the snippet below (which is for the agent), that applies sampling by connecting to the collector on a collector sampling port?
jaegerRemoteSampler := jaegerremote.New(
	"your-service-name",
	jaegerremote.WithSamplingServerURL("http://{sampling_service_host_name}:5778"),
	jaegerremote.WithSamplingRefreshInterval(10*time.Second),
	jaegerremote.WithInitialSampler(trace.TraceIDRatioBased(0.5)),
)
tp := trace.NewTracerProvider(
	trace.WithSampler(jaegerRemoteSampler),
	...
)
otel.SetTracerProvider(tp)

Also, does sampling work for all of the collector's interfaces (Zipkin, HTTP, gRPC, OTLP)?

@yurishkuro
Member

Sorry, I don't follow your question. The code you showed is already for OTEL SDK

@Sucharitha95
Author

Hi @yurishkuro,
Can we use the same code for sampling at the collector? If yes, which ports support sampling? We have different interfaces on different ports (4317, 4318, 14268, 14250, 9411).

@yurishkuro
Member

Sampling via SDK and collector are not interchangeable

@Sucharitha95
Author

Sucharitha95 commented Nov 15, 2022

Hi @yurishkuro,
Sorry, we did not understand. Could you please explain a little more? Do you mean that we cannot use any collector port and must use only the agent sampling port 5778?
What configuration do we need on the OTEL SDK side and on the Jaeger side to make remote sampling work?
The configuration we have on the OTEL SDK side is as follows:
jaegerRemoteSampler := jaegerremote.New(
	"your-service-name",
	jaegerremote.WithSamplingServerURL("http://{sampling_service_host_name}:5778"),
	jaegerremote.WithSamplingRefreshInterval(10*time.Second),
	jaegerremote.WithInitialSampler(trace.TraceIDRatioBased(0.5)),
)
tp := trace.NewTracerProvider(
	trace.WithSampler(jaegerRemoteSampler),
	...
)
otel.SetTracerProvider(tp)

On the Jaeger collector side, we have a sampling_strategies.json file with a configuration similar to the one at https://www.jaegertracing.io/docs/1.36/sampling/#file-sampling

After making the above changes, we generate spans.
With these changes we are able to fetch the Jaeger collector sampling configuration, but it does not appear to be applied when we check the spans in the UI. Do we need to make any further changes for remote sampling to work?

@yurishkuro
Member

> With the above changes we are able to fetch the Jaeger collector sampling configuration

How did you confirm that?

> but it is not applied if we checked it with the spans in the UI

How did you verify that?

@Sucharitha95
Author

Sucharitha95 commented Nov 17, 2022

Hi @yurishkuro,
> How did you confirm that?

We have attached a file containing the output of the code from the link below:
https://github.com/open-telemetry/opentelemetry-go-contrib/blob/samplers/jaegerremote/v0.5.2/samplers/jaegerremote/example/main.go
The output contains the sampling configuration that was provided through the Jaeger collector's sampling_strategies.json file.
Log file with the sampling information in the output: logs1.txt
Full log: logs.txt

> How did you verify that?

Once the spans are generated, we check the span count in the Jaeger UI, but the count does not match the sampling rate specified in the configuration file (sampling_strategies.json).

@yurishkuro
Member

> Once the spans are generated, we are checking the spans count in the Jaeger UI but the spans count is not obtained as per the sampling rate

The samplers are supposed to record the description & parameters of the sampler in the attributes of the root span. What do they say?
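
One way to check this without clicking through the UI is to inspect a trace exported as JSON. The sketch below is only an illustration under stated assumptions: it assumes the trace JSON uses the `data[].spans[]` layout with `references` and `tags` arrays of `key`/`value` pairs, it treats a span without references as the root span, and `samplerTags` plus the sample document are hypothetical, made-up names and data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tag, span, and traceDoc cover only the fields this check needs.
type tag struct {
	Key   string      `json:"key"`
	Value interface{} `json:"value"`
}

type span struct {
	OperationName string            `json:"operationName"`
	References    []json.RawMessage `json:"references"`
	Tags          []tag             `json:"tags"`
}

type traceDoc struct {
	Data []struct {
		Spans []span `json:"spans"`
	} `json:"data"`
}

// samplerTags collects sampler.type and sampler.param from root spans,
// where a root span is assumed to be one without references.
func samplerTags(raw []byte) (map[string]interface{}, error) {
	var doc traceDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	found := map[string]interface{}{}
	for _, tr := range doc.Data {
		for _, sp := range tr.Spans {
			if len(sp.References) > 0 {
				continue // not a root span
			}
			for _, t := range sp.Tags {
				if t.Key == "sampler.type" || t.Key == "sampler.param" {
					found[t.Key] = t.Value
				}
			}
		}
	}
	return found, nil
}

// sampleTrace is made-up data for illustration only.
const sampleTrace = `{"data":[{"spans":[
  {"operationName":"root","references":[],
   "tags":[{"key":"sampler.type","type":"string","value":"const"},
           {"key":"sampler.param","type":"bool","value":true}]},
  {"operationName":"child",
   "references":[{"refType":"CHILD_OF","spanID":"abc"}],
   "tags":[{"key":"http.method","type":"string","value":"GET"}]}
]}]}`

func main() {
	found, err := samplerTags([]byte(sampleTrace))
	if err != nil {
		panic(err)
	}
	if len(found) == 0 {
		fmt.Println("root span carries no sampler.* tags")
		return
	}
	fmt.Println("sampler tags on root span:", found)
}
```

If the returned map is empty, the SDK did not record the sampler on the root span, which by itself says nothing about which sampler made the decision.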

@Sucharitha95
Author

Hi @yurishkuro,
Could you please tell us where we can check the root span and its attributes?

@yurishkuro
Member

you can view the span in Jaeger
image

@Sucharitha95
Author

Hi @yurishkuro,
Please find the requested screenshot. We did not find sampler.param or sampler.type in the tags section of the span in the Jaeger UI.
MicrosoftTeams-image(2)
Could you please let us know how to set these parameters in the OTEL SDK? Please share sample code if there is any.
Is this the reason sampling is not working as expected?

@yurishkuro
Member

Well, it is possible that OTEL SDK that you're using does not record the type of sampler in the attributes, although they should, precisely for this reason, it allows debugging issues like this. The old Jaeger SDKs always recorded the sampler type/param on the root span so that you would know the kind of sampling that was used. I recommend raising an issue in the corresponding OTEL SDK repository, there is not much we can investigate from Jaeger side.

@Sucharitha95
Author

Hi @yurishkuro,
Thanks for the information. We have raised a ticket for the OTEL SDK.
We have also tested using OpenTracing. In our trace generator, if we set sampler.type to "const", we can see the spans in the Jaeger UI, along with sampler.param and sampler.type. But if we set sampler.type to "remote" in the trace generator, we do not see any spans in the UI (even after setting the sampling rate to 0.1, 0.6, or 0.7).
When we fetch the sampling configuration with curl, we get the configuration that was given in the collector's sampling strategies .json file, but the spans are still not visible in the Jaeger UI.
Below are the command and the response we obtained:
curl http://<jaeger_agent_service_name>..svc.cluster.local:5778/sampling?service
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":0.7},"operationSampling":{"defaultSamplingProbability":0.7,"defaultLowerBoundTracesPerSecond":0,"perOperationStrategies":[{"operation":"/health","probabilisticSampling":{"samplingRate":0.6}},{"operation":"/metrics","probabilisticSampling":{"samplingRate":0.4}}],"defaultUpperBoundTracesPerSecond":0}}
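
To read a response like the one above programmatically, a minimal Go sketch could decode it and look up the rate that would apply to a given operation. This is an illustration under assumptions: the struct covers only the fields visible in the response above, `parseStrategy` and `rateFor` are hypothetical helpers, the "/checkout" operation is made up, and the fallback to the default probability for unlisted operations is this sketch's reading of the payload, not a statement about any SDK's behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// strategyResponse models only the fields seen in the response above.
type strategyResponse struct {
	StrategyType          string `json:"strategyType"`
	ProbabilisticSampling struct {
		SamplingRate float64 `json:"samplingRate"`
	} `json:"probabilisticSampling"`
	OperationSampling struct {
		DefaultSamplingProbability float64 `json:"defaultSamplingProbability"`
		PerOperationStrategies     []struct {
			Operation             string `json:"operation"`
			ProbabilisticSampling struct {
				SamplingRate float64 `json:"samplingRate"`
			} `json:"probabilisticSampling"`
		} `json:"perOperationStrategies"`
	} `json:"operationSampling"`
}

// parseStrategy decodes a sampling endpoint response body.
func parseStrategy(body []byte) (strategyResponse, error) {
	var s strategyResponse
	err := json.Unmarshal(body, &s)
	return s, err
}

// rateFor returns the per-operation rate when one is listed,
// otherwise the default sampling probability.
func rateFor(s strategyResponse, operation string) float64 {
	for _, op := range s.OperationSampling.PerOperationStrategies {
		if op.Operation == operation {
			return op.ProbabilisticSampling.SamplingRate
		}
	}
	return s.OperationSampling.DefaultSamplingProbability
}

// sampleBody is the response quoted in this comment.
const sampleBody = `{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":0.7},"operationSampling":{"defaultSamplingProbability":0.7,"defaultLowerBoundTracesPerSecond":0,"perOperationStrategies":[{"operation":"/health","probabilisticSampling":{"samplingRate":0.6}},{"operation":"/metrics","probabilisticSampling":{"samplingRate":0.4}}],"defaultUpperBoundTracesPerSecond":0}}`

func main() {
	s, err := parseStrategy([]byte(sampleBody))
	if err != nil {
		panic(err)
	}
	fmt.Println("strategy:", s.StrategyType, "rate:", s.ProbabilisticSampling.SamplingRate)
	// "/checkout" is a hypothetical operation with no per-operation entry.
	for _, op := range []string{"/health", "/metrics", "/checkout"} {
		fmt.Printf("%s -> %.1f\n", op, rateFor(s, op))
	}
}
```

Comparing these rates with the share of traces that actually reach storage, per operation, gives a more precise check than counting spans in the UI.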

@Sucharitha95
Author

Hi @yurishkuro,
Can you please check my previous comment and help us by answering the question and also by letting us know what else needs to be done.

@Narenderbhcu

Narenderbhcu commented Dec 7, 2022

Hi @yurishkuro,

When using OpenTracing with sampler.type set to "remote", should we see the same in the tags section of the Jaeger UI, as we do for "const" in the screenshot in the comment above?

@yurishkuro
Member

I did not see any concrete issue with Jaeger in this ticket, more like OTEL implementation / configuration, so closing
