gRPC client might not be properly balancing requests #4274
The issue is still present with version v0.88.0.
We also have this problem with version 0.92.0.
I had some time to play with this today, and I confirm that the current version of the collector doesn't allow a client to properly load-balance requests to backends. To reproduce this, here are the steps:

```
k3d cluster create
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
kubectl wait --for=condition=Available deployments/cert-manager -n cert-manager
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
kubectl wait --for=condition=Available deployments/opentelemetry-operator-controller-manager -n opentelemetry-operator-system
kubectl create -f https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.73.2/bundle.yaml
kubectl wait --for=condition=Available deployments/prometheus-operator -n default
kubectl apply -f https://gist.githubusercontent.com/jpkrohling/ddb7d0074fa7858602302897ad495f35/raw/8527ccf2842bb3ebba18c89bf4573ee713ddced3/resources-current.yaml
kubectl wait --for=condition=Available deployments/client-collector -n observability
kubectl port-forward -n observability service/client-collector 4317:4317
kubectl port-forward -n observability service/prometheus-operated 9090:9090
telemetrygen traces --otlp-insecure --rate 1000 --duration 5m
```

Then open http://localhost:9090/graph?g0.expr=sum%20by%20(instance)%20(rate(otelcol_receiver_accepted_spans%7B%7D%5B1m%5D))&g0.tab=0&g0.display_mode=stacked&g0.show_exemplars=0&g0.range_input=30m (this should open Prometheus with the query `sum by (instance) (rate(otelcol_receiver_accepted_spans{}[1m]))`). You'll see that only one of the servers is receiving spans. Then, deploy the patched Collector and run telemetrygen again:
This should show that the three backend collectors are receiving an even share of the load. I ran the tests in the reverse order in the following image, but for reference, here's what I see:

There's one bug in the OTLP exporter's validation routine that prevents this from working properly, but other than that, it should work with a Collector config like this, as seen in the gist I used above:

```yaml
exporters:
  otlp:
    endpoint: dns:///server-collector-headless.observability:4317
    balancer_name: round_robin
```
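For context on why this config should help: the exporter's gRPC settings are handed down to gRPC-Go, where the `dns:///` scheme selects the DNS name resolver and `balancer_name: round_robin` selects the round_robin load-balancing policy. Below is a minimal sketch of the equivalent client-side dial in Go; it is illustrative only, not the collector's actual code, and reuses the endpoint from the gist above:

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// dns:/// makes gRPC resolve the headless Service name into one address
	// per backend pod; the round_robin policy then spreads RPCs across all
	// resolved addresses instead of pinning every request to a single pod.
	conn, err := grpc.Dial(
		"dns:///server-collector-headless.observability:4317",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	// ... build an OTLP trace service client on conn and export spans as usual.
}
```

Note that this only balances across whatever addresses DNS returns, which is why the endpoint points at the headless Service (it resolves to the individual pod IPs) rather than the regular ClusterIP Service, which resolves to a single virtual IP.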
Fixes #4274 Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
… (open-telemetry#10010) Fixes open-telemetry#4274 Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
As reported in the Jaeger issue tracker, the OpenTelemetry Collector might not be properly load balancing requests across Jaeger Collector replicas.
Reference: jaegertracing/jaeger#1678