Collector exporter span throughput is lower than OTLP #456

Closed
dashpole opened this issue Jul 18, 2022 · 6 comments
Assignees: dashpole
Labels: bug, trace

@dashpole (Contributor)

This is an issue reported by a customer. May be related to #388.

Exporter configuration:

exporters:
  googlecloud:
    project: <project-id>
    timeout: 10s
  otlphttp:
    endpoint: <endpoint>

Using the collector's self-observability metrics, the otlphttp exporter shows ~600k spans/s, but the googlecloud exporter seems to stay at ~2.6k spans/s. The number of received spans matches the otlphttp exporter's rate at ~600k spans/s.
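
For anyone reproducing this comparison, here is a minimal sketch of the self-observability setup assumed above: the collector's internal metrics endpoint exposes per-component counters (e.g. otelcol_receiver_accepted_spans and otelcol_exporter_sent_spans) from which these rates are derived. The address below is the collector default, not something from the original report.

service:
  telemetry:
    metrics:
      # Serves the collector's own Prometheus metrics; compare the per-exporter
      # rate of otelcol_exporter_sent_spans against otelcol_receiver_accepted_spans.
      address: 0.0.0.0:8888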

@dashpole added the bug and trace labels on Jul 18, 2022
@dashpole (Contributor, Author) commented on Aug 1, 2022:

We did some benchmarking with the logging exporter, which is probably relevant here. The configuration that worked best for them used a batch processor and set the number of sending-queue consumers and the gRPC pool size (new in the latest version of the collector):

  batch:
    send_batch_max_size: 6000
    send_batch_size: 6000
  googlecloud:
    grpc_pool_size: 20
    sending_queue:
      num_consumers: 40

Even with those optimizations, it isn't quite as high-throughput as the OTLP exporter; we suspect this is because of the latency of the Cloud Monitoring endpoints. They were able to achieve ~180k logs/s of throughput, and I would expect similar throughput for spans.
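
For context, here is a sketch of how those settings slot into a complete collector config; the otlp receiver and the pipeline wiring are assumptions added for completeness, and only the batch and googlecloud settings come from the benchmark above.

receivers:
  otlp:                       # assumed receiver; any trace receiver works
    protocols:
      grpc:
processors:
  batch:
    send_batch_max_size: 6000
    send_batch_size: 6000
exporters:
  googlecloud:
    project: <project-id>
    grpc_pool_size: 20        # needs a collector release that includes this option
    sending_queue:
      num_consumers: 40
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]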

@dashpole self-assigned this on Aug 1, 2022
@dashpole (Contributor, Author) commented on Aug 2, 2022:

With the standard settings:

  batch:
    send_batch_max_size: 200
    send_batch_size: 200
  googlecloud:

I was able to get ~13k spans/second in a collector.

With this config, I was able to get ~130k spans/second:

  batch:
    send_batch_max_size: 6000
    send_batch_size: 6000
  googlecloud:
    sending_queue:
      num_consumers: 40

With grpc_pool_size: 20, I'd expect to be able to do even better, but I think I need to wait for a collector release for that.

Hopefully that helps.

@dashpole closed this as completed on Aug 2, 2022
@dashpole (Contributor, Author) commented on Aug 2, 2022:

I should also mention that the exporter is generally resource-constrained when trying to send high volumes. At 130k spans/second, it was using about 10 CPUs and 20 GB of memory (total usage, not RSS, which is likely lower).

@emmaCullen commented:

Hi @dashpole, thank you for the tips here. We tried them this morning and saw a huge improvement.

We are still seeing some dropped spans, however, and were wondering if you had any suggestions.

Dropped spans:
[screenshot: exporter error log showing dropped_items from a deadline-exceeded error]

Config used:

  batch:
    send_batch_max_size: 6000
    send_batch_size: 6000
  googlecloud:
    sending_queue:
      num_consumers: 40

Sent span rate: ~3k per second
Resource allocation: 4 CPU and 16GB
Resource usage: CPU 5%, Mem 18%

Our spans are quite data-heavy (a lot of attributes), but that discrepancy still seems quite large. Could we be missing something obvious here?

@dashpole (Contributor, Author) commented on Aug 4, 2022:

Just to explain the error message: it indicates that the request failed multiple times in a row, with the most recent failure being a deadline-exceeded error. The dropped_items count is just the number of spans that were in the request and weren't sent. If it only happens occasionally, the best option is probably to raise the timeout on the exporter (e.g. timeout: 30s), since waiting for the "large" requests to complete is probably better than retrying them.
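
For clarity, a sketch of where that timeout would go, reusing the exporter configuration from the top of the thread; only the 30s value is the change being suggested here.

exporters:
  googlecloud:
    project: <project-id>
    # Let slow "large" requests finish rather than hitting the deadline and retrying.
    timeout: 30s
    sending_queue:
      num_consumers: 40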

@emmaCullen commented:

Hi @dashpole, thank you for the reply and the explanation. We tried your suggestions and can confirm our issues have now been resolved!

Thanks again for the tips and taking a look, appreciate it :)
