
Concurrent flush across batches #3192

Closed
Tracked by #3117
codebien opened this issue Jul 13, 2023 · 0 comments · Fixed by #3206
codebien commented Jul 13, 2023

What

Move the concurrency for flushing metrics from per-flush to per-batch.

The expected architecture is a single goroutine performing the following operations:

  • Fetch the buckets from the buckets queue
  • Split the time series into batches
    • Encode each batch as protobuf
    • Enqueue the batch as a job to be pushed to the remote service

And a pool of concurrent goroutines, each doing the following operations:

  • Fetch a job
  • Invoke the metricsClient.push operation

Why

We have seen suboptimal handling when we hit tests with a lot of active time series (> 100k). The flush operation splits them into batches and then pushes them sequentially; doing some quick math like the following shows why a single flush operation could take more than 10 seconds.

Example

100k time series
1k time series as batch limit

that generates 100 batches

If we don't have perfect networking (e.g. 100 ms per request), then we end up with a total of 10 seconds for flushing a single iteration of 100k active series (100 batches * 100 ms), and it can grow even further in worse cases.
