
Concurrent flush across batches #3192

Closed
Tracked by #3117
codebien opened this issue Jul 13, 2023 · 0 comments · Fixed by #3206
codebien commented Jul 13, 2023

What

Move the concurrency for flushing metrics from per-flush to per-batch.

The expected architecture is a single goroutine performing the following operations:

  • Fetch the buckets from the buckets queue
  • Split the time series into batches
    • Encode each batch as protobuf
    • Enqueue the batch as a job to be pushed to the remote service

And a pool of concurrent goroutines, each doing the following operations:

  • Fetch a job
  • Invoke the metricsClient.push operation

Why

We have seen suboptimal handling when we hit tests with a lot of active time series (> 100k). The flush operation splits them into batches and then pushes them sequentially; doing some quick math like the following shows why a single flush operation could take more than 10 seconds.

Example

100k time series
1k time series as batch limit

that generates 100 batches

If we don't have perfect networking (e.g. 100 ms per request), then we end up with a total of 10 seconds for flushing a single iteration of 100k active series (100 batches * 100 ms), and it can grow even further in worse cases.
