A note for the community
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Use Cases
Bulk import of existing data: legacy systems and ad-hoc solutions have left log files in CSV format.
Attempting to import them into InfluxDB completely destroyed the data's granularity, because the events in the file arrive at a high rate and are aggregated by wall-clock time rather than by their own timestamps.
Attempted Solutions
Setting batch.max_events: 1 did faithfully import all events, although at a significant performance penalty.
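For context, a minimal sketch of that workaround as a Vector TOML config, assuming a file source feeding the InfluxDB metrics sink; the paths, transform name, and InfluxDB connection values are placeholders, not taken from the original report:

```toml
[sources.legacy_csv]
type = "file"
include = ["/var/log/legacy/*.csv"]   # placeholder path to the CSV backlog

# ... CSV parsing and log-to-metric conversion omitted ...

[sinks.influxdb]
type = "influxdb_metrics"
inputs = ["to_metrics"]               # assumed upstream transform emitting metric events
endpoint = "http://localhost:8086"    # placeholder
bucket = "legacy"                     # placeholder
org = "my-org"                        # placeholder
token = "${INFLUX_TOKEN}"

# Workaround: one event per request, so nothing is aggregated away,
# at the cost of drastically reduced throughput.
batch.max_events = 1
```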
Proposal
While keeping aggregation as the default, document that behaviour and provide an option to pass all events through unaltered until the batch criteria (such as event count) are met.
Alternatively, batching by timestamp should use the event time, so that, for example, 15 s aggregations of 1/s metrics could be processed at 1000 events/s from a file, with the resulting metrics retaining 15 s granularity. Aggregating an ingested event backlog by the current wall-clock time is meaningless.
However, this would still not cover cases where many events with high temporal resolution need to be written verbatim.
This should be done for any sink that allows multiple values per batch with individual timestamps.
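As an illustration of the proposal only, here is a sketch of what such a pass-through switch on the InfluxDB metrics sink might look like; the `aggregate` key is hypothetical and does not exist in Vector today, and the connection values are again placeholders:

```toml
[sinks.influxdb]
type = "influxdb_metrics"
inputs = ["to_metrics"]
endpoint = "http://localhost:8086"    # placeholder
bucket = "legacy"                     # placeholder
org = "my-org"                        # placeholder
token = "${INFLUX_TOKEN}"

# Hypothetical option: keep every event with its own timestamp instead of
# collapsing the batch onto the current wall-clock time.
aggregate = false

# Batches would still flush on the usual criteria, just without aggregation.
batch.max_events = 1000
batch.timeout_secs = 1
```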
We've just changed the Prometheus Remote Write sink so that you can configure it not to aggregate within a batch (#18676). We should do the same when we update the InfluxDB metrics sink (#9261).
We still need to audit the sinks to confirm this, but it looks like InfluxDB is currently the only sink that aggregates metrics within a batch. That audit needs to be done as part of #19102.
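For comparison, a sketch of the Prometheus Remote Write sink with per-batch aggregation turned off; the exact option key introduced in #18676 should be taken from that PR or the sink documentation, so `batch.aggregates` below is an assumption rather than a confirmed setting:

```toml
[sinks.prom_remote_write]
type = "prometheus_remote_write"
inputs = ["to_metrics"]
endpoint = "https://prometheus.example.com/api/v1/write"   # placeholder

# Assumed option name for the behaviour added in #18676: disable aggregation
# of metrics within a batch so every sample keeps its own timestamp.
# Check the PR / sink docs for the actual key.
batch.aggregates = false
```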
References
Version
vector 0.32.2 (x86_64-unknown-linux-gnu)