[chore][pkg/stanza/fileconsumer] Emit logs in batches #36276

andrzej-stencel · 2024-11-08T15:28:50Z

Description

Modifies the File consumer to emit logs in batches as opposed to sending each log individually through the Stanza pipeline and on to the Log Emitter.

This was achieved via the following incremental changes, with the goal of making code reviews easier (multiple smaller changesets):

a584e0a collect scanned tokens into batches in Reader::ReadToEnd method in File consumer, but still call emit function for each token individually
6391f55 change emit.Callback function signature to accept a slice of tokens and emit tokens in batches from the Reader. At this point, the batches are still split into individual tokens inside the emit function, because the Stanza operators can only process one entry at a time.
187b345 add ProcessBatch method to Stanza operators and use it in the emit function. At this point, the batch of tokens is translated to a batch of entries and passed to Log Emitter as a whole. The batch is still split in the Log Emitter, which calls consumeFunc for each entry in a loop.
8e3197b do not split batches in Log Emitter, call consumeFunc on the whole batch of entries

Note that this is currently a draft, requesting initial feedback. I haven't yet implemented the ProcessBatch method for all Stanza operators, as I'd like to first get feedback on its definition. Specifically, should the function accept a []entry.Entry or []*entry.Entry?

Link to tracking issue

Fixes [pkg/stanza/fileconsumer] Emit logs in batches #35455

Testing

No changes in tests. The goal is for the functionality to not change and for performance to not decrease.

Documentation

These are internal changes, no user documentation needs changing.

This changes the Log Emitter to run the `consumerFunc` on the whole batch, instead of splitting the batch into individual entries and calling `consumerFunc` on each of them. This doesn't change much while the Log Emitter has its own `batch` buffer, but if we remove the `batch` buffer (see open-telemetry#35456), this should prevent the performance drop described in open-telemetry#35454.

github-actions · 2024-12-04T05:21:01Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

andrzej-stencel · 2024-12-04T09:04:12Z

Superseded by #36663.

github-actions bot added pkg/stanza receiver/otlpjsonfile labels Nov 8, 2024

github-actions bot requested review from atoulme and djaglowski November 8, 2024 15:29

andrzej-stencel changed the title ~~[chore][pkg/stanza/fileconsumer] emit logs in batches~~ [chore][pkg/stanza/fileconsumer] Emit logs in batches Nov 8, 2024

andrzej-stencel mentioned this pull request Nov 11, 2024

[pkg/stanza/fileconsumer] Emit logs in batches #35455

Open

andrzej-stencel added 6 commits November 19, 2024 14:00

refactor: introduce batching in Reader, but call emit function for ea…

cd84a14

…ch entry still Next step is to change the emit function to accept a batch.

change emit.Callback signature to accept a slice of tokens

84cf2c7

But still send each entry one by one to the next consumer in the file input. Next step is to change Stanza operators to accept batches.

update changelog entry

16ec3bc

add WriterOperator::WriteBatch and Operator::ProcessBatch methods

b2d8204

The File input's `emitBatch` function now calls `ProcessBatch` instead of `Process`. The added `ProcessBatch` method will make each Stanza operator capable of accepting a batch of entries.

rewrite changelog entry

0d51650

andrzej-stencel force-pushed the emit-multiple branch from 8e3197b to 0d51650 Compare November 19, 2024 13:24

github-actions bot added the Stale label Dec 4, 2024

andrzej-stencel closed this Dec 4, 2024
