Fix consumer pause resume functionality #1382

arszen123 · 2022-06-01T12:59:16Z

This fixes #1376

The problem: when consumption from a topic-partition is paused inside eachMessage or eachBatch and an error is then thrown (to break further consumption of messages), the same batch that was already fetched is processed again instead of a new batch of messages being fetched.
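The failure mode described above can be illustrated with a toy model. This is only a sketch: `runWithRetry`, the `paused` set and the batch shape are hypothetical stand-ins and not KafkaJS code, and the handler is synchronous for brevity.

```javascript
// Toy model of the pre-fix behavior. None of these names are KafkaJS APIs.
const paused = new Set() // keys such as "orders:0"

// Hypothetical retry loop: on error it retries the SAME in-memory batch
// and never checks whether the partition was paused in the meantime.
function runWithRetry(batch, handler, maxRetries) {
  const delivered = []
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      for (const message of batch.messages) {
        delivered.push(message) // record every delivery to the handler
        handler(batch, message)
      }
      break // the batch was handled without error
    } catch (e) {
      // Fall through to the next attempt with the same batch.
    }
  }
  return delivered
}

const batch = { topic: 'orders', partition: 0, messages: ['m1', 'm2'] }

const delivered = runWithRetry(
  batch,
  (b, message) => {
    paused.add(`${b.topic}:${b.partition}`) // the user pauses the partition...
    throw new Error('stop consuming') // ...and throws to break the flow
  },
  2
)

// m1 is handed to the handler on every attempt, although the partition is paused.
console.log(delivered)
```

In this model the pause request has no effect on the retry loop, which is the behavior the PR sets out to change.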


Nevon · 2022-06-01T13:48:30Z

src/consumer/runner.js

@@ -404,39 +440,7 @@ module.exports = class Runner extends EventEmitter {
      await this.heartbeat()
    }

-    return this.retrier(async (bail, retryCount, retryTime) => {
-      try {
-        await onBatch(batch)


This now means that on every error we re-fetch the data for no real reason. The data we received is still valid; we just don't want to process some of it. I would instead suggest that we invoke onBatch only if the topic-partition is not paused. That way, if the topic-partition was paused during the execution of the handler, we simply continue processing the other data included in the fetch response, without any additional fetching and without particularly intrusive changes to the runner. The runner already has access to the ConsumerGroup, which holds the state of which topic-partitions are paused. It does mean that we filter in two places, but I would consider that a much smaller price to pay.
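A minimal sketch of the suggestion above, under stated assumptions: `PausedState` is a hypothetical stand-in for the paused-state bookkeeping that the ConsumerGroup holds, and `processFetchResponse` is an illustrative name, not a function in the runner. The handler is synchronous to keep the example short.

```javascript
// Hypothetical stand-in for paused-state bookkeeping; not the KafkaJS ConsumerGroup API.
class PausedState {
  constructor() {
    this.keys = new Set()
  }
  pause(topic, partition) {
    this.keys.add(`${topic}:${partition}`)
  }
  isPaused(topic, partition) {
    return this.keys.has(`${topic}:${partition}`)
  }
}

// Invoke onBatch only while the batch's topic-partition is not paused.
// The check runs before every attempt, so a pause issued inside the
// handler stops the retries and the loop moves on to the next batch.
function processFetchResponse(batches, state, onBatch, maxRetries = 2) {
  const calls = []
  for (const batch of batches) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      if (state.isPaused(batch.topic, batch.partition)) break
      calls.push(`${batch.topic}:${batch.partition}`)
      try {
        onBatch(batch)
        break // handled successfully
      } catch (e) {
        // Retry on the next iteration, unless the partition is now paused.
      }
    }
  }
  return calls
}

const state = new PausedState()
const batches = [
  { topic: 'orders', partition: 0, messages: ['m1'] },
  { topic: 'orders', partition: 1, messages: ['m2'] },
]

const calls = processFetchResponse(batches, state, batch => {
  if (batch.partition === 0) {
    state.pause(batch.topic, batch.partition)
    throw new Error('stop consuming')
  }
})

// Partition 0 is attempted once and then skipped; partition 1 is still processed.
console.log(calls)
```

The already-fetched data for the other partition is used as-is, which is the point of the suggestion: no extra fetch is needed just because one partition was paused.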

arszen123 · 2022-06-02

Correct me if I am wrong, but based on the fetch manager implementation, the remaining batches (from other partitions) will still be processed in that cycle, so there is no unnecessary data fetching.
If processing a batch fails, part of that batch will most likely be re-fetched in the next cycle. I think that is okay, because we don't know what the last committed offset was and so cannot filter out already-processed messages. It is possible to fetch the offsets, but I think it is more convenient to drop the batch and re-fetch.
The messages from the paused partition will therefore be fetched in the first fetch cycle after consumption from that partition is resumed.
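The drop-and-re-fetch flow described in this reply can be sketched with a toy fetch loop. Everything here is an assumption made for illustration: the in-memory `log`, the `committed` offsets, `fetchCycle` and `runCycle` are hypothetical and do not mirror the fetch manager's real structure.

```javascript
// Toy per-partition log and committed offsets. All names are hypothetical.
const log = { 0: ['a0', 'a1', 'a2'], 1: ['b0', 'b1'] }
const committed = { 0: 0, 1: 0 }
const paused = new Set()

// A fetch cycle only requests partitions that are not paused,
// starting from each partition's last committed offset.
function fetchCycle() {
  return Object.keys(log)
    .map(Number)
    .filter(partition => !paused.has(partition))
    .map(partition => ({
      partition,
      firstOffset: committed[partition],
      messages: log[partition].slice(committed[partition]),
    }))
    .filter(batch => batch.messages.length > 0)
}

// Process one cycle. If the handler throws, the rest of that batch is
// dropped; uncommitted messages are picked up by a later fetch.
function runCycle(handler) {
  const seen = []
  for (const batch of fetchCycle()) {
    try {
      batch.messages.forEach((message, i) => {
        seen.push(message)
        handler(batch.partition, message)
        committed[batch.partition] = batch.firstOffset + i + 1
      })
    } catch (e) {
      // Drop the remainder of this batch and continue with other partitions.
    }
  }
  return seen
}

// Cycle 1: the handler pauses partition 0 and throws when it sees 'a1'.
const first = runCycle((partition, message) => {
  if (message === 'a1') {
    paused.add(partition)
    throw new Error('stop consuming')
  }
})

// Cycle 2: partition 0 is paused and partition 1 has nothing new.
const second = runCycle(() => {})

// Cycle 3: after resuming, partition 0 is re-fetched from its committed offset.
paused.delete(0)
const third = runCycle(() => {})

console.log(first, second, third)
```

Note that 'a1' is delivered again after the resume because it was never committed, which matches the trade-off accepted in the reply: part of a failed batch may be re-fetched.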

Nevon · 2022-06-27T15:25:25Z

This also solves the issue of unhandled promise rejections mentioned here

arszen123 added 3 commits May 31, 2022 16:35

Fixed consumer pausing

b1784b0

Fixed consumer pausing when breaking the consumption flow

Fixed consumer connection error

44f9369

Merge branch 'tulios:master' into fix/consumer-pause-resume-functionality

cebbfce

Nevon requested changes Jun 1, 2022

Nevon mentioned this pull request Jun 2, 2022

Provide a pause() helper to eachMessage/eachBatch #1364

Merged

Nevon approved these changes Jun 27, 2022

Merge branch 'master' into fix/consumer-pause-resume-functionality

214427f

Nevon merged commit 51a4947 into tulios:master Jun 28, 2022

arszen123 mentioned this pull request Jul 5, 2022

Fix consumer connection issue #1408

Merged

ModernTrollfare mentioned this pull request Aug 23, 2022

Retriable Error "KafkaJSRequestTimeoutError" was thrown uncaught during a rebalance #1410

Closed

snyk-io bot mentioned this pull request May 21, 2024

[Snyk] Upgrade kafkajs from 2.1.0 to 2.2.4 Hawthorne001/v4-chain#10

Open
