feat: drop use of FetchRequests #1369
Open · +329 −640
Record streams are streamlined by dropping the `FetchRequest` middleman. Instead, queues of records are lazily created by `KafkaConsumerActor`, one for each assigned partition.

Enqueueing records uses `Queue.tryOffer` to avoid blocking the polling fiber. `KafkaConsumerActor` maintains a separate `spillover` map to hold records when queues are full. This acts as back-pressure and pauses polling for "slow" partitions.

`KafkaConsumerActor.poll` takes a more active role in managing assigned partitions. Namely, it signals the revocation of any partitions that are no longer assigned, and drops those from the internal state.

While the `ConsumerRebalanceListener` interface is meant to handle this state when the consumer faces rebalance operations, explicitly handling assignments in `poll` caters to manually assigned partitions. In addition, it ensures `KafkaConsumerActor`'s internal state stays consistent in the face of race conditions between rebalance operations and the setup of record streams in `KafkaConsumer`.
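The reconciliation performed on poll can be pictured with a small sketch (hypothetical names and plain Scala collections, not the actual `KafkaConsumerActor` internals): state for partitions that are no longer in the current assignment is dropped, and those partitions are reported as revoked so their streams can be closed.

```scala
// Illustrative model of poll-time reconciliation (hypothetical names).
object PollReconcile {
  // Stand-in for per-partition state: a queue of records plus spillover.
  final case class PartitionState(queued: Vector[String], spillover: Vector[String])

  // Drop state for partitions the consumer no longer owns, returning the
  // revoked partitions alongside the retained state.
  def reconcile(
      state: Map[Int, PartitionState],
      assigned: Set[Int]
  ): (Map[Int, PartitionState], Set[Int]) = {
    val revoked = state.keySet.diff(assigned)
    (state -- revoked, revoked)
  }
}
```

Because the check runs on every poll against the consumer's own view of its assignment, it covers both rebalance-driven and manual assignment changes.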
The newly introduced `partitionState` (maintaining per-partition queues and spillover records) bears some resemblance to the former `fetches` and `records` fields, but differs in some important ways:

- Within `withConsumer.blocking`, `records` acted as a holding area, but mainly supported a race condition between a new assignment, the start of a new stream, and the subsequent registration of fetch requests. `records` was not generally used as a spillover area for incoming data.
- In the face of multiple streams hooked up to a single consumer, an inherent race in the old registration of fetch requests meant that each chunk of records could be added to all, or only a subset, of the listening streams. With the new approach, multiple streams compete to take elements from the queue, ensuring each chunk of records goes to only one stream. (While I do not expect that such use of multiple streams was a feature, the potential behavior change is noted.)
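The enqueue path can be modelled roughly as follows — a single-threaded sketch with hypothetical names, where a plain bounded queue stands in for the cats-effect `Queue` and a one-slot spillover stands in for the spillover map. A failed `tryOffer` never blocks; the chunk lands in spillover and the partition stays paused until a consumer drains the queue.

```scala
import scala.collection.mutable

// Illustrative model (not the actual fs2-kafka code) of a per-partition
// buffer: bounded queue plus spillover slot used for back-pressure.
final class PartitionBuffer[A](capacity: Int) {
  private val queue = mutable.Queue.empty[A]
  private var spillover: Option[A] = None

  // While spillover holds a chunk, polling for this partition is paused.
  def paused: Boolean = spillover.nonEmpty

  // Mirrors Queue.tryOffer: never blocks the polling side.
  def tryOffer(a: A): Boolean =
    if (spillover.nonEmpty) false
    else if (queue.size < capacity) { queue.enqueue(a); true }
    else { spillover = Some(a); false }

  // Each element is taken by exactly one caller; draining makes room for
  // the spillover chunk, which un-pauses the partition.
  def take(): Option[A] = {
    val out = if (queue.nonEmpty) Some(queue.dequeue()) else None
    spillover.foreach { a =>
      if (queue.size < capacity) { queue.enqueue(a); spillover = None }
    }
    out
  }
}
```

In the real code the queue is concurrent and the consuming side suspends rather than polling, but the offer/spillover/pause relationship is the same.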
An internal `StreamId` was previously used to match `FetchRequest`s to the owning stream. This is no longer needed, and the ID is dropped.

`KafkaConsumer` previously kept track of its partition assignment, along with `Deferred` instances used to interrupt partition streams. This is now handled by `KafkaConsumerActor`, which relies solely on the underlying consumer to keep track of its current assignment.

Overall, this change should reduce overhead and latency between the polling thread and the record streams: it removes layers through which records must pass, and reduces synchronization between polling and consuming fibers.
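For intuition on the competing-takers behavior noted earlier, here is a minimal sketch (plain Scala collections standing in for the concurrent cats-effect `Queue`): two consumers draining one shared queue receive disjoint sets of elements, so each chunk is delivered exactly once.

```scala
import scala.collection.mutable

// Minimal demonstration (assumed setup, not fs2-kafka code) that a single
// shared queue delivers each element to exactly one of its consumers.
object SingleDelivery {
  def run(): (Set[Int], Set[Int]) = {
    val queue = mutable.Queue.from(1 to 6)
    var a = Set.empty[Int]
    var b = Set.empty[Int]
    // Two "streams" taking turns: dequeue removes the head, so an element
    // taken by one consumer can never be observed by the other.
    while (queue.nonEmpty) {
      a += queue.dequeue()
      if (queue.nonEmpty) b += queue.dequeue()
    }
    (a, b)
  }
}
```

Under the old fetch-request registration, the same chunk could instead be handed to several listening streams at once.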