Add chunk-based consumer API #1281
Conversation
After adding the missing methods similar to the ones from the … EDIT: What the library could also do is something like the following:

```scala
def consumeTopic[K, V](
  consumer: KafkaConsumer[IO, K, V],
  processor: Chunk[ConsumerRecord[K, V]] => IO[Unit]
): IO[Unit] =
  consumer.partitionedStream
    .map(_.chunks.evalMap(consumeChunk(processor)))
    .parJoinUnbounded
    .compile
    .drain
```

(Generalized over …)
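The `consumeChunk` helper referenced in the snippet above is not shown in this thread. A minimal sketch of what it might do, assuming fs2-kafka's `CommittableOffsetBatch` API (the helper itself and its exact shape are assumptions, not part of this PR):

```scala
import cats.effect.IO
import cats.syntax.all._
import fs2.Chunk
import fs2.kafka.{CommittableConsumerRecord, CommittableOffsetBatch, ConsumerRecord}

// Hypothetical helper assumed by the snippet above: run the user-supplied
// processor on the plain records of a chunk, then commit the chunk's
// offsets in a single batch.
def consumeChunk[K, V](
  processor: Chunk[ConsumerRecord[K, V]] => IO[Unit]
)(chunk: Chunk[CommittableConsumerRecord[IO, K, V]]): IO[Unit] =
  processor(chunk.map(_.record)) >>
    CommittableOffsetBatch.fromFoldable(chunk.map(_.offset)).commit
```

Committing once per chunk, only after the processor succeeds, is what gives the at-least-once semantics discussed below.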
Nice work! Let's start the discussion 🧵

This addition looks good, because it moves the logic into the library, but it also means we're being more strict on the patterns we support. For example, our $WORK use case doesn't involve committing offsets, but we're working with … So far, in my experience, working with …

The discussion this PR brings, philosophically, is somewhat like: do we think there's a canonical pattern for Kafka consumption around the …? And to be honest, I don't know. I have mixed feelings: on one hand, looking at the code, it's so simple that adding it to the library doesn't disrupt the rest of the API; but on the other hand, if it's that simple, why not have it in user-land? Otherwise it looks to me that we maybe want this library to become more like a framework? Because we are losing (not actually losing, because the current API remains untouched, but you get the idea) the …

What I agree on is to properly document this use case, so the question doesn't pop up again in Discord. I'd love to see what the original designers think about it 👀
My 2c: …

However, what I would not do is add the whole set of variations mirroring the full API; I'd really just add the one method, otherwise we're back to the issue of people having to navigate which one to use. If you know enough to need a variation, you can use the low-level Stream API directly.

Asides:
Thank you for your comments, really appreciate it. I think it's a good idea to only add one, biased, method to give the users an obvious way of using the pattern that works well for most cases. I'll proceed in that direction and will also start to add some docs for it. These docs will also mention that this is the first place where the library automatically commits offsets, but without using auto-commit (@aartigao thank you for mentioning the auto-commit use cases, I didn't think about that initially). @SystemFw regarding your points on the implementation details:
It should be an …
Oh, I see. But that relies on the assumption that the previous stream never finishes, right? If the stream terminated without error, the whole action wouldn't terminate because of the …
Should we use …?
I'd say this is ready to be reviewed; the only question I have is the one above regarding the …
modules/core/src/main/scala/fs2/kafka/consumer/KafkaConsumeChunk.scala
This also introduces a `CommitNow` object instead of `Unit`
Regarding docs: without these suggestions, `docs / run` is failing.
Co-authored-by: Alan Artigao Carreño <alanartigao@gmail.com>
I'm super grateful for this type of PR:
- Small # of files
- Single feature
- Well documented
Bravo @L7R7 !
Enough time has passed since the creation and I'll be merging it soon.
Congrats to everyone involved!
@aartigao thank you for the review and thank you for helping me get through the last details yesterday.
This adds an API to make the pattern of chunk-based consumers a first-class concept. The idea keeps popping up in the Typelevel Discord, and we've been successfully using it at $WORK for a while now, so it makes sense to add it to the library.
General idea
The pattern aims at helping users who implement consumers without auto-commit to write their code without having to do too much work to achieve correct offset committing (no offsets must be lost, offsets should be committed only after messages have been processed, etc.).
It achieves that by switching from processing messages in a `Stream[F, CommittableConsumerRecord[F, K, V]]` to processing them in chunks: `Chunk[ConsumerRecord[K, V]] => F[Unit]`. After each chunk, the offsets of the chunk are committed. This has a couple of advantages (summarizing what Fabio said on the Discord).
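A usage sketch of the pattern under this proposal — the method name `consumeChunk` and the processor shape are taken from the description above and may differ from the final API:

```scala
import cats.effect.IO
import cats.syntax.all._
import fs2.Chunk
import fs2.kafka.{ConsumerRecord, KafkaConsumer}

// The user writes a processor for whole chunks of plain records; the
// library takes care of committing the chunk's offsets after the
// processor has finished successfully.
def processRecords[K, V](records: Chunk[ConsumerRecord[K, V]]): IO[Unit] =
  records.traverse_(record => IO.println(s"${record.key} -> ${record.value}"))

// `consumeChunk` is the assumed name of the new entry point.
def run[K, V](consumer: KafkaConsumer[IO, K, V]): IO[Unit] =
  consumer.subscribeTo("my-topic") >> consumer.consumeChunk(processRecords[K, V])
```

Note that the user never sees a `CommittableConsumerRecord` or a commit batch; the commit policy is fixed by the library, which is exactly the trade-off debated in the comments above.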
open questions

- Should the new trait mirror all the methods from the `KafkaConsume` trait? I only added one of them so far to show the concept; the rest could be implemented in the same fashion.
- Instead of `Unit`, the processor for the chunks could use a return type of `F[CommitNow]`, where `CommitNow` is an intention-revealing equivalent of `Unit` that also makes it clearer that afterwards no processing should be done on the records. Do we want that?

todos
There's still a lot of work to do, but I wanted to get some early feedback on the general concept before doing the busy work.
I'm looking forward to receiving some feedback and thoughts on this idea!
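The `CommitNow` idea from the open questions could be sketched like this (all names here are illustrative, not the final API):

```scala
import cats.effect.IO
import fs2.Chunk
import fs2.kafka.ConsumerRecord

// Illustrative sketch of the CommitNow idea: an intention-revealing
// equivalent of Unit. Returning it signals that the library will commit
// the offsets next, so no further processing of the records should happen.
sealed trait CommitNow
case object CommitNow extends CommitNow

// A processor's successful result can then only be "commit now":
def processor[K, V](records: Chunk[ConsumerRecord[K, V]]): IO[CommitNow] =
  IO.println(s"processed ${records.size} records").as(CommitNow)
```

Because `CommitNow` has a single inhabitant, it carries the same information as `Unit` at runtime, but the name documents the contract at the type level.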