Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the producer, part 2 #550

Merged
merged 1 commit into from
Oct 15, 2015
Merged

Refactor the producer, part 2 #550

merged 1 commit into from
Oct 15, 2015

Conversation

eapache
Copy link
Contributor

@eapache eapache commented Oct 5, 2015

Second stand-alone chunk extracted from #544, (first chunk: #549). This uses the
produceSet struct in the aggregator as well, and moves the wouldOverflow and
readyToFlush methods to methods on the produceSet.

Knock-on effects:

  • now that we do per-partition size tracking in the aggregator we can do much
    more precise overflow checking (see the compressed-message-batch-size-limit
    case in wouldOverflow which has changed) which will be more efficient in
    high-volume scenarios
  • since the produceSet encodes immediately, messages which fail to encode are
    now rejected from the aggregator and don't count towards batch size
  • we still have to iterate the messages in the flusher in order to reject those
    which need retrying due to the state machine; for simplicity I add them to a
    second produceSet still, which means all messages get encoded twice; this is
    a definite major performance regression which will go away again in part 3
    of this refactor

@wvanbergen

@eapache eapache force-pushed the producer-refactor-2 branch from 381225c to 7c4ceb3 Compare October 9, 2015 14:14
@eapache eapache changed the title WIP Refactor the producer, part 2 Oct 9, 2015
@eapache eapache mentioned this pull request Oct 14, 2015
Second stand-alone chunk extracted from #544, (first chunk: #549). This uses the
`produceSet` struct in the aggregator as well, and moves the `wouldOverflow` and
`readyToFlush` methods to methods on the `produceSet`.

Knock-on effects:
 - now that we do per-partition size tracking in the aggregator we can do much
   more precise overflow checking (see the compressed-message-batch-size-limit
   case in `wouldOverflow` which has changed) which will be more efficient in
   high-volume scenarios
 - since the produceSet encodes immediately, messages which fail to encode are
   now rejected from the aggregator and don't count towards batch size
 - we still have to iterate the messages in the flusher in order to reject those
   which need retrying due to the state machine; for simplicity I add them to a
   second produceSet still, which means all messages get encoded twice; this is
   a definite major performance regression which will go away again in part 3
   of this refactor
@eapache eapache force-pushed the producer-refactor-2 branch from 7c4ceb3 to 733c74f Compare October 14, 2015 21:34
@@ -320,7 +320,7 @@ func TestAsyncProducerEncoderFailures(t *testing.T) {
leader.Returns(prodSuccess)

config := NewConfig()
config.Producer.Flush.Messages = 3
config.Producer.Flush.Messages = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because the 2 failed to encode messages are not counted towards the batch anymore I assume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bingo

@wvanbergen
Copy link
Contributor

👍

eapache added a commit that referenced this pull request Oct 15, 2015
@eapache eapache merged commit 4a6acf4 into master Oct 15, 2015
@eapache eapache deleted the producer-refactor-2 branch October 15, 2015 02:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants