Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid starvation in subscriptionManager #2109

Merged
merged 1 commit into from
Feb 25, 2022
Merged

Commits on Feb 25, 2022

  1. fix: avoid starvation in subscriptionManager

    The first few fetches from Kafka may only fetch data from one or two
    partitions, starving the rest for a very long time (depending on message
    size / processing time)
    
    Once a member joins the consumer groups and receives its partitions,
    they are fed into the "subscription manager" from different go routines.
    The subscription manager then performs batching and executes a fetch for
    all the partitions. However, it seems like the batching logic in
    `subscriptionManager` is faulty, perhaps assuming that `case:` order
    prioritizes which `case` should be handled when all are signaled which
    is not the case, according to the go docs
    (https://golang.org/ref/spec#Select_statements):
    ```
    If one or more of the communications can proceed, a single one that can
    proceed is chosen via a uniform pseudo-random selection. Otherwise, if
    there is a default case, that case is chosen. If there is no default
    case, the "select" statement blocks until at least one of the
    communications can proceed.
    ```
    
    For example - if you receive 64 partitions, each will be handled in
    their own go routine which sends the partition information to the
    `bc.input` channel. After an iteration there is a race between `case
    event, ok := <-bc.input` which will batch the request and `case
    bc.newSubscriptions <- buffer` which will trigger an immediate fetch of
    the 1 or 2 partitions that made it into the batch.
    
    This issue only really affects slow consumers with short messages. If
    the condition happens with 1 partition being in the batch (even though
    63 extra partitions have been claimed but didn't make it into the batch)
    the fetch will ask for 1MB (by default) of messages from that single
    partition. If the messages are only a few bytes long and processing time
    is minutes, you will not perform another fetch for hours.
    
    Contributes-to: #1608 #1897
    
    Co-authored-by: Dominic Evans <dominic.evans@uk.ibm.com>
    pavius and dnwe committed Feb 25, 2022
    Configuration menu
    Copy the full SHA
    dadcd80 View commit details
    Browse the repository at this point in the history