Skip to content
This repository has been archived by the owner on Nov 10, 2020. It is now read-only.

Dangerous race-conditions in the buffer #32

Open
ankoh opened this issue Mar 5, 2016 · 2 comments
Open

Dangerous race-conditions in the buffer #32

ankoh opened this issue Mar 5, 2016 · 2 comments

Comments

@ankoh
Copy link

ankoh commented Mar 5, 2016

I haven't noticed it first as I used an idiomatic channel approach for a while but when I was stuck at 100k msg/sec, I took a look at your ring buffer once again.
In general I really like your buffer (and its amazingly fast 👍 ) but i've found some serious race conditions with your cursor locks.

They were hard to find because they're occurring really really rarely on my machine (1 out of 10k messages or so in a strictly serial 1-write-1-read test) but then your broker explodes.

Just take a look at ReadWait and WriteCommit.

func (this *buffer) WriteCommit(n int) (int, error) {
    start, cnt, err := this.waitForWriteSpace(n)
    if err != nil {
        return 0, err
    }

    // If we are here then there's enough bytes to commit
    this.pseq.set(start + int64(cnt))

    this.ccond.L.Lock()
    this.ccond.Broadcast()
    this.ccond.L.Unlock()

    return cnt, nil
}

func (this *buffer) ReadWait(n int) ([]byte, error) {
    if int64(n) > this.size {
        return nil, bufio.ErrBufferFull
    }

    if n < 0 {
        return nil, bufio.ErrNegativeCount
    }

    cpos := this.cseq.get()
    ppos := this.pseq.get()

    // This is the magic read-to position. The producer position must be equal or
    // greater than the next position we read to.
    next := cpos + int64(n)

    // If there's no data, then let's wait until there is some data
    this.ccond.L.Lock()
    for ; next > ppos; ppos = this.pseq.get() {
        if this.isDone() {
            return nil, io.EOF
        }

        this.ccond.Wait()
    }
    this.ccond.L.Unlock()

    // If we are here that means we have at least n bytes of data available.
    cindex := cpos & this.mask

    // If cindex (index relative to buffer) + n is more than buffer size, that means
    // the data wrapped
    if cindex+int64(n) > this.size {
        // reset the tmp buffer
        this.tmp = this.tmp[0:0]

        l := len(this.buf[cindex:])
        this.tmp = append(this.tmp, this.buf[cindex:]...)
        this.tmp = append(this.tmp, this.buf[0:n-l]...)
        return this.tmp[:n], nil
    }

    return this.buf[cindex : cindex+int64(n)], nil
}

The race scenario:

  1. ReadWait reads the producer position atomically with ppos := this.pseq.get()
  2. Context switch
  3. WriteCommit advances ppos, locks ccond, notifies sleeping consumers and finishes
  4. Context switch
  5. ReadWait calculates next, locks consumers, goes to sleep and never wakes up

Simple fix:
As you'll acquire the consumer lock anyway, you should read and write the producer position with consumer lock.

I haven't checked the other methods yet, but chances are good that there are more races like this one as you always use atomic operations without locking the parties.

@ankoh
Copy link
Author

ankoh commented Mar 5, 2016

Yep there are more situations like this. Everywhere where you advance the producer/consumer position and lock the conditions for the broadcast or rely on the broadcast to wake up.

@RaymondReddington
Copy link

i have this problem. and thank you.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants