Fix batched queue starvation #2264
Conversation
The second (edit: now removed) commit trades away some fairness for some throughput (about 12% more). With that commit, the bug fix results in an 8% reduction in the …
I pushed a bogus timeout for the test just to see if it passes at all. Edit: reverted.
This represents a roughly 21% reduction in performance from series/3.x on the …
Before:
After:
I willingly sacrificed some performance here for a quicker turnaround of the fix. We're working on tuning certain parameters of the runtime (necessary tuning which we never really found the time for, but now we'll do it properly), and we should regain some of the performance lost in this PR.
Random note: batch size is 128, and we check the batched queue once every 128 iterations. However, we may progress as few as 57 fibers every 64 iterations, and we may have also taken from the external queue, meaning that we could have evaluated only 57 + 57 - 1 = 113 fibers by the time we go back to the batched queue, which may in turn result in 128 of our local fibers getting dumped to the tail of the batched queue. The overlap here is 128 - 113 = 15 fibers. These fibers experience unfairness (though not indefinite unfairness, since they'll be at the front of the batch the next time through, meaning that they are guaranteed to run).
This is a pathological edge case and will be improved by subsequent iterations. It's fine for now.
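The worst-case arithmetic above can be sketched as a small standalone program. This is purely illustrative; the object and value names are hypothetical and do not correspond to identifiers in the runtime.

```scala
// Hypothetical sketch of the worst-case fairness arithmetic described above.
object BatchedQueueOverlap {
  val BatchSize = 128          // fibers per batch in the batched queue
  val CheckInterval = 128      // iterations between batched-queue checks

  // Worst case per the note: only 57 local fibers progress in each
  // 64-iteration window, and one iteration is spent on the external queue.
  val WorstCaseFibersRun = 57 + 57 - 1 // 113 fibers evaluated

  // Fibers that can be dumped back to the batched queue before running,
  // i.e. the fibers that experience (bounded) unfairness.
  val Overlap = BatchSize - WorstCaseFibersRun // 15 fibers

  def main(args: Array[String]): Unit =
    println(s"overlap = $Overlap")
}
```

Since the overlapped fibers land at the front of the next batch, the unfairness is bounded to a single extra pass rather than being indefinite.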
There might be an issue with this fix. Please see #2269
Fixes #2263.