Fix race condition in cycle detector block sent handling #3666
This fixes a bug introduced with the recent short-lived actor garbage
collection improvements.
Here's the race:
Between steps 2 and 3, the cycle detector sends an IS_BLOCKED message to
the actor. The actor is rescheduled on the same scheduler thread as the
cycle detector.
Because in step 3 we sent a BLOCK message while the rc is 0 (and can't change),
the cycle detector will delete the actor when it processes that BLOCK
message.
However, the actor is still on another scheduler's queue. Either of the following
could happen:
detector.
Either way, hilarity in the form of segfaults or other race condition oddities
will occur.
As the comment in actor.c says:
If we mark the queue as empty, then it is no longer safe to do any
operations on this actor that aren't concurrency safe unless, the actor
has an rc of 0 and the cycle detector isn't aware of the actor's
existence.
Prior to this commit, sending a BLOCK message to the cycle detector after
marking the queue as empty was not concurrency safe because of the race
condition described above.