-
-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalized runtime backpressure #2264
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I've left some comments on implementation details.
src/libponyrt/actor/actor.c
Outdated
} | ||
} | ||
|
||
void maybe_mute(pony_ctx_t* ctx, pony_actor_t* to, pony_msg_t* first, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name should be ponyint_maybe_mute
according to the runtime naming conventions.
src/libponyrt/actor/actor.c
Outdated
|
||
pony_msg_t* m = first; | ||
|
||
while(m != last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating the message chain here is going to be expensive, so I think it would be nice to refactor in order to remove the need for the loop.
In the current state of things, a message chain cannot contain ORCA messages, so a possible alternative would be to take the message chain length as a parameter. If we want to future-proof the function now, the parameter could be the number of application messages in the chain instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use that assumption about message chains not containing ORCA messages, we should likely try to add a pony_assert
somewhere to ensure that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. I think the best place for that assertion would be in pony_chain
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Praetonus if we had the number of application messages in the chain, that would become much easier. if there is more than 0, we do our check. if there are none, then we don't need our check. i was going to bring that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this with @SeanTAllen, I'm going to implement that change (counting the number of application messages in the chain) since it also requires a change to one of the optimisation passes, and I'll submit the patch in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praetonus' patch was applied.
src/libponyrt/actor/actor.c
Outdated
@@ -212,8 +212,20 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, size_t batch) | |||
app++; | |||
try_gc(ctx, actor); | |||
|
|||
// if we become muted as a result of handling a message, bail out now. | |||
if(actor->muted > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an atomic_load_explicit
with memory_order_relaxed
. Atomic operations without an explicit memory order implicitly use memory_order_seq_cst
and that's a performance hit.
src/libponyrt/actor/actor.c
Outdated
@@ -225,6 +237,15 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, size_t batch) | |||
// We didn't hit our app message batch limit. We now believe our queue to be | |||
// empty, but we may have received further messages. | |||
pony_assert(app < batch); | |||
pony_assert(actor->muted == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic_load_explicit
here.
src/libponyrt/actor/actor.c
Outdated
if(ponyint_messageq_push(&to->q, first, last)) | ||
{ | ||
if(!has_flag(to, FLAG_UNSCHEDULED)) | ||
if(!has_flag(to, FLAG_UNSCHEDULED) && (to->muted == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic_load_explicit
here.
src/libponyrt/actor/actor.c
Outdated
// 2. the sender isn't overloaded | ||
// AND | ||
// 3. we are sending to another actor (as compared to sending to self) | ||
if((has_flag(to, FLAG_OVERLOADED) || (to->muted > 0)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic_load_explicit
here.
src/libponyrt/sched/scheduler.c
Outdated
for(uint32_t i = 0; i < scheduler_count; i++) | ||
{ | ||
if(&scheduler[i] != sched) | ||
send_msg(i, SCHED_UNMUTE_ACTOR, (intptr_t)actor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that all schedulers can broadcast, send_msg_single
is unsafe. Calls to send_msg_single
should be replaced by calls to send_msg
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that mean that we should remove send_msg_single
entirely? i believe that is the implication but I want to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only find one instance in send_msg_all
in scheduler.c
updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its gone!
src/libponyrt/sched/scheduler.c
Outdated
if(r == NULL) | ||
{ | ||
ponyint_muteset_putindex(&mref->value, sender, index2); | ||
sender->muted += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently equivalent to atomic_fetch_add_explicit(&sender->muted, 1, memory_order_seq_cst)
, i.e. a very expensive operation.
As far as I can see only one scheduler can mute/unmute a given actor at a time, so this can be replaced with
uint8_t muted = atomic_load_explicit(&sender->muted, memory_order_relaxed);
atomic_store_explicit(&sender->muted, muted + 1, memory_order_relaxed);
If I'm wrong and multiple schedulers can modify the muted
field of an actor at the same time, this should be atomic_fetch_add_explicit(&sender->muted, 1, memory_order_relaxed)
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct. only a single scheduler can mute an actor at a time as it happens on message send.
src/libponyrt/sched/scheduler.c
Outdated
|
||
void ponyint_sched_unmute(pony_ctx_t* ctx, pony_actor_t* actor, bool inform) | ||
{ | ||
// this needs a better name. its not unmuting actor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest ponyint_sched_unmute_senders
.
src/libponyrt/sched/scheduler.c
Outdated
while((muted = ponyint_muteset_next(&mref->value, &i)) != NULL) | ||
{ | ||
pony_assert(muted->muted > 0); | ||
muted->muted -= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as in ponyint_sched_mute
, these two lines and the if
below can be replaced with
uint8_t muted_count = atomic_load_explicit(&muted->muted, memory_order_relaxed);
pony_assert(muted_count > 0);
muted_count--;
atomic_store_explicit(&muted->muted, muted_count, memory_order_relaxed);
if(muted_count == 0)
...
@SeanTAllen Would you consider also adding some variation of https://gist.github.com/slfritchie/0dab74fd729b7ecdd2a11c32c1f984cb? |
@slfritchie i think that's reasonable. do you think coarse grained tracking on the number of times an actor is overloaded or muted would be interesting? (also overloading cleared and unmuted) |
src/libponyrt/sched/scheduler.c
Outdated
muted_count--; | ||
atomic_store_explicit(&muted->muted, muted_count, memory_order_relaxed); | ||
|
||
if (muted->muted == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to either be an atomic_load_explicit
, or use the muted_count
local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks i missed that
@SeanTAllen tl;dr: yes. DTrace (and presumably SystemTap) permit easy dynamic probes at entry & exit to any function. This code is structured almost enough dynamic function entry probes could tell you most of what you'd like to know. Overload & not overload have dedicated functions. But mute & unmute do not. Muting status changes are buried far inside of The code could be restructured to give dedicated small functions for the actual mute state changes; then dynamic function entry probes are easy. However, there's infrastructure value in defining static probes for important events in the system. These new events are the kinds of thing that affect scheduling, and visibility into scheduling is Good. |
Ive found a couple problems with this implementation.
|
@SeanTAllen The first problem can be solved by two small modifications to the workstealing and quiescence detection algorithm.
Could you detail the circumstances in which the second problem can occur? It seems to me that a given actor can only be in one mutemap at a time. |
@Praetonus excellent ideas. and now that I am a little less tired, I realize you are correct about an actor only being able to be in a single mutemap at a time. I need to add comments to that effect. |
At the next sync, I'd like to discuss what sort of documentation this might need. Inclusion in the tutorial? On the website? Just notes in the code? |
@SeanTAllen Here's the diff containing the changes needed to remove the message chain iteration in |
@Praetonus patch applied. looking good. |
@Praetonus everything we talked about is in place. Sylvan helped me track down a bug. I have to add the ability for actors to manual indicate they can't make progress so that is included in the backpressure system and perf testing. but this is getting close. right now "can't make progress" would be a TCPConnection that is unable to send (backpressure for example). |
a0ee64d
to
f1a3b74
Compare
@slfritchie I added the telemetry info. Can you have a look to make sure I did it correctly? |
Things I need to do:
Please give this another review. It's ready for more feedback. Question, how, if at all should we document this somewhat advanced feature beyond the package level docs in Backpressure? |
src/libponyrt/actor/actor.c
Outdated
if(ponyint_messageq_push_single(&to->q, first, last)) | ||
{ | ||
if(!has_flag(to, FLAG_UNSCHEDULED)) | ||
if(!has_flag(to, FLAG_UNSCHEDULED) && | ||
(atomic_load_explicit(&to->muted, memory_order_relaxed) == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should i move this to a nicely named function?
I just updated ProcessMonitor to use backpressure mechanism to prevent unbounded pending queue growth. This is a breaking API change to the constructor as a ApplyReleaseBackpressureAuth token is now required. |
Ok with the updates to TCPConnection documentation and with the addition of backpressure to ProcessMonitor, it appears to me that any of the "runaway memory growth" actors in the standard library have some sort of backpressure coverage. |
15b59ea
to
57af3be
Compare
There's a problem with work stealing and block messages. At the time a scheduler enters into steal() it might have a muted actor. This will cause it to not send a block message. When that actor is unmuted, it might be stolen by another scheduler, leaving the existing scheduler blocked but looping in steal, without ever being able to exit and without ever having sent a block message. |
The change to |
We did performance testing with Wallaroo. Using our standard testing app under a normal load of 3 million messages a 2nd, we saw no change in latencies. Awesome! |
src/libponyrt/actor/actor.c
Outdated
ponyint_sched_unmute_senders(ctx, actor, true); | ||
} | ||
|
||
PONY_API void pony_apply_backpressure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should follow the convention for runtime functions and take a pony_ctx_t*
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentionally done this way to make calling from Pony straightforward. Sylvan C and I spent a while coming up w this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense.
src/libponyrt/actor/actor.c
Outdated
set_flag(pony_ctx()->current, FLAG_UNDER_PRESSURE); | ||
} | ||
|
||
PONY_API void pony_release_backpressure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
src/libponyrt/actor/actor.c
Outdated
ponyint_sched_unmute_senders(ctx, ctx->current, true); | ||
} | ||
|
||
bool ponyint_triggers_muting(pony_actor_t* actor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this only needs the actor it’s unclear to me why we should do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, I missed that.
src/libponyrt/actor/actor.c
Outdated
ponyint_is_muted(actor); | ||
} | ||
|
||
bool ponyint_is_muted(pony_actor_t* actor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
d3f6bd7
to
511acd8
Compare
not complete, need to expose "has flags"
@jemc @Praetonus @mfelsche @sylvanc i added comments and did some clean up. please have a look. where should there be additional explanation, comments etc. |
Latest perf testing round looks good. On the cleanup and blog post etc. |
…ime. A microbenchmark for measuring message passing rates in the Pony runtime. This microbenchmark executes a sequence of intervals. During an interval, 1 second long by default, the SyncLeader actor sends an initial set of ping messages to a static set of Pinger actors. When a Pinger actor receives a ping() message, the Pinger will randomly choose another Pinger to forward the ping() message. This technique limits the total number of messages "in flight" in the runtime to avoid causing unnecessary memory consumption & overhead by the Pony runtime. This small program has several intended uses: * Demonstrate use of three types of actors in a Pony program: a timer, a SyncLeader, and many Pinger actors. * As a stress test for Pony runtime development, for example, finding deadlocks caused by experiments in the "Generalized runtime backpressure" work in pull request #2264 * As a stress test for measuring message send & receive overhead for experiments in the "Add DTrace probes for all message push and pop operations" work in pull request #2295
I'm planning on squashing and merging this today. Here's the planned commit comment. Anything else that should be included? If no, I'll get this merged down then start working on a blog post that would announce the feature. This commit has backpressure to Pony runtime scheduling. Prior to this commit, it was possible to create Pony programs that would be able to cause runaway memory growth due to a producer/consumer imbalance in message sending. There are a variety of actor topologies that could cause the problem. Because Pony actor queues are unbounded, runaway memory growth is possible. This commit contains a program that demonstrates this. This commit adjusts the Pony scheduler to apply backpressure. The basic idea is: 1- Pony message queues are unbounded With this commit, we apply backpressure according to the following rules: 1- If an actor processes batch size application messages then it is overloaded. It wasn't able to drain its message queue during a scheduler run. Particular details on this 1- Sending to an overloaded or muted actor will result in the sender being muted unless the sender is overloaded. With this commit, the basics of backpressure are in place. Still to come: backpressure isn't currently applied from the cycle detector so its queue can still grow in an unbounded fashion. More work/thought needs to go into addressing that problem. Its possible that due to implementation bugs that this commit results in deadlocks for some actor topologies. I found a number of implementation issues that had to be fixed after my first pass. The basic algo though should be fine. There are a number of additional work items that could be added on to the basic scheme. Some might turn out to be actual improvements, some might turn out to not make sense. 1- Allow for notification of senders when they send to a muted/overloaded actor. This would allow application level decisions on possible load shedding or other means to address the underlying imbalance. 2- Allow an actor to know that it has become overloaded so it can take application level 3- Allow actors to have different batch sizes that might result in better performance for some actor topologies This work was performance tested at Wallaroo Labs and was found under heavy loads to have no noticeable impact on performance. |
Excellent! 👍 One small typo I noticed in the first line: I think "has" should be "adds". |
This is a first draft of generalized runtime backpressure. A final version would require a changes to
TCPConnection
and anything else that can become "overloaded" and would need to exert backpressure based on external conditions (such as a slow receiver).