-
-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run cycle detector every N ms based on timer #2709
Run cycle detector every N ms based on timer #2709
Conversation
Are there measurements on the size of the "large amount of overhead" and the performance imporvements this change provides? Does the timer run continuously, if so that would seem to be a problem for long running programs, especailly in a battery operated environment? |
So...
|
@winksaville The overhead of the cycle detector can be quantified using the Before this change the performance is as follows: Without
With
There's about a 5 million message increase (on my mac) in the After this change the performance is as follows: Without
With
The performance for In regards to the timer, yes, it runs continuously for the duration of the program (unless I'm not sure I follow how the timer running continuously is inherently a problem for any long running program regardless of operating environment. Can you expand on what you meant by that comment? In regards to battery constrained environments, yes, the timer (and subsequent cycle detection processing) would add some additional overhead. However, it would be possible to manage the timer along with the dynamic scheduler scaling logic to ensure that when all scheduler threads go to sleep (not currently possible/enabled due to some unsquashed bugs) that the timer is cancelled because no work can be done by any actors and then recreated when the first scheduler thread is woken up. It might also be possible to do the same with 1 scheduler thread active (the current best case with dynamic scheduler scaling after all other threads are asleep) when that scheduler thread is quiescent during the steal loop (and pauses the cpu) but that would be a bit more complex to implement. NOTE: both of these proposed solutions would be additional work building on the current PR. |
@SeanTAllen Kind of. The current PONY_API void ponyint_destroy(pony_actor_t* actor)
{
// This destroys an actor immediately. If any other actor has a reference to
// this actor, the program will likely crash. The finaliser is not called.
ponyint_actor_setpendingdestroy(actor);
ponyint_actor_destroy(actor);
} This PR doesn't actually change the logic in PONY_API void ponyint_destroy(pony_actor_t* actor)
{
// This destroys an actor immediately.
// This will DEFINITELY CAUSE THE PROGRAM TO CRASH because the
// cycle detector has a reference to the actor.
// The finaliser is not called.
// This wouldn't be a problem if we could call the following
// but we can't because we don't have a ctx
// ponyint_cycle_actor_destroyed(ctx, actor);
ponyint_actor_setpendingdestroy(actor);
ponyint_actor_destroy(actor);
} Calling |
@dipinhora - What is the barrier to adding the |
@dipinhora, my concern is running a timer continuously means a server in the cloud will be charged CPU usage fees even if no activity is occuring and in battery operated devices it would needlessly drain the battery. To me, "polling" architectures just feel wrong, one of the things I love about message passing is that it's "reactive" and if there is no work to do the system should be able to go into a "deep sleep" mode. |
@jemc In theory one could call I'm not sure if This still wouldn't make it completely safe though because it could be possible that another actor that is not the cycle detector has a reference to the actor being destroyed by |
@winksaville I don't disagree with your concern regarding the CPU usage charge or the battery drain. However, looking at the current state of the pony runtime, there is always CPU usage (and thereby battery drain) because there is always at least 1 scheduler thread active and the largest CPU pause in As a result, I don't think the addition of the timer for cycle detection is a significant concern in the overall scheme of things (given the current pony runtime state). To optimize pony for battery operated devices and metered cloud environments would require some additional work on the runtime. The dynamic scheduler scaling was a step in the right direction but it is currently limited in not being able to suspend the last scheduler thread (so only the ASIO thread would remain awake). The next step would be to fix the bugs related to suspending the last scheduler thread to remove the last source of continuous CPU usage for programs that don't require it (i.e. any program where the input comes from an ASIO event so it cannot reach quiescence). Assuming the runtime can suspend all scheduler threads, this PR would then create an unnecessary source of continuous work (that would be a waste of cycles because by definition no actors could become blocked if all scheduler threads were suspended). This could be managed by having the dynamic scheduler scaling logic cancel the cycled detector timer before suspending and then recreate it when the first scheduler thread is woken up as described in my previous comment. This would not result in any continuous or unnecessary work due to the cycle detector timer and would be battery friendly and CPU resource minimal. In regards to polling architectures vs reactive architectures, the current implementation for the cycle detector is completely reactive (if no block messages are sent to it, it does no work; this is how I'm open to suggestions to improve this PR and/or alternatives to solve the issue of cycle detection overhead and do not want to force this change in as is if others don't agree it's a worthwhile change. |
@dipinhora - So it seems like we can freely change the function signature and/or the places where it is called without worrying about the question of whether some user will choose to call it with inappropriate arguments or in an inappropriate context. Does that make sense, or am I still missing something in what you're saying? |
3e9c3bb
to
c3f076b
Compare
@jemc Good point. There are only two instances where Neither of these two instances should result in a message to the cycle detector about the actor being destroyed (the cycle detector doesn't need to know it's being destroyed and the temporary main actor is created/destroyed after the runtime and cycle detector have been terminated). However, just in case of any future uses of |
Anyone have any opinion on the use of the timer vs having a scheduler thread to ensure the cycle detector is run consistently? I'm starting to wonder if that additional complexity of the timer is worth it or not. My initial thought was that the timer would be a good solution but it adds some complexity in relation to @winksaville's concerns regarding cpu/battery usage. Switching to having a scheduler thread ensure the cycle detector runs would accomplish the same goal without the additional complexity of canceling and recreating the timer when suspending/resuming the last scheduler thread. NOTE: This would result in that scheduler thread ( |
My take is that even though using a timer probably won't make pony any worse in than it is now, it does add another place that would have to be changed in the future to make pony more efficient in its CPU utilization. So trying to use the scheduler thread to ensure the cycle dector runs and see what its cost might be seems worth trying. |
@winksaville - regarding the point that this adds another place that would have to change to support low-power / long sleep systems, it's probably worth noting that @sylvanc wants to eventually remove the cycle detector in favor of another approach, and it's possible (likely?) that this effort would happen first anyway. |
@dipinhora this change seems fairly large and in a critical area, is there test coverage data to minimize regressions? |
ab2fe6e
to
0bdeb36
Compare
@winksaville I've added a test ( I'd also added a commit to remove the cycle detector timer and to instead have scheduler thread 0 send messages to the cycle detector to ensure it is woken up on a regular interval. The performance of this version is comparable to the performance with the cycle detector timer. I don't believe the test failure issue should hold up this PR (unless we determine there's a test logic error) as that seems to me to be unrelated to the changes to how cycle detection is triggered (the same test issue occurs if the test is added to |
As Sean mentioned on the sync call this week, "@sylvanc knows the cycle detector better than anyone", so I'd prefer if he could take a look at this before we merge it. |
Regarding the CI, well, I don't think it's practical to have failing CI tests on master, so if we know these tests aren't working consistently, but intend to follow up later to fix whatever is making them fail, we should mark the tests as disabled rather than leaving the CI in a non-green state. |
@jemc of course. I didn't mean to suggest that this PR get merged without @sylvanc's review. It was more me explaining why I feel the new test shouldn't necessarily hold up the change. Also agreed regarding the failing CI test. I left it enabled so others could see the CI failures. I'll mark it as disabled and leave a note here about how it can be run manually instead. |
0bdeb36
to
b9333aa
Compare
I've marked The test can manually be run by using one of the following commands:
or
|
81dbb35
to
c76ea2b
Compare
Took a closer look at the test failures. The test was failing because I missed some edge cases related to the non-timer version of triggering the cycle detector. The logic has been fixed and the test has been re-enabled (and is passing!). |
c76ea2b
to
7f0a9b1
Compare
Rebased to resolve merge conflict. |
@dipinhora can you rebase this against master? |
Prior to this change, the cycle detector would receive block/unblock messages from actors whenever the actors logically blocked or unblocked. This resulted in a large amount of overhead for cycle detection due the large amount of block/unblock messages that were repeatedly sent from an actor. This commit changes things so that actors do not send block messages to the cycle detector until the cycle detector asks an actor if it is blocked or not. The logic works as follows: * When an actor is created, the cycle detector is notified so it is aware of the new actor. * When an actor logically blocks, it makes note of that as a flag (same as before) but it no longer sends a block message to the cycle detector immediately. * When the ASIO thread starts, it asks the cycle detector to create a timer for running the cycle detection check. * When the timer fires, the cycle detector asks every actor it knows of if it is blocked or not and then checks for any cycles and does the normal conf/ack processing. * If an actor is logically blocked and receives a message from the cycle detector asking if it is blocked, the actor sends a block message to the cycle detector and makes note of that fact that it sent a block message in a flag. * If an actor gets a normal/application message, it immediately sends an unblock message to the cycle detector. Same if the actor gets a GC acquire or release message. * If an actor gets a conf message, it sends an ack if it has already sent a block (and didn't send an unblock already). * When the cycle detector receives any messages from actors (created, block, unblock, ack, etc), it processes them immediately to update it's viewmap and adds actors for deferred cycle detection checking when the next time the timer is triggered. This allows for any actors that might have received new work inbetween blocking and cycle detection to send an unblock message (which would remove the actor from the viewmap deferred status) and so would not be considered as part of cycle detection (making cycle detection more efficient). The end result is that there is almost no overhead to cycle detection with this new approach. Additional changes: * The `--ponycdmin`, `--ponycdmax`, and `--ponycdconf` parameters have been removed because they don't make sense any longer with this new approach. They have been replaced with `--ponycdinterval` to control how often the cycle detector runs it's detection (the interval determines how often the timer for detection fires). * A codegen test has been added to test the cycle detector There are currently two down sides to this new approach: * First, this new approach results in every new actor being registered with the cycle detector. Previously, actors were only registered with the cycle detector if they blocked. My naive understanding is that this is not a big deal because almost every actor would block at some point in a pony program as that is the only way to reach quiescence but I could be wrong about that.
7f0a9b1
to
d0ee40a
Compare
Rebased to pick up latest changes on master. |
This is great stuff @dipinhora. My only concern is that it can create a message storm when the cycle detector asks all actors if they are blocked. I think this can (in the future) be improved by asking only a sample of actors, in an incremental way. However, let's merge it now, and optimise it more down the road. Great PR! |
@sylvanc Good point regarding the message storm. I'll modify it to be incremental. @SeanTAllen I got emailed about your comments but I can't seem to find them in the UI. The following was your question:
The value of |
@dipinhora i deleted the comments because i figured it out. |
@SeanTAllen I'm confused. I'm not sure what you figured out since your comment was correct and that value doesn't change during the runtime of a program. Are you saying that you figured out that the code should stay as it is? |
In PR ponylang#2709, @sylvanc pointed out a concern regarding the cycle detector creating a message storm due to how it would ask all actors whether they are blocked or not on every interval. This commit changes the logic so that the cycle detector only asks set number of actors whether they are blocked or not. If it is not able to ask all actors, it will resume iterating actors on the next iteration. This commit also addresses @SeanTAllen's concern regarding the conditional check order using `ponyint_actor_getnoblock`.
@dipinhora well then, shame on me! |
@sylvanc @SeanTAllen PR #2800 should address both of your concerns. |
In PR #2709, @sylvanc pointed out a concern regarding the cycle detector creating a message storm due to how it would ask all actors whether they are blocked or not on every interval. This commit changes the logic so that the cycle detector only asks set number of actors whether they are blocked or not. If it is not able to ask all actors, it will resume iterating actors on the next iteration. This commit also addresses @SeanTAllen's concern regarding the conditional check order using `ponyint_actor_getnoblock`.
Prior to this change, the cycle detector would receive
block/unblock messages from actors whenever the actors logically
blocked or unblocked. This resulted in a large amount of overhead
for cycle detection due the large amount of block/unblock messages
that were repeatedly sent from an actor.
This commit changes things so that actors do not send block
messages to the cycle detector until the cycle detector asks an
actor if it is blocked or not. The logic works as follows:
is aware of the new actor.
(same as before) but it no longer sends a block message to the
cycle detector immediately.
create a timer for running the cycle detection check.
knows of if it is blocked or not and then checks for any cycles
and does the normal conf/ack processing.
the cycle detector asking if it is blocked, the actor sends a
block message to the cycle detector and makes note of that fact
that it sent a block message in a flag.
sends an unblock message to the cycle detector. Same if the
actor gets a GC acquire or release message.
already sent a block (and didn't send an unblock already).
(created, block, unblock, ack, etc), it processes them
immediately to update it's viewmap and adds actors for deferred
cycle detection checking when the next time the timer is
triggered. This allows for any actors that might have received
new work inbetween blocking and cycle detection to send an
unblock message (which would remove the actor from the viewmap
deferred status) and so would not be considered as part of
cycle detection (making cycle detection more efficient).
The end result is that there is almost no overhead to cycle
detection with this new approach.
Additional changes:
--ponycdmin
,--ponycdmax
, and--ponycdconf
parametershave been removed because they don't make sense any longer with
this new approach. They have been replaced with
--ponycdinterval
to control how often the cycle detectorruns it's detection (the interval determines how often the
timer for detection fires).
There are currently two down sides to this new approach:
registered with the cycle detector. Previously, actors were
only registered with the cycle detector if they blocked. My
naive understanding is that this is not a big deal because
almost every actor would block at some point in a pony program
as that is the only way to reach quiescence but I could be
wrong about that.
ponyint_destroy
would now definitely result ina program crash (unless
--ponynoblock
were used) due to thecycle detector having a reference to every actor. This
could be avoided if
ponyint_destroy
could callponyint_cycle_actor_destroyed
but unfortunatelyponyint_destroy
does not have actx
available as is requiredfor
ponyint_cycle_actor_destroyed
.