Segfault in cycle checker(?) #4193
Comments
@derrickturk have you had a chance to investigate this at all? I know from Zulip that you were going to look into it some. If yes, any findings? If no, do you need assistance?
@derrickturk I believe we discussed some of this on Zulip; can you bring the relevant discussion points over to this issue so they can help anyone else who tries to tackle this in the future?
Sharing my notes here from last month after a long delay. I spent some time back in October poking at this after discussing some ideas on Zulip. I've been testing this all with a debug build of ponyc.

Executions of my test program (same as above) end in a segmentation fault most of the time, with the full expected output. Sometimes I get the full expected output with no segmentation fault; every once in a while I see no final output and no segmentation fault. The segmentation faults appear in the generated tracing functions.

Following a suggestion on Zulip from @SeanTAllen, I looked into the possibility of an issue with "orphaned" actors. I'm interested in helping find the underlying issue with the cycle checker, but I'm as interested or more in understanding why I see the early-termination behavior. It's possible I've misunderstood the quiescence rules. I am going to try to reduce my example program to something more minimal.
Well, I've reduced the program to reproduce this bug, or at least I think I have. The specific manifestation of the segmentation fault varied as I made changes. The current single-file test program is at the end of this post. The current behavior is:
The last backtrace I captured is:
At one point while reducing the code, when I still had an
I have not seen the original segfault in the trace function for `Array[I64]`. At this point I am still wondering if this might be related to #1118, as well as how many distinct bugs I am seeing.
I should also add that
@derrickturk can you open an issue for the "early termination"? That sounds like a distinctly different problem from the (probably cycle-detector-related) segfault. I haven't paid a lot of attention to what you have reported for it, so a good fresh summation on its own would be helpful.
I have a hard time reproducing the segfault issue using the last example. I have a very easy time reproducing the "early termination" issue. I almost never get output. I have a much easier time reproducing all three cases with the original example in the zip file.
@derrickturk, very much please do open a new issue for the termination problem and limit the information therein to it.
So the "early termination" problem: there's no printing going on as at the time the program ends, the waiting count is still rather high. And it is "all over the place". Here's some output from my modified version. Next up is to determine if bug in pony code or the runtime. I'm assuming runtime but I want to rule out the pony code first. |
@redvers suggested during office hours to try running with `--ponymaxthreads=1` and, lo and behold, no "early termination". Why? Still unknown.
OK, so as stated, the segfault is fixed by forthcoming changes. The "early termination" is a "bug" in the program. The program works with a single pony thread because the message ordering with a single thread, and how things are scheduled, matches the sequential thinking behind the code. However, with more than 1 thread, the code is no longer sequential and invariants that the code doesn't state do not hold. In particular, there is an expectation that each Cpu will have processed its "subscribe_halt" message before it reaches step 3, so that there is a 1-to-1 wait to done ratio.

However, this will only be true with a single thread, where you happen to have our scheduling rules (which, with a single thread, will satisfy most expectations for sequential program order). With more than 1 thread, things can get "out of order" from what the program seems to expect. In particular, at the moment that `cpu.run()` is first called, our first cpu starts its "steps processing" almost immediately. With a single thread this is not the case and everything is "ordered as expected". When the first cpu starts its processing, there are orderings where `_wait` can be called to add cpus to the waiting set, and the cpu in question will have already reached step 3 before it has received and processed the "subscribe_halt" message. This means that there will no longer be a 1-to-1 wait to done ratio. What happens then is that when the cpu is at step 3, the `on_halt` match fails because `_on_halt` is still None and no `_done` message is sent, which means that this check will not pass:

```pony
be _done(cpu: Cpu tag) =>
  _waiting.unset(cpu)
  if _wait and (_waiting.size() == 0) then
    _when_done()
  end
```

because `_waiting.size()` will never be able to reach 0.

For the program to work as expected, the following, which only happens with 1 thread, MUST happen: within the `Main` constructor we have:

```pony
for cpu in cpus.values() do
  cpu.run()
end
```

which means that while main is still running, the `run` messages are queued but not yet processed. Because of this ordering, and corresponding orderings based on the scheduler implementation, there can never be a mismatch in set/unset calls in the waiter. However, in a concurrent world, if we have 2 threads, the moment an actor hits scheduler 1's queue it can be stolen by scheduler 2, and then there are many unknown orderings, and all the race conditions in the code related to set/unset in the `CpuWaiter` can happen. Note that with the deterministic scheduling with 1 thread, 4000 times `_on_halt` won't be set, but that is OK, because it means that 1000 times it will be set and there's a corresponding 1000 wait calls made from main.

The easiest fix for the problem is to move the setting of `on_halt` to before the first time it needs to be used. So here in main:

```pony
try
  cpus(cpus.size() - 1)?.subscribe(mk)
  waiter.wait(cpus(cpus.size() - 1)?)
end
```

you have access to the waiter and can do the `subscribe_halt` call on the last item in `cpus` at that time and give it the waiter object that is in scope.
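To make the suggested ordering concrete, here is a minimal sketch. The names (`CpuWaiter`, `Cpu`, `subscribe_halt`, the waiting set, the no-op callback) are modeled on the discussion above rather than taken from the actual intpony source, so treat this as an illustration under those assumptions, not the real fix:

```pony
// Minimal sketch of the reordering described above. Names follow the
// discussion in this issue, not the actual intpony source.
use "collections"

actor CpuWaiter
  let _waiting: SetIs[Cpu tag] = SetIs[Cpu tag]
  var _wait: Bool = false
  let _when_done: {(): None} val

  new create(when_done: {(): None} val) =>
    _when_done = when_done

  be wait(cpu: Cpu tag) =>
    // Called by Main once per Cpu it intends to wait on.
    _wait = true
    _waiting.set(cpu)

  be _done(cpu: Cpu tag) =>
    // Only reached if the Cpu had _on_halt set when it halted.
    _waiting.unset(cpu)
    if _wait and (_waiting.size() == 0) then
      _when_done()
    end

actor Cpu
  var _on_halt: (CpuWaiter tag | None) = None

  be subscribe_halt(waiter: CpuWaiter tag) =>
    _on_halt = waiter

  be run() =>
    // VM stepping elided; on halt, notify the waiter if one was registered.
    match _on_halt
    | let w: CpuWaiter tag => w._done(this)
    | None => None // with >1 scheduler thread and the original ordering, this races
    end

actor Main
  new create(env: Env) =>
    let waiter = CpuWaiter({(): None => None} val)
    let cpus = Array[Cpu]
    for i in Range(0, 100) do
      cpus.push(Cpu)
    end

    try
      let last = cpus(cpus.size() - 1)?
      last.subscribe_halt(waiter) // set _on_halt *before* any run() can be processed
      waiter.wait(last)
    end

    for cpu in cpus.values() do
      cpu.run()
    end
```

With this ordering, `_on_halt` is guaranteed to be set by the time any `Cpu` reaches its halt step, regardless of how many scheduler threads steal work, so every `wait` has a matching `_done`.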
#4251 fixes the segfault issue.
The Pony runtime includes an optional cycle detector that is on by default. The cycle detector looks for groups of blocked actors that have reference counts above 0 but are unable to do any more work, because every member of the group is blocked and has no additional work to do.

Over time, we have made a number of changes to the cycle detector to improve its performance and mitigate its impact on running Pony programs. In the process of improving the cycle detector's performance, it has become more and more complicated. That complication led to several race conditions in the interaction between actors and the cycle detector. Each of these race conditions could lead to an actor being freed more than once, causing an application crash, or to an attempt to access an actor after it had been deleted.

I've identified and fixed what I believe are all the existing race conditions in the current design. I intend to follow this commit up in the future with a completely new design that will provide the same functionality as the cycle detector but with better performance and maintenance characteristics.

These changes were all tested with the "short lived actors" cycle detector programs; the programs exhibit similar memory usage, and none of them, nor any other test programs I threw at them, caused any assertion failures or crashes.

Closes #4193
Closes #4221
Closes #4220
Closes #4219
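For readers unfamiliar with what the cycle detector collects, here is a minimal sketch of the pattern it exists for. This is a hypothetical example, not taken from the runtime or its test suite: two actors that reference each other and then go idle. Neither actor's reference count drops to 0, yet both are blocked forever, so without the cycle detector (e.g. under `--ponynoblock`) they would never be reclaimed.

```pony
// Hypothetical example of a reference cycle between blocked actors:
// the pattern the cycle detector is responsible for collecting.
actor Ping
  var _peer: (Pong | None) = None
  be set_peer(p: Pong) => _peer = p

actor Pong
  var _peer: (Ping | None) = None
  be set_peer(p: Ping) => _peer = p

actor Main
  new create(env: Env) =>
    let a = Ping
    let b = Pong
    a.set_peer(b)
    b.set_peer(a)
    // Main drops its references here; a and b keep each other's reference
    // counts above 0 but will never receive another message, so only the
    // cycle detector can determine that they are garbage.
    env.out.print("cycle created")
```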
Opening apology: this is a much-reduced version of a solution to Advent of Code's 2019 day 7 puzzle, which is hard to explain in a vacuum but comes down to implementing a tiny VM with async I/O and running multiple instances of it in communication with each other. After cutting out a lot of extraneous things, I'm left with the attached source files and input. I'm reluctant to reduce it further because I'm starting to think the crash has nothing to do with the VM logic itself, but it's challenging to produce relevant input for a cut-down VM.
My notes refer to testing on both Windows 10 and Arch Linux via WSL2. In both cases ponyc is the latest release 0.51.2. This is a 4-core machine.
Run with:

```
./intpony input.txt
```
The three observed behaviors are:
The only runtime option I've found to have any effect on this is `--ponymaxthreads 1`, which on Windows seemingly guarantees the intended output (with fast exit). Compiled with `-d`, I get additional possible outcomes, including mismatches between the count of RUN outputs and DONE outputs. (This should not happen given the input - each VM/Cpu should run to successful halt.)

The program usually ends in a segfault, on either the release or debug binary. It's often reported (by gdb) in `pony_os_peername` on Windows, and in `Array_I64_val_Trace` on WSL2. Oddly, runs with no segfault also have no output, and successful runs produce output before segfaulting. With `--ponymaxthreads 1`, there is no segfault on Windows, but I still get segfaults on WSL2.

The test program creates 100 `Cpu` actors total, each with a 519-"word" memory (i.e. an `Array[I64]` with 519 entries). This can be adjusted; it seems that segfaults get more likely as the number goes up. I've never seen a segfault with only 1 or 2 actors, but I have with 4 or 5.

Full stack trace from a crash on WSL2:
I've also seen:
The plot thickened with a suggestion on Zulip to run with `--ponynoblock`, disabling the cycle checker (IIUC). This resulted in "dropped output" about 50-75% of the time (rate maybe dependent on running under a debugger or not), but no segfaults.

intpony.zip
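For readers without the attachment, here is a hypothetical minimal sketch of the shape described in this report (100 `Cpu` actors, each holding a 519-entry `Array[I64]`). It is not the actual intpony source, just an illustration of the data each actor carries; the VM logic and async I/O are elided:

```pony
// Hypothetical sketch of the reported setup, not the actual intpony source:
// 100 Cpu actors, each owning a 519-"word" Array[I64] memory.
use "collections"

actor Cpu
  let _mem: Array[I64] = Array[I64].init(0, 519)

  be run() =>
    None // VM step loop and async I/O elided

actor Main
  new create(env: Env) =>
    let cpus = Array[Cpu]
    for i in Range(0, 100) do
      cpus.push(Cpu)
    end
    for cpu in cpus.values() do
      cpu.run()
    end
```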