Mutant causes MRI to get stuck #265
Comments
@plexus Very good reproduction. I'll see what I can do later today after finishing commercial activities. |
@plexus I got it reproduced and fixed in my actors branch. Need to clean up lots of ugly code I wrote there (and maybe use an external actor implementation, not sure yet). Just a heads up from me. |
is that branch on GH? not seeing it after a fetch... |
I can push it. But please don't comment on it etc., it's WIP (I don't push code that doesn't pass all specs yet; only when I have collaborators do I do so). |
My main reason is not to trigger reviews by others when I know there are still issues. This is experimental and not done with tests up-front ;) |
If you've seen a bit of my code you know I'm not the most critical when it comes to these things :) anyway hope it works, need to up the coverage of Yaks again! |
@plexus you can always use an older version of mutant where parallel nesting worked through an older parallel dependency. |
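Pinning the gem in the project's Gemfile is one way to try that; a minimal sketch, where the exact version to pin (shown as 0.6.1) is an illustrative assumption, not a version recommended in this thread:

```ruby
# Hypothetical Gemfile pin to an older mutant release; 0.6.1 is illustrative.
gem 'mutant',       '0.6.1'
gem 'mutant-rspec', '0.6.1'
```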
@plexus |
ok, will try older versions |
@plexus you could also exclude the problematic subject. |
@plexus I'd love you to confirm the problem is solved so I can release. |
Alas... now it runs up to
then hangs, when I Ctrl-C I get |
@plexus I count this as another improvement over the initial behavior. |
Running the single subject on master reproduces the behavior; on actors it now finishes fine. When I add a timeout it also finishes fine on master. So that's good enough for me! Just have to see if it makes it through a full run, but it seems it will be fine. |
Mhh. Master and actors should be semantically equivalent in terms of termination behavior. When adding a timeout fixes master, but actors was not even affected, there are some effects I overlooked. BTW: do you notice a small (around 15%) speedup when using recent master? With hand-rolled isolation rather than using parallel I saw a reduction of auom coverage time from 34s to 25s. When your full run passes I'm gonna release 0.6.4 from master anyway, as it's an improvement over 0.6.3. |
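For readers wondering what "hand rolling isolation" can look like, here is a minimal sketch under my own assumptions; it is not mutant's actual implementation, just the general shape of forking a child process and marshalling the result back instead of going through the parallel gem:

```ruby
# Assumed sketch of fork-based isolation, not mutant's real code:
# run the block in a forked child and send the Marshal-ed result
# back through a pipe, so the parent shares no state with the child.
def isolate
  reader, writer = IO.pipe

  pid = Process.fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
  end

  writer.close
  payload = reader.read
  reader.close
  Process.wait(pid)
  Marshal.load(payload)
end

# isolate { run_tests_for_current_mutation } # hypothetical caller
```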
Now it's a different subject that gets stuck; the timeout isn't helping. On master it just gets stuck like before
On actors it also gets stuck but the Runtime keeps going up |
@plexus Actors decouples killing from reporting. For that reason the report timer still triggers nicely in such cases. We'll need worker reporting too, which I plan for actors, where we know which worker is doing what right now. What is your timeout set to? Ruby's timers are not very deterministic, so I suggest waiting 2-10x longer to be sure the timeout has a chance to get rid of the mutation. |
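To illustrate the "wait 2-10x longer" suggestion, here is a minimal hedged sketch; the constant names and values are my own assumptions, not mutant's configuration or API:

```ruby
require 'timeout'

# Illustration only: BASE_TIMEOUT and SLACK_FACTOR are made-up names/values.
# The point is simply to give Ruby's imprecise timers generous slack.
BASE_TIMEOUT = 1.0 # seconds a healthy test run is expected to need
SLACK_FACTOR = 4   # somewhere in the suggested 2-10x range

def run_with_deadline
  Timeout.timeout(BASE_TIMEOUT * SLACK_FACTOR) { yield }
rescue Timeout::Error
  :timeout # treat the mutation as killed by timeout
end
```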
I first had it at 0.1 seconds, but later set it to 1 sec. With 0.01 or smaller I got segfaults :) |
hehe. I'll look into it. Can we set up a branch in |
yeah sure, I can do that |
@plexus I released |
@plexus Okay. So I did a very deep dive.
Short story: Can you try
Long story: We know that mutant exposes bugs in MRI, bugs that manifest in segfaults. This behavior was visible from
I strace-d / gdb-ed such a hanging killfork and will try to reproduce it outside the mutant / rspec / yaks environment. This will take some time. As a faster countermeasure we probably also need to timeout killforks from mutant's parent side. Something I never wanted to build, for various reasons:
I probably need to pile up a lot of motivation to dig into that messy ecosystem to produce such a workaround / fix. |
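A rough sketch of what such a parent-side killfork timeout could look like, under my own assumptions (this is not the actors-branch code): fork the work, poll the child against a deadline, and SIGKILL it if it never exits:

```ruby
# Assumed sketch of a parent-side timeout for a hanging killfork.
DEADLINE = 10.0 # seconds; illustrative value
TICK     = 0.1

pid = Process.fork do
  # The child would run the mutation's tests here; the stand-in just hangs.
  sleep
end

status  = nil
elapsed = 0.0
while elapsed < DEADLINE
  _, status = Process.wait2(pid, Process::WNOHANG)
  break if status
  sleep TICK
  elapsed += TICK
end

unless status
  Process.kill(:KILL, pid) # child blew the deadline, reap it forcefully
  Process.wait(pid)
end
```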
Does this mean you reproduced the segfaults outside mutant already? Can you create me a branch from that state so I can begin reducing it? |
@plexus I reduced it to this short one, crashes most recent MRIs with a segfault. I could not reduce the

```ruby
class Foo
  include Enumerable

  def initialize(items)
    @items = items
  end

  def each(&block)
    @items.__send__(:each, &block)
  rescue Exception
  end

  def more
    to_a # or any other method from Enumerable.
    [self]
  end
end

# Intentional infinite recursion to trigger the bug.
# This is not code I'd write. But it's code ruby should not segfault on.
def call(resource)
  resource.more.each(&method(__method__))
end

# Removing the thread results in *correct* behavior (stack level too deep error).
Thread.new do
  call(Foo.new([]))
end.join
```
|
@plexus Will you report this one upstream? You had good luck the last time we reduced one from mutant ;) |
How about setting smaller timeouts? On latest trunk it doesn't crash with 0.1 for me but it does with 0.001 for example. |
@plexus My reduction does not involve timeouts. I think it might be the kernel version that produces different scheduling patterns. What kernel are you on? I can downgrade to see if the race (I expect it's one) does not occur there. In that case I think my reproduction will help upstream fix it early. |
I think I'll just write a fuzzer interleaving threads with timeouts in random ways to get a better picture. |
This will be fun with unparser assistance ;) |
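A hedged sketch of what such a fuzzer could look like (my own illustration, not an actual mutant or unparser tool): randomly interleave short-lived threads, tiny timeouts, and exception-swallowing rescues, looking for schedules that hang or crash the interpreter:

```ruby
require 'timeout'

# Illustrative fuzzer: iteration count, thread counts, and timeout ranges
# are arbitrary; the idea is to explore many thread/timeout interleavings.
100.times do
  threads = Array.new(rand(1..4)) do
    Thread.new do
      begin
        Timeout.timeout(rand(0.001..0.05)) do
          loop { [1, 2, 3].each { |i| i * i } }
        end
      rescue Exception
        # swallow everything, mirroring the reduced test case above
      end
    end
  end
  threads.each(&:join)
end
```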
kernel: |
@mbj Oops, sorry, misread that as addressed to me, somehow. |
@plexus Just had luck reproducing it on an ubuntu VM:

```ruby
class Foo
  include Enumerable

  def each(&block)
    [].__send__(:each, &block)
  rescue Exception
  end

  def more
    to_a # any method from Enumerable
    [self]
  end
end

# Infinite recursion again, this time started inside Kernel.catch.
def call(resource)
  resource.more.each(&method(__method__))
end

Thread.new do
  Kernel.catch(Exception.new) do
    call(Foo.new)
  end
end.join
```

Can you confirm it also crashes on your machine? I think it would be reportable to upstream then. |
Using the CrowdCI I got confirmations about segfaults / hangs, but not on all systems: https://gist.github.com/mbj/31163a8e712573877268 |
@plexus So some people can reproduce the segfault. IMO it's reportable upstream with these findings. What do you think? |
Yup, can confirm that it crashes on 2.1.4 and trunk. I'll report it. |
@plexus Nice. I'm working on mutant-side timeouts of killforks within the actors branch. In the hope these timeouts do NOT trigger the bug we are trying to work around :P |
Ruby lang issue regarding the segfault: https://bugs.ruby-lang.org/issues/10460 |
@plexus Thx. |
I added a brief summary about this to mutant's README, as it will probably take a while to resolve this bug. |
I just tried with freshly released MRI 2.1.5 and it still segfaults. |
I think upstream issue https://bugs.ruby-lang.org/issues/10626 is related to this one. I saw the same symptoms but could never isolate them in a stable way. |
@plexus This problem was probably solved with ruby/ruby@8fe95fe, can you confirm? |
Yup works fine now, great news! |
@plexus Yeah. I hope the MRI people cut a release with that one soon, so I can continue mutant development in a more deterministic environment. I'll close this issue now. |
This issue is cited a lot, so years after it was closed let me point interested people to the configurable coverage criteria mutant added in the latest release: https://github.com/mbj/mutant/blob/master/docs/configuration.md#coverage_criteria |
Seeing this consistently on current Yaks.
Steps to reproduce
Ruby version
Gemfile.lock: https://gist.github.com/ae2fab52a0999a3eeea7
It runs almost to the end, but then gets stuck at