Open J9 crash #11569
@0xdaryl fyi |
@a7ehuo : would you mind triaging this problem please? Some of the crash artifacts are available as an attachment on the bugzilla issue above. @gpezzini : was there a larger Windows crash dump file produced (*.dmp) and is it still available? If so, could you either transfer it via Slack (you can message it to me (0xdaryl) on the OpenJ9 workspace) or your favourite file sharing service (e.g., Google Drive, Box, etc.)? |
Hi, here is the dump file |
@gpezzini are there any other diagnostics? There are usually a "javacore.*" and a "jitdump.*" file created as well. These names are printed to stderr when the crash happens. They are almost always in the same directory as the core file (core.20201223.085735.16160.0001.dmp). The "jitdump.*" file in particular could be very useful to aid in the investigation here. |
Hi, you will find all the files that I found in: |
Please, let me know if it's enough. |
@a7ehuo see the bugzilla link. There is a
The jitdump recompilation has reproduced the issue and has also enabled Given we know the exact trees which caused the problem I bet you can force a compilation of that method in a unit test and reproduce the issue with some extra JIT options to force the same inlining. Also @vijaysun-omr FYI another success for the jitdump work! |
Great to see, @fjeremic. I did want to just mention @klangman and @JamesKingdon to make them aware that Filip has been improving the JIT dump functionality in recent months and that you may be able to reap the benefit of that work on the service streams as well. |
Thanks @fjeremic for the quick assessment! I was looking at the attached files. There is no core file. The jitdump recompilation stopped in
The crashed compilation thread stack
|
@a7ehuo the windows stack trace above is not accurate; it's because the symbols are in a different place and I guess the javacore writer doesn't know where to grab the symbols from... If you want the real stack trace, given that you have no core file, you'll likely have to manually go through the exe using something like DUMPBIN (similar to |
See #11569 (comment) for the core file. |
Got the backtrace from the core. It crashed while executing
|
I’m trying to match the crashed compilation locally, but not able to match the exact nested inlining yet. The call flow is that
|
Maybe the paths that are not being inlined in your local test don't cross some frequency threshold that makes them candidates for inlining. I wonder if you would get the inlining you want if you relaxed some of the frequency-based thresholds that the inliner uses (note this could inline a lot more and change inlining in general, but it may at least be worth a shot en route to forcing the specific inlining that you want). @ashu-mehra has recently tried the long set of options to avoid considering frequency in the inliner, and @mpirvu might also be able to share that options string. |
"disableConservativeColdInlining,disableConservativeInlining,bigCalleeThreshold=600,bigCalleeHotOptThreshold=600,bigCalleeScorchingOptThreshold=600,inlineVeryLargeCompiledMethods" |
Thanks @mpirvu @vijaysun-omr ! I'll give those options a try. |
Hi, note that I can always reproduce the problem. |
@gpezzini, could you help try the following? Thanks!
|
@a7ehuo I will. I'll get back to you when I have the results. |
I tried the options from #11569 (comment) along with all the methods in |
I'm performing test 1. In the meantime I've downloaded the build you suggested, but Avast says that the file is infected. Please see: This does not happen using jdk-11.0.9+11_openj9-0.23.0. Right now I do not know whether I can continue or not. |
I found the nightly build on the AdoptOpenJDK nightly build page: https://adoptopenjdk.net/nightly.html?variant=openjdk11&jvmVariant=openj9 and chose the build from Jan 7th, which is the one I tried locally. It's likely a false positive. I'll open an issue in AdoptOpenJDK support to track it. |
What infection was detected? I used to have problems with anti-virus on Windows, but the detected problem seemed to be just that the binary wasn't recognized, which makes sense for a nightly build. |
Still looking, since _actualOptSetInfo[ |
@gpezzini I asked the question in AdoptOpenJDK slack channel. Here is the answer copied from the reply in case you don't have access to it:
|
Test 1. Test 2. |
Hi, I finished test 2, without “-Xjit:disableInlining” and with the downloaded build: The job ends correctly |
@gpezzini Thanks for helping try the two things! To work around this issue while we're investigating it, either you could continue using the above nightly build, or add |
@a7ehuo Thanks a lot for your support!! |
With the help from @vijaysun-omr, we found the problem. A potential fix is being tested. @vijaysun-omr found the jitdmp shows expression
In TR::Node::recreate(node, _compilation->il.opCodeForCorrespondingLoadOrStore(node->getOpCodeValue()));
if (node->getOpCode().isStoreIndirect())
{
node->setNumChildren(1);
}
else
{
node->setNumChildren(0);
}
Because the check on whether or not the node is an indirect store happens after the awrtbari node is recreated as aloadi, FYI @gpezzini |
Very good find @a7ehuo & @vijaysun-omr 👏 |
In `hasOldExpressionOnRhs()`, we temporarily change wrtbar to aloadi for the syntactic comparison in `areSyntacticallyEquivalent()`. When it’s an indirect store, the number of children of the node is set to 1, otherwise 0. Because the check on whether or not the node is an indirect store happens after the wrtbar node is recreated as aloadi, `node->getOpCode().isStoreIndirect()` is false for aloadi, and the number of children for the wrtbar node ends up as 0. This prevents `areSyntacticallyEquivalent()` from comparing the first child of the two nodes, so it concludes that different expressions are the same. The fix is to check whether it’s an indirect store using the original node. Fixes eclipse-openj9/openj9#11569 Signed-off-by: Annabelle Huo <Annabelle.Huo@ibm.com>
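The ordering problem described in the commit message can be sketched with a toy model. This is only an illustration: `Op`, `Node`, `isStoreIndirect`, `correspondingLoad`, `toLoadBuggy` and `toLoadFixed` are hypothetical stand-ins invented here, not the OMR/OpenJ9 API, and the starting child count of 3 is just an illustrative value.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for JIT IL opcodes and nodes.
enum class Op { awrtbari, aloadi, astore, aload };

struct Node {
    Op op;
    int numChildren;
};

// In this toy model only awrtbari counts as an indirect store.
bool isStoreIndirect(Op op) { return op == Op::awrtbari; }

// Map a store opcode to the load opcode used for the comparison.
Op correspondingLoad(Op op) {
    switch (op) {
        case Op::awrtbari: return Op::aloadi;
        case Op::astore:   return Op::aload;
        default:           return op;
    }
}

// Buggy order: the opcode is rewritten first, so the check sees the load
// and the node always ends up with 0 children.
void toLoadBuggy(Node &n) {
    n.op = correspondingLoad(n.op);
    n.numChildren = isStoreIndirect(n.op) ? 1 : 0;
}

// Fixed order: remember whether the ORIGINAL node was an indirect store
// before the opcode is rewritten.
void toLoadFixed(Node &n) {
    const bool wasIndirectStore = isStoreIndirect(n.op);
    n.op = correspondingLoad(n.op);
    n.numChildren = wasIndirectStore ? 1 : 0;
}
```

Calling `toLoadBuggy` on a node `{Op::awrtbari, 3}` leaves it with 0 children, while `toLoadFixed` leaves it with 1, so the base object child is still available for the later comparison.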
Thanks @fjeremic I was pleasantly surprised to see the JIT dump show tracePRE output in addition to traceFull and this was invaluable in this case for detecting what the problem was. How is the decision taken on which optimization to trace in more depth (e.g. tracePRE) for a given crash (I am assuming this is done automatically when the JIT dump compilation is done) ? I asked @a7ehuo to look into this aspect as well but I thought I'd ask since you were following and commented on this issue. |
Attn : @klangman @JamesKingdon and @0xdaryl for this type of a crash in PRE since I won't be surprised if there are duplicates from the change that apparently regressed this behavior in Aug 2020. You may want to make a note of it from a service viewpoint. |
@vijaysun-omr Do we expect it to always show up as a crash in TR_ExceptionCheckMotion::perform(void)? |
The jitdump has had this ability for a while; a crash in an opt will trigger tracing for that opt: @fjeremic also has a new PR that cleans this up further (#11610) |
@klangman other symptoms are possible, such as a wrong field value being privatized. |
To give an example of the kind of run time problem that could be caused by this bug:
- a load of o1.f could be commoned with a load of o2.f even when o1 != o2
- store o1.f = rhs could be wrongly copy propagated, i.e. the load of o2.f could be changed to pick up the rhs value from the earlier store (either via a temp or a register) even in the case when o1 != o2 |
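Why a child count of 0 lets `o1.f` and `o2.f` look identical can be shown with a small sketch. It assumes a comparison that looks at the opcode plus only the first `childrenToCompare` children; `Expr` and `equivalent` are hypothetical names made up for this illustration and are much simpler than the real `areSyntacticallyEquivalent()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical expression tree: an opcode label plus child expressions.
struct Expr {
    std::string op;          // e.g. "aloadi f"
    std::vector<Expr> kids;  // e.g. the base object being dereferenced
};

// Compare the opcode and only the first `childrenToCompare` children.
// Children are treated as leaves in this sketch, so only their labels
// are compared.
bool equivalent(const Expr &a, const Expr &b, std::size_t childrenToCompare) {
    if (a.op != b.op) return false;
    if (a.kids.size() < childrenToCompare || b.kids.size() < childrenToCompare)
        return false;
    for (std::size_t i = 0; i < childrenToCompare; ++i) {
        if (a.kids[i].op != b.kids[i].op) return false;
    }
    return true;
}
```

With `Expr` values for a load of field `f` from `o1` and from `o2`, `equivalent(..., 0)` reports them as the same expression because the base objects are never inspected, while `equivalent(..., 1)` tells them apart. An optimizer that trusts the first answer could common the two loads or propagate a stored value to the wrong object.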
Hi, I've downloaded: openjdk version "11.0.10" 2021-01-19. I ran the process, but the JVM crashed again. Note that the 'beta' version suggested by @a7ehuo (see the comment in this thread on Jan 8) worked without the “-Xjit:disableInlining” and "-Xjit:disablePRE" options. This was the working version: OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+8-202101062342). Do you want the dump files? |
@gpezzini this issue is tagged for the 0.25 release (Java 16 only), but will also be in the 0.26 release in April that updates Java 11. The fix is not included in the 0.24 release which is used by 11.0.10 |
@pshipton |
I've opened this bug at Eclipse, because Eclipse crashes while performing a custom process:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=569886
They told me to also open a bug report here because it seems to be an OpenJ9 crash.
They wrote in their answer:
To me it looks like this is a crash in OpenJ9, please report this crash at
OpenJ9 too [1] and add a reference to that bug here.
[1] https://github.com/eclipse/openj9/issues.
1XHEXCPCODE Windows_ExceptionCode: C0000005
1XHEXCPCODE J9Generic_Signal: 00000004
1XHEXCPCODE ExceptionAddress: 00007FF8809F0FBA
1XHEXCPCODE ContextFlags: 0010005F
1XHEXCPCODE Handler1: 00007FF880510D80
1XHEXCPCODE Handler2: 00007FF8B244AC10
1XHEXCPCODE InaccessibleWriteAddress: 00007FF40000002E
NULL
1XHEXCPMODULE Module: E:\AdoptOpenJdk11\bin\compressedrefs\j9jit29.dll
1XHEXCPMODULE Module_base_address: 00007FF880480000
1XHEXCPMODULE Offset_in_DLL: 0000000000570FBA
1XMCURTHDINFO Current thread
3XMTHREADINFO "JIT Compilation Thread-001" J9VMThread:0x0000000002C91200,
omrthread_t:0x00000000186FA4F0, java/lang/Thread:0x0000000500450BA0, state:R,
prio=10