Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jdk11: jtreg java/util/concurrent/ArrayBlockingQueue/WhiteBox.java crashes on MacOS #5988

Closed
andrew-m-leonard opened this issue Jun 3, 2019 · 23 comments · Fixed by #7104
Closed
Labels
AQA Test failures from AdoptOpenJDK Quality Assurance test set comp:vm test failure

Comments

@andrew-m-leonard
Copy link
Contributor

Failure link

https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder_Advanced/10/console

If the link is not public, instead include

  • test category (functionality, system, etc): openjdk
  • test suite name: java/util/concurrent/ArrayBlockingQueue/WhiteBox.java
  • OS/architecture (i.e. Ubuntu 16.04 x64): MacOS
  • OpenJDK version (i.e. 8u202-b08): jdk-11.0.3+7_openj9-0.14.0 & 3
  • public build SHAs (i.e. -version output):
    12:52:16 openjdk version "11.0.3" 2019-04-16
    12:52:16 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
    12:52:16 Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.14.3, JRE 11 Mac OS X amd64-64-Bit Compressed References 20190531_242 (JIT enabled, AOT enabled)
    12:52:16 OpenJ9 - b8ab016
    12:52:16 OMR - b56045d2
    12:52:16 JCL - dcdc97f9dc based on jdk-11.0.3+7)

Optional info

  • intermittent failure (yes|no): no, but failure different on different Machines
  • regression or new test: regression
  • if regression, what are the last passing / first failing public SHAs (OpenJ9, OMR, JCL) :

Failure output

12:52:23  JVM_OPTIONS: -Xcompressedrefs 
12:52:23  { itercnt=1; \
12:52:23  	mkdir -p "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15595627327379/jdk_custom_0"; \
12:52:23  	cd "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15595627327379/jdk_custom_0"; \
12:52:23  	"/Users/jenkins/workspace/Grinder_Advanced/openjdkbinary/j2sdk-image/bin/java" -Xmx512m -jar "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/openjdk/jtreg/lib/jtreg.jar" \
12:52:23  	-agentvm -a -ea -esa -v:fail,error,time,nopass -retain:fail,error -ignore:quiet -timeoutFactor:8 -xml:verify -concurrency:4 -vmoptions:"-Xmx512m -Xcompressedrefs " \
12:52:23  	-w ""/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15595627327379/jdk_custom_0"/work" \
12:52:23  	-r "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../../jvmtest/openjdk/report" \
12:52:23  	-jdk:"/Users/jenkins/workspace/Grinder_Advanced/openjdkbinary/j2sdk-image" \
12:52:23  	"/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../openjdk/openjdk-jdk/test/jdk/java/util/concurrent/ArrayBlockingQueue/WhiteBox.java"; \
12:52:23  	if [ $? -eq 0 ] ; then echo ""; echo "jdk_custom_0""_PASSED"; echo ""; cd /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../..;  else echo ""; echo "jdk_custom_0""_FAILED"; echo ""; fi; } 2>&1 | tee -a "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15595627327379/TestTargetResult";
12:52:27  Directory "/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/scripts/testKitGen/../../../TestConfig/test_output_15595627327379/jdk_custom_0/work" not found: creating
12:52:27  /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/openjdk/openjdk-jdk/test/jdk/TEST.groups: group jdk_security3: file not found: lib/security
12:52:39  XML output with verification to /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work
12:52:45  [2019-06-03 07:52:44,825] Agent[1]: stderr: Unhandled exception
12:52:45  [2019-06-03 07:52:44,825] Agent[1]: stderr: Type=Segmentation error vmState=0x00000000
12:52:45  [2019-06-03 07:52:44,825] Agent[1]: stderr: J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
12:52:45  [2019-06-03 07:52:44,825] Agent[1]: stderr: Handler1=000000000371D220 Handler2=000000000393D750 InaccessibleAddress=00007FDAF903308C
12:52:45  [2019-06-03 07:52:44,825] Agent[1]: stderr: RDI=00007F92F7509170 RSI=00007F92F900DC68 RAX=0000004800000005 RBX=00007F92F9033068
12:52:45  [2019-06-03 07:52:44,826] Agent[1]: stderr: RCX=00007F92F900DC68 RDX=00007FDAF903306C R8=0000700003DD91B8 R9=FFFFFFFF00000000
12:52:45  [2019-06-03 07:52:44,826] Agent[1]: stderr: R10=00007F92F780AB78 R11=00007F92F900DC68 R12=0000000000000000 R13=00007FDAF903306C
12:52:45  [2019-06-03 07:52:44,826] Agent[1]: stderr: R14=00007F92F900DC68 R15=00007F92F7509170
12:52:45  [2019-06-03 07:52:44,826] Agent[1]: stderr: RIP=000000000374BD24 GS=0000 FS=0000 RSP=0000700003DD8EA0
12:52:45  [2019-06-03 07:52:44,826] Agent[1]: stderr: RFlags=0000000000010206 CS=002B RBP=0000700003DD8EA0 ERR=F903308C00000004
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: TRAPNO=000000040000000E CPU=308C000000040000 FAULTVADDR=00007FDAF903308C
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: XMM2 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: XMM3 bf455018a8000000 (f: 2818572288.000000, d: -6.504174e-04)
12:52:45  [2019-06-03 07:52:44,827] Agent[1]: stderr: XMM4 be8c66f8d759984b (f: 3612973056.000000, d: -2.116131e-07)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM5 bfb2aa0400000000 (f: 0.000000, d: -7.290673e-02)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM6 000fffffffffffff (f: 4294967296.000000, d: 2.225074e-308)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM7 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,828] Agent[1]: stderr: XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,829] Agent[1]: stderr: XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,829] Agent[1]: stderr: XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,829] Agent[1]: stderr: XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,829] Agent[1]: stderr: XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:52:45  [2019-06-03 07:52:44,829] Agent[1]: stderr: Module=/Users/jenkins/workspace/Grinder_Advanced/openjdkbinary/j2sdk-image/lib/compressedrefs/libj9vm29.dylib
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: Module_base_address=000000000367B000 Symbol=segmentInsertionComparator
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: Symbol_address=000000000374BD20
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: Target=2_90_20190531_242 (Mac OS X 10.13.6)
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: CPU=amd64 (4 logical CPUs) (0x200000000 RAM)
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: ----------- Stack Backtrace -----------
12:52:45  [2019-06-03 07:52:44,830] Agent[1]: stderr: ---------------------------------------
12:52:45  [2019-06-03 07:52:44,831] Agent[1]: stderr: JVMDUMP039I Processing dump event "gpf", detail "" at 2019/06/03 07:52:44 - please wait.
12:52:45  [2019-06-03 07:52:44,831] Agent[1]: stderr: JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/core.20190603.075244.27000.0001.dmp' in response to an event
12:52:51  [2019-06-03 07:52:50,352] Agent[1]: stderr: JVMDUMP010I System dump written to /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/core.20190603.075244.27000.0001.dmp
12:52:51  [2019-06-03 07:52:50,353] Agent[1]: stderr: JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/javacore.20190603.075244.27000.0002.txt' in response to an event
12:52:51  [2019-06-03 07:52:50,375] Agent[1]: stderr: JVMDUMP012E Error in Java dump: /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/javacore.20190603.075244.27000.0002.txt
12:52:51  [2019-06-03 07:52:50,375] Agent[1]: stderr: JVMDUMP032I JVM requested Snap dump using '/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/Snap.20190603.075244.27000.0003.trc' in response to an event
12:52:51  [2019-06-03 07:52:50,476] Agent[1]: stderr: JVMDUMP010I Snap dump written to /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/Snap.20190603.075244.27000.0003.trc
12:52:51  [2019-06-03 07:52:50,477] Agent[1]: stderr: JVMDUMP007I JVM Requesting JIT dump using '/Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/jitdump.20190603.075244.27000.0004.dmp'
12:52:51  [2019-06-03 07:52:50,529] Agent[1]: stderr: JVMDUMP010I JIT dump written to /Users/jenkins/workspace/Grinder_Advanced/openjdk-tests/TestConfig/test_output_15595627327379/jdk_custom_0/work/scratch/0/jitdump.20190603.075244.27000.0004.dmp
12:52:51  [2019-06-03 07:52:50,529] Agent[1]: stderr: JVMDUMP013I Processed dump event "gpf", detail "".
12:52:51  --------------------------------------------------
12:52:51  TEST: java/util/concurrent/ArrayBlockingQueue/WhiteBox.java
@DanHeidinga
Copy link
Member

@andrew-m-leonard I see a couple of different crashes in the Jenkins output. Are you able to capture the javacores / system cores / etc from the crashes to aid the investigations?

@andrew-m-leonard
Copy link
Contributor Author

13:26:32 [2019-06-03 08:26:32,170] Agent[1]: stderr: 12:26:32.170 0x8913300 omrport.359 * ** ASSERTION FAILED ** at ../../omr/port/common/omrmemtag.c:145: ((memoryCorruptionDetected))

@andrew-m-leonard
Copy link
Contributor Author

@DanHeidinga I'm not aware of a way of getting hyc-runtimes dumps? @AdamBrousseau does the worspace get cleared?

@JasonFengJ9
Copy link
Member

This might be the same issue reported at #5399, particularly #5399 (comment).

@smlambert
Copy link
Contributor

Dumps are in the archived openjdk_test_output.tar.gz file uploaded to artifactory.

Screen Shot 2019-06-03 at 10 24 11 AM

@DanHeidinga
Copy link
Member

Similar issue on 0.14.0 release:
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder_Advanced/13/console

Assertion failure:

13:26:32  [2019-06-03 08:26:32,170] Agent[1]: stderr: 12:26:32.170 0x8913300 omrport.359    *   ** ASSERTION FAILED ** at ../../omr/port/common/omrmemtag.c:145: ((memoryCorruptionDetected))
$ lldb jdk-11.0.3+7/Contents/Home/bin/java -c core.20190603.082632.59714.0001.dmp 
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff97448286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff952699f9 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x0000000006134ee3 libj9prt29.dylib`omrdump_create + 275
    frame #3: 0x0000000006180a79 libj9dmp29.dylib`doSystemDump + 297
    frame #4: 0x00000000061855d5 libj9dmp29.dylib`protectedDumpFunction + 21
    frame #5: 0x0000000006136267 libj9prt29.dylib`omrsig_protect + 983
    frame #6: 0x00000000061851d6 libj9dmp29.dylib`runDumpAgent + 790
    frame #7: 0x000000000619b0d1 libj9dmp29.dylib`triggerDumpAgents + 1393
    frame #8: 0x0000000006d412d9 libj9trc29.dylib`triggerHit + 761
    frame #9: 0x0000000006d356fa libj9trc29.dylib`doTracePoint + 3626
    frame #10: 0x0000000006d314eb libj9trc29.dylib`javaTrace + 203
    frame #11: 0x0000000006133d84 libj9prt29.dylib`unwrapBlockAndCheckTags + 292
    frame #12: 0x0000000006133c15 libj9prt29.dylib`omrmem_free_memory + 85
    frame #13: 0x0000000006dd8d54 libj9jit29.dylib`std::__1::deque<TR::reference_wrapper<J9MemorySegment>, TR::typed_allocator<TR::reference_wrapper<J9MemorySegment>, J9::RawAllocator> >::erase(std::__1::__deque_iterator<TR::reference_wrapper<J9MemorySegment>, TR::reference_wrapper<J9MemorySegment> const*, TR::reference_wrapper<J9MemorySegment> const&, TR::reference_wrapper<J9MemorySegment> const* const*, long, 512l>) + 1076
    frame #14: 0x0000000006dd802e libj9jit29.dylib`J9::SystemSegmentProvider::~SystemSegmentProvider() + 190
    frame #15: 0x0000000006db0869 libj9jit29.dylib`TR::CompilationInfoPerThread::shouldPerformCompilation(TR_MethodToBeCompiled&) + 1593

@DanHeidinga
Copy link
Member

DanHeidinga commented Jun 3, 2019

This might be the same issue reported at #5399, particularly #5399 (comment).

In the cores from the 0.14.0 grinder, there are a couple of occurrences of #5399.
One of the issue in the comment above.

And another with a crash in segmentInsertionComparator:

    frame #13: 0x00007fff9b7adf1a libsystem_platform.dylib`_sigtramp + 26
    frame #14: 0x000000000ced0a45 libj9vm29.dylib`segmentInsertionComparator + 5
    frame #15: 0x000000000cfe65f2 libj9vm29.dylib`insertNode + 194
    frame #16: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
    frame #17: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
    frame #18: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
    frame #19: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
    frame #20: 0x000000000cfe6621 libj9vm29.dylib`insertNode + 241
    frame #21: 0x000000000cfe651d libj9vm29.dylib`avl_insert + 29
    frame #22: 0x000000000ced088f libj9vm29.dylib`allocateVirtualMemorySegmentInListInternal + 559
    frame #23: 0x000000000cecffe3 libj9vm29.dylib`allocateMemorySegmentInList + 147
    frame #24: 0x000000000ce91f7d libj9vm29.dylib`allocateClassMemorySegment + 205
    frame #25: 0x000000000ce9b752 libj9vm29.dylib`internalCreateRAMClassFromROMClassImpl(J9VMThread*, J9ClassLoader*, J9ROMClass*, unsigned long, J9Class*, J9ROMMethod**, long, int, J9Class*, unsigned long, J9Class*, J9CreateRAMClassState*, J9ClassLoader*, J9Class*, J9Module*) + 9298
    frame #26: 0x000000000ce985e9 libj9vm29.dylib`internalCreateRAMClassFromROMClass + 2825
    frame #27: 0x000000000cfd2733 libj9vm29.dylib`internalDefineClass + 2883
    frame #28: 0x000000000ec8f879 libjclse29.dylib`defineClassCommon + 1897
    frame #29: 0x000000000ec823bd libjclse29.dylib`Java_java_lang_ClassLoader_defineClassImpl + 173
    frame #30: 0x00000000236d4653
    frame #31: 0x000000000ce8c48d libj9vm29.dylib`sendLoadClass + 429
    frame #32: 0x000000000ce936b6 libj9vm29.dylib`internalFindClassInModule + 5126
    frame #33: 0x000000000cecb0ca libj9vm29.dylib`resolveClassRef + 506
    frame #34: 0x000000000ce305a7 libj9vm29.dylib`VM_BytecodeInterpreter::run(J9VMThread*) + 131815
    frame #35: 0x000000000ce102b7 libj9vm29.dylib`bytecodeLoop + 151
    frame #36: 0x000000000cf70052 libj9vm29.dylib`cInterpreter + 22
    frame #37: 0x000000000ce8fe92 libj9vm29.dylib`sidecarInvokeReflectMethodImpl + 1474
    frame #38: 0x000000000ce90711 libj9vm29.dylib`sidecarInvokeReflectMethod + 129
    frame #39: 0x000000000ecbfebd libjclse29.dylib`JVM_InvokeMethod_Impl + 109
    frame #40: 0x00000000236c9c45
    frame #41: 0x000000000ce8fe92 libj9vm29.dylib`sidecarInvokeReflectMethodImpl + 1474
    frame #42: 0x000000000ce90711 libj9vm29.dylib`sidecarInvokeReflectMethod + 129
    frame #43: 0x000000000ecbfebd libjclse29.dylib`JVM_InvokeMethod_Impl + 109
    frame #44: 0x00000000236c9c45
    frame #45: 0x000000000ce8d47f libj9vm29.dylib`runJavaThread + 415
    frame #46: 0x000000000cf66367 libj9vm29.dylib`javaProtectedThreadProc + 183
    frame #47: 0x000000000c538267 libj9prt29.dylib`omrsig_protect + 983
    frame #48: 0x000000000cf66283 libj9vm29.dylib`javaThreadProc + 83
    frame #49: 0x000000000c4f80b9 libj9thr29.dylib`thread_wrapper + 361
    frame #50: 0x00007fff9526805a libsystem_pthread.dylib`_pthread_body + 131
    frame #51: 0x00007fff95267fd7 libsystem_pthread.dylib`_pthread_start + 176
    frame #52: 0x00007fff952653ed libsystem_pthread.dylib`thread_start + 13

@DanHeidinga
Copy link
Member

From the javacore: 1XHEXCPCODE InaccessibleAddress: 00007F18590ADB4C

(lldb) disassemble -n segmentInsertionComparator
libj9vm29.dylib`segmentInsertionComparator:
    0xced0a40 <+0>:  pushq  %rbp
    0xced0a41 <+1>:  movq   %rsp, %rbp
    0xced0a44 <+4>:  movq   0x20(%rdx), %rcx
    0xced0a48 <+8>:  xorl   %eax, %eax
    0xced0a4a <+10>: cmpq   %rcx, 0x20(%rsi)
    0xced0a4e <+14>: je     0xced0a60                 ; <+32>
    0xced0a50 <+16>: movl   $0x1, %ecx
    0xced0a55 <+21>: movq   $-0x1, %rax
    0xced0a5c <+28>: cmovaq %rcx, %rax
    0xced0a60 <+32>: popq   %rbp
    0xced0a61 <+33>: retq   
    0xced0a62 <+34>: nopw   %cs:(%rax,%rax)

Registers

(lldb) register read rdx rcx
     rdx = 0x00007f18590adb2c
     rcx = 0x00007fa05981a268

so rdx is the bad value.

https://github.com/eclipse/openj9/blob/1fd11ed3ef289907b5241a64bf3de46ae2e4aa81/runtime/vm/segment.c#L629-L640

Likely the walkNode.

Need to get this into ddr and see if the avl tree is corrupt

DanHeidinga added a commit to DanHeidinga/omr that referenced this issue Jun 4, 2019
While investigating a crash in eclipse-openj9/openj9#5988, I had to
look at the AVL code and there were cases where it wasn't
clear all the variables had been initialized before they
were read.  This patch makes initialization explicit and
ensures variables are initialized at their declaration site.

This patch is code cleanup to make it easier to reason about
the AVL code.

Signed-off-by: Dan Heidinga <daniel_heidinga@ca.ibm.com>
@DanHeidinga
Copy link
Member

I've opened a PR at OMR to make it easier to reason about the AVL code (eclipse-omr/omr#3944) and while it makes it easier to read the code, it won't fix this issue.

andrewrgao pushed a commit to andrewrgao/omr that referenced this issue Jun 5, 2019
While investigating a crash in eclipse-openj9/openj9#5988, I had to
look at the AVL code and there were cases where it wasn't
clear all the variables had been initialized before they
were read.  This patch makes initialization explicit and
ensures variables are initialized at their declaration site.

This patch is code cleanup to make it easier to reason about
the AVL code.

Signed-off-by: Dan Heidinga <daniel_heidinga@ca.ibm.com>
@smlambert smlambert added the AQA Test failures from AdoptOpenJDK Quality Assurance test set label Jun 14, 2019
@pshipton
Copy link
Member

It doesn't look like we'll have a solution ready for this in the 0.15 release.

@DanHeidinga
Copy link
Member

@theresa-m Can you update this with how the investigation is going?

@theresa-m
Copy link
Contributor

The investigation is ongoing. Still don't have a fix in mind.

@theresa-m
Copy link
Contributor

Hey @dmitripivkine ran into this going through one of the possible traces for this issue. This slot is being treated as J9Object in the code but clearly contains UTF8 string data. Is that normal behavior?

(lldb) up
frame #16: 0x0000000005a89d24 libj9gc29.dylib`MM_MarkingSchemeRootClearer::doStringTableSlot(this=0x000070000cc58c68, slotPtr=0x6176616a4c280028, stringTableIterator=0x000070000cc58b28) at MarkingSchemeRootClearer.cpp:353
   350 	MM_MarkingSchemeRootClearer::doStringTableSlot(omrobjectptr_t *slotPtr, GC_StringTableIterator *stringTableIterator)
   351 	{
   352 		_env->getGCEnvironment()->_markJavaStats._stringConstantsCandidates += 1;
-> 353 		if(!_markingScheme->isMarked(*slotPtr)) {
   354 			_env->getGCEnvironment()->_markJavaStats._stringConstantsCleared += 1;
   355 			stringTableIterator->removeSlot();
   356 		}
(lldb) p *(GC_StringTableIterator*)0x000070000cc58b28
(GC_StringTableIterator) $0 = {
  _hashTable = 0x00007fc5a34194f0
  _handle = {
    table = 0x00007fc5a34194f0
    bucketIndex = 432
    didDeleteCurrentNode = 0
    pointerToCurrentNode = 0x00007fc5a5072388
    iterateState = 0
    poolState = {
      leftToDo = 0
      thePool = 0x0000000000000000
      currentPuddle = 0x0000000000000000
      lastSlot = 0
      flags = 0
    }
  }
  _firstIteration = false
}
(lldb) x/10s 0x00007fc5a5072388
0x7fc5a5072388: "("
0x7fc5a507238a: "(Ljava/util/concurrent/CountDownLatch;)V\b"
0x7fc5a50723b4: "finalize\x03"
0x7fc5a50723be: "()V"
0x7fc5a50723c2: "\r"
0x7fc5a50723c4: "WhiteBox.java"
0x7fc5a50723d2: "\n"
0x7fc5a50723d4: "WhiteBox$1\x10"
0x7fc5a50723e0: "java/lang/Object\b"
0x7fc5a50723f2: "WhiteBox\v"
(lldb) up
libj9gc29.dylib was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #17: 0x000000000594b873 libj9gc29.dylib`MM_RootScanner::scanStringTable(this=0x000070000cc58c68, env=0x00007fc5a4048e68) at RootScanner.cpp:621 [opt]
   618 					GC_StringTableIterator stringTableIterator(stringTable->getTable(tableIndex));
   619 					J9Object **slot = NULL;
   620 					while (NULL != (slot = (J9Object **)stringTableIterator.nextSlot())) {
-> 621 						doStringTableSlot(slot, &stringTableIterator);
   622 					}
   623 				}
   624 			}
(lldb) p **(J9Object **)0x00007fc5a5072388
error: Couldn't apply expression side effects : Couldn't dematerialize a result variable: couldn't read its memory
(lldb)

@dmitripivkine
Copy link
Contributor

slotPtr is pointer to slot where pointer to J9Object is located (J9Object** slotPtr). An address of J9Object should be in the object heap range. Obviously it should not point to UTF8 data.
In general there are two possible locations of the objects - object heap (heap allocated) or java stack of thread (stack allocated). String Table should not contain stack allocated objects.

If you check caller of MM_MarkingSchemeRootClearer::doStringTableSlot() in MM_RootScanner::scanStringTable:

				GC_StringTableIterator stringTableIterator(stringTable->getTable(tableIndex));
				J9Object **slot = NULL;
				while (NULL != (slot = (J9Object **)stringTableIterator.nextSlot())) {
					doStringTableSlot(slot, &stringTableIterator);
				}

GC_StringTableIterator is alias name of GC_HashTableIterator, it is standard Hash Table Iteration:

void **
GC_HashTableIterator::nextSlot()
{
	void **value;
	
	if (_firstIteration) {
		_firstIteration = false;

		value = (void **)hashTableStartDo(_hashTable, &_handle);
	} else {
		value = (void **)hashTableNextDo(&_handle);
	}

	return value;
}

I guess there is a corruption in the hash table.
Another possibility that String Table contain stall pointer to collected object. In this case an address of the object must be in object heap range (can be checked in DDR !dumpallregions or in javacore)

Please let me know if you need help, I can take a look to system core.

@dmitripivkine
Copy link
Contributor

@theresa-m kindly generated a few system cores using latest JVM, so DDR for Mac now works.
I looked to two cores (crashes in GC and JIT). Long story short every time crash occur because one of j9pool's paddles contain garbage. So obviously there are two possibilities: bad paddle address was added to paddle list or memory itself where paddle header use to be located has been corrupted. Looks like it is a case of memory corruption. The memory located right before expected paddle has footer !J9MemTag corrupted. It is in both cases memory allocated from segment.c:238 size 0x4784 (18308). So I believe there is a memory segment was overwritten by some reason. I will continue investigation and try to figure out who own such memory segment.

@DanHeidinga
Copy link
Member

@theresa-m had mentioned that backing out this PR prevents the failures: #5301 but never of us have found the smoking gun on why this triggers the problem

@dmitripivkine
Copy link
Contributor

dmitripivkine commented Aug 27, 2019

The problematic segment written out-of-border is a memory allocated for ROM class for WhiteBox class

@dmitripivkine
Copy link
Contributor

There is an example of debugging for one of cores (ask me for details if it is not there):
I am going to investigate post-crash system core file using jdmpview - powerful tool for OpenJ9 system core dump analysis.
The core I selected for example is core.20190826.111238.38364.0001.dmp
run it on Linux x86 machine:

../jdk-12.0.2+10/bin/jdmpview -core core.20190826.111238.38364.0001.dmp
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

For a list of commands, type "help"; for how to use "help", type "help help"
Available contexts (* = currently selected context) :

Source : file:///team/Dmitri/temp/temp0/core.20190826.111238.38364.0001.dmp
	*0 : PID: 0 : JRE 11 Mac OS X amd64-64 (build 11.0.4+11-201908171833)

>

As far as it is crash we should see crash record:

> !gpinfo
Failing Thread: !j9vmthread 0x11517100
Failing Thread ID: 0x37ab85 (3648389)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000000
Handler1=000000000E2229B0 Handler2=000000000E533D90
RDI=000070000E4CCC48 RSI=29280003657A696C RAX=00007F93478053D0 RBX=000070000E4CCB18
RCX=0000000010ED1098 RDX=000070000E4CCB18 R8=0000F8FA2D4B6109 R9=00007F9347803A80
R10=00007F93460054A8 R11=0000000000000000 R12=00007F9348012C20 R13=0000000000000006
R14=0000000010ED0FE8 R15=000070000E4CCC48
RIP=0000000010E244F3 GS=0000 FS=0000 RSP=000070000E4CCAF0
RFlags=0000000000010206 CS=002B RBP=000070000E4CCAF0 ERR=1E41D00000000000
TRAPNO=000000000000000D CPU=D000000000000000 FAULTVADDR=000000001E41D000
XMM0 000000000000000d (f: 13.000000, d: 6.422853e-323)
XMM1 0000000000000012 (f: 18.000000, d: 8.893182e-323)
XMM2 000000000000000e (f: 14.000000, d: 6.916919e-323)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM5 bfb4d30c00000000 (f: 0.000000, d: -8.134532e-02)
XMM6 000fffffffffffff (f: 4294967296.000000, d: 2.225074e-308)
XMM7 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/Users/theresa.t.mammarella@ibm.com/local-test/adopt-local-tests/jdk-11.0.4+11/Contents/Home/lib/compressedrefs/libj9gc29.dylib
Module_base_address=0000000010CD3000 Symbol=_ZN27MM_MarkingSchemeRootClearer17doStringTableSlotEPP8J9ObjectP20GC_HashTableIterator
Symbol_address=0000000010E244E0

From here we can see not only registers but also crashing point in the code MM_MarkingSchemeRootClearer17doStringTableSlotEPP8J9ObjectP20GC_HashTableIterator
So it crashed in GC - Global Stop-The-World GC. The stack from native debugger confirms this:

frame #13: 0x00007fff691d0b5d libsystem_platform.dylib`_sigtramp + 29
    frame #14: 0x0000000010e244f4 libj9gc29.dylib`MM_MarkingSchemeRootClearer::doStringTableSlot(J9Object**, GC_HashTableIterator*) + 20
    frame #15: 0x0000000010ce7a12 libj9gc29.dylib`MM_RootScanner::scanStringTable(MM_EnvironmentBase*) + 610
    frame #16: 0x0000000010ce8dfb libj9gc29.dylib`MM_RootScanner::scanClearable(MM_EnvironmentBase*) + 267
    frame #17: 0x0000000010e2074b libj9gc29.dylib`MM_MarkingDelegate::workerCompleteGC(MM_EnvironmentBase*) + 315
    frame #18: 0x0000000010dfb6bb libj9gc29.dylib`MM_ParallelMarkTask::run(MM_EnvironmentBase*) + 91
    frame #19: 0x0000000010dfa3ac libj9gc29.dylib`MM_ParallelDispatcher::slaveEntryPoint(MM_EnvironmentBase*) + 124
    frame #20: 0x0000000010dfa299 libj9gc29.dylib`dispatcher_thread_proc2(OMRPortLibrary*, void*) + 249

(The one of reasons I selected this core for debugging is all java threads stacks are walkable at safe point. Another reason I am a GC Team member so I more familiar with GC structures).

GC executes Clearable phase (remove references to dead objects from JVM structures. The crash occur an attempt to iterate through String Table, so looks like something is wrong here.
There are a few ways to verify String Table in jdmpview. For example:

> !dumpstringtable
Problem running command:
U64 contains value larger than Integer.MAX_VALUE
> !gccheck stringtable
Starting GC Check
Checking STRING TABLE...Problem running command:
U64 contains value larger than Integer.MAX_VALUE

As you see both of attempts failed and we can reasonably guess the reason is the same as for crash - something in String Table infrastructure is broken.
There are two ways to investigate from this point: dig through String Table internals or use shortcut just run java code used in jdmpview in java debugger (we run it in Eclipse) and see where command failed. However I would show you String Table internals. The link to String Table can be found in GCExtensions:

> !findvm
!j9javavm 0x00007F934600E620
> !j9javavm 0x00007F934600E620 | grep -i gcext
	0x22e8: void* gcExtensions = !j9x 0x00007F9348001220
> !mm_gcextensions 0x00007F9348001220 | grep -i stringtable
	0x5c50: class MM_StringTable* stringTable = !mm_stringtable 0x00007F9348012C20
	0x5f58: U32 _stringTableListToTreeThreshold = 0x00000400 (1024)
> !mm_stringtable 0x00007F9348012C20
MM_StringTable at 0x7f9348012c20 {
  Fields for MM_Base:
  Fields for MM_BaseVirtual:
	0x0: void** _vptr$MM_BaseVirtual = !j9x 0x0000000010EC9D98
	0x8: const U8* _typeId = !j9x 0x0000000010E8ECD7
  Fields for MM_StringTable:
	0x10: UDATA _tableCount = 0x0000000000000008 (8)
	0x18: void** _table = !j9x 0x00007F9347806490
	0x20: struct J9ThreadMonitor** _mutex = !j9x 0x00007F9347806510
	0x28: struct J9Object*[] _cache = !j9x 0x00007F9348012C48
}

We are on the right way, you can find an address of String Table in R12 in crash info.
String Table is combined from number of hash tables. _table is an address of hash tables array and size of it is _tableCount so it is 8:

> !j9x 0x00007F9347806490,40
0x7F9347806490 :  00007f9347806590 00007f93478069b0 [ .e.G.....i.G.... ]
0x7F93478064A0 :  00007f9347806dd0 00007f93478071f0 [ .m.G.....q.G.... ]
0x7F93478064B0 :  00007f9347807610 00007f9347807a30 [ .v.G....0z.G.... ]
0x7F93478064C0 :  00007f9347807e50 00007f9347808270 [ P~.G....p..G.... ]

We need to inspect correctness of each of this hash table but from our shortcut we know that problematic hash table is 0x00007f9347807e50:

>  !J9HashTable 0x00007f9347807e50
J9HashTable at 0x7f9347807e50 {
  Fields for J9HashTable:
	0x0: const U8* tableName = !j9x 0x0000000010E8ECB0
	0x8: U32 tableSize = 0x0000032B (811)
	0xc: U32 numberOfNodes = 0x0000025C (604)
	0x10: U32 numberOfTreeNodes = 0x00000000 (0)
	0x14: U32 entrySize = 0x00000008 (8)
	0x18: U32 listNodeSize = 0x00000010 (16)
	0x1c: U32 treeNodeSize = 0x00000018 (24)
	0x20: U32 nodeAlignment = 0x00000008 (8)
	0x24: U32 flags = 0x00000002 (2)
	0x28: U32 memoryCategory = 0x80000003 (2147483651)
	0x2c: U32 listToTreeThreshold = 0x00000400 (1024)
	0x30: void** nodes = !j9x 0x00007F9346069820
	0x38: struct J9Pool* listNodePool = !j9pool 0x00007F9347807F20
	0x40: struct J9Pool* treeNodePool = !j9pool 0x00007F9347808180
	0x48: struct J9Pool* treePool = !j9pool 0x00007F9347808010
	0x50: struct J9AVLTree* avlTreeTemplate = !j9avltree 0x00007F9347808100
	0x58: void* hashFn = !j9x 0x0000000010CE9C60
	0x60: void* hashEqualFn = !j9x 0x0000000010DC68D0
	0x68: void* printFn = !j9x 0x0000000000000000
	0x70: struct OMRPortLibrary* portLibrary = !omrportlibrary 0x000000000E154BE8
	0x78: void* equalFnUserData = !j9x 0x00007F9347808100
	0x80: void* hashFnUserData = !j9x 0x00007F934600E620
	0x88: struct J9HashTable* previous = !j9hashtable 0x0000000000000000
}

Hash table can be converted to AVL tree bot not in this case (numberOfTreeNodes = 0x00000000)
So we should look to listNodePool:

> !j9pool 0x00007F9347807F20
J9Pool at 0x7f9347807f20 {
  Fields for J9Pool:
	0x0: U64 elementSize = 0x0000000000000010 (16)
	0x8: U64 elementsPerPuddle = 0x0000000000000090 (144)
	0x10: U64 puddleAllocSize = 0x0000000000000A50 (2640)
	0x18: J9WSRP(J9PoolPuddleList) puddleList = !j9poolpuddlelist 0x00007F9347807FB0
	0x20: void* memAlloc = !j9x 0x0000000010CD47A0
	0x28: void* memFree = !j9x 0x0000000010CD47B0
	0x30: const U8* poolCreatorCallsite = !j9x 0x0000000010E8ECB0
	0x38: void* userData = !j9x 0x000000000E154BE8
	0x40: U16 alignment = 0x0008 (8)
	0x42: U16 flags = 0x002A (42)
	0x44: U32 memoryCategory = 0x80000003 (2147483651)
}

The easiest way to inspect J9Pool is just walk it:

> !walkj9pool 0x00007F9347807F20
J9Pool at 0x7f9347807f20
{
Problem running command:
U64 contains value larger than Integer.MAX_VALUE

Oops... looking to pool puddle list:

> !j9poolpuddlelist 0x00007F9347807FB0
J9PoolPuddleList at 0x7f9347807fb0 {
  Fields for J9PoolPuddleList:
	0x0: U64 numElements = 0x000000000000025C (604)
	0x8: J9WSRP(J9PoolPuddle) nextPuddle = !j9poolpuddle 0x00007F9345846420
	0x10: J9WSRP(J9PoolPuddle) nextAvailablePuddle = !j9poolpuddle 0x00007F9345846420
}

Trying to get first puddle:

> !j9poolpuddle 0x00007F9345846420
J9PoolPuddle at 0x7f9345846420 {
  Fields for J9PoolPuddle:
	0x0: U64 usedElements = 0x0000000100000110 (4294967568)
	0x8: J9SRP firstElementAddress = !j9x 0x0000000000000000
	0xc: J9SRP(UDATA) firstFreeSlot = <FAULT>
	0x10: J9WSRP(J9PoolPuddle) prevPuddle = !j9poolpuddle 0x0000000000000000
	0x18: J9WSRP(J9PoolPuddle) nextPuddle = !j9poolpuddle 0x000081B445846438
	0x20: J9WSRP(J9PoolPuddle) prevAvailablePuddle = !j9poolpuddle 0x00007F934584645F
	0x28: J9WSRP(J9PoolPuddle) nextAvailablePuddle = !j9poolpuddle 0x0000000000000000
	0x30: U64 userData = 0x0000000000000000 (0)
	0x38: U64 flags = 0x0000000000000000 (0)
}

This does not look correct. Looking to raw memory:

> !j9x 0x00007F9345846420,200
0x7F9345846420 :  0000000100000110 0000000000000000 [ ................ ]
0x7F9345846430 :  0000000000000000 0000022100000000 [ ............!... ]
0x7F9345846440 :  000000000000001f 0000000000000000 [ ................ ]
0x7F9345846450 :  0000000000000000 0000000000000000 [ ................ ]
0x7F9345846460 :  0000004800000004 0000004800000005 [ ....H.......H... ]
0x7F9345846470 :  0000004800000007 000000020000014a [ ....H...J....... ]
0x7F9345846480 :  000000020000014e 0000000200000158 [ N.......X....... ]
0x7F9345846490 :  0000000200000168 000000ac000000a0 [ h............... ]
0x7F93458464A0 :  0000014600021010 0000008c00000104 [ ....F........... ]
0x7F93458464B0 :  000000bc00000098 00000162000000f4 [ ............b... ]
0x7F93458464C0 :  000000ac000000ec 01040000000000b0 [ ................ ]
0x7F93458464D0 :  00000200000a0002 02b72a0001b52b2a [ ........*+...*.. ]
0x7F93458464E0 :  000000110000e400 0000000000020002 [ ................ ]
0x7F93458464F0 :  000000ae0000d280 00050004000000b4 [ ................ ]
0x7F9345846500 :  0000010000080001 ac0003b60001b4d7 [ ................ ]
0x7F9345846510 :  0002000200000011 0000d28000000000 [ ................ ]
0x7F9345846520 :  0101010109090700 ffffff7800000006 [ ............x... ]
0x7F9345846530 :  fffffff400000082 6966246c6176000d [ ..........val$fi ]
0x7F9345846540 :  0064657a696c616e 2f6176616a4c0025 [ nalized.%.Ljava/ ]
0x7F9345846550 :  6e6f632f6c697475 2f746e6572727563 [ util/concurrent/ ]
0x7F9345846560 :  776f44746e756f43 003b686374614c6e [ CountDownLatch;. ]
0x7F9345846570 :  3e74696e693c0006 6176616a4c280028 [ ..<init>(.(Ljava ]
0x7F9345846580 :  6f632f6c6974752f 746e65727275636e [ /util/concurrent ]
0x7F9345846590 :  6f44746e756f432f 3b686374614c6e77 [ /CountDownLatch; ]
0x7F93458465A0 :  616e696600085629 29280003657a696c [ )V..finalize..() ] <-- it is in RSI register
0x7F93458465B0 :  74696857000d0056 76616a2e786f4265 [ V...WhiteBox.jav ]
0x7F93458465C0 :  74696857000a0061 00103124786f4265 [ a...WhiteBox$1.. ]
0x7F93458465D0 :  6e616c2f6176616a 7463656a624f2f67 [ java/lang/Object ]
0x7F93458465E0 :  4265746968570008 63726f66000b786f [ ..WhiteBox..forc ]
0x7F93458465F0 :  0063476c6c754665 752f6176616a0023 [ eFullGc.#.java/u ]
0x7F9345846600 :  636e6f632f6c6974 432f746e65727275 [ til/concurrent/C ]
0x7F9345846610 :  6e776f44746e756f 000900686374614c [ ountDownLatch... ]

This is not a valid pool puddle. And obviously original crash occur here: we can see value we crashed at RSI=29280003657A696C in 0x7F93458465A8. A garbage in puddle memory was used for calculation and this bogus value has been loaded.

There are two possibilities at this point:

  1. Puddle address in puddle list has been written wrong or corrupted
  2. Memory itself where puddle use to be is corrupted

There is a strong evidence that we have second case. OpenJ9 uses header and footer tags for allocated memory. Header has eye-catcher 0xb1234567 and footer has eye-catcher 0xb7654321
We can try to find them:

> !findheader 0x7F9345846420
Searching memory allocation header for 0x00007F9345846420
J9MemTag check failed at 7f93458463a8: Corrupt memory section found :Wrong eyecatcher. Expected 0xb7654321 but was 0x00007F93458463A8

This tells us that footer eye-catcher is expected to be at 0x7f93458463a8 but it is not there:

> !j9x 0x7f9345846380,100
0x7F9345846380 :  2f676e616c2f6176 432f656b6f766e69 [ va/lang/invoke/C ]
0x7F9345846390 :  3b657469536c6c61 0000000000000000 [ allSite;........ ]
0x7F93458463A0 :  dddddddd00000000 0000000000000288 [ ................ ] <--- eye-catcher expected to be here
0x7F93458463B0 :  0000021a00000212 4a80400000000020 [ ........ ....@.J ]
0x7F93458463C0 :  000000e000000000 000000f800000002 [ ................ ]
0x7F93458463D0 :  000000c400000001 0000000000000000 [ ................ ]
0x7F93458463E0 :  0000000800000008 ffffffffffffffc0 [ ................ ]
0x7F93458463F0 :  0000000e00000288 0000000000000128 [ ........(....... ]
0x7F9345846400 :  0000000000000000 000001d40000009c [ ................ ]
0x7F9345846410 :  0000009000000000 0000004100000037 [ ........7...A... ]
0x7F9345846420 :  0000000100000110 0000000000000000 [ ................ ]<-- puddle use to be here
0x7F9345846430 :  0000000000000000 0000022100000000 [ ............!... ]
0x7F9345846440 :  000000000000001f 0000000000000000 [ ................ ]
0x7F9345846450 :  0000000000000000 0000000000000000 [ ................ ]
0x7F9345846460 :  0000004800000004 0000004800000005 [ ....H.......H... ]
0x7F9345846470 :  0000004800000007 000000020000014a [ ....H...J....... ]

The data written over looks "contiguous" so there is reasonable guess that chunk of memory allocated lower has been overwritten out-of-bound. Lets try to figure out what it is. We need to find header tag for this allocation. There it is:

0x7F9345841C00 :  f485ae20b1234567 0000000000004784 [ gE#. ....G...... ]
0x7F9345841C10 :  000000000e37e7f0 000000000e1528a0 [ ..7......(...... ]
0x7F9345841C20 :  0000000000004788 00003a5200003b8c [ .G.......;..R:.. ]

> !J9MemTag 0x7F9345841C00 <--- must have footer at 0x00007F93458463A8
J9MemTag at 0x7f9345841c00 {
  Fields for J9MemTag:
	0x0: U32 eyeCatcher = 0xB1234567 (2971878759)
	0x4: U32 sumCheck = 0xF485AE20 (4102401568)
	0x8: U64 allocSize = 0x0000000000004784 (18308)
	0x10: const U8* callSite = !j9x 0x000000000E37E7F0 <-- allocated at segment.c:238
	0x18: struct OMRMemCategory* category = !omrmemcategory 0x000000000E1528A0 <-- (Classes)
}

Is there somebody around to keep reference to this memory?

> !findall u64 0x7F9345841C20
Scanning memory for 20 1c 84 45 93 7f 00 00 aligned to 8 starting from 0x0
Match found at 0x11623b08
Match found at 0x7f934882c8e0
Match found at 0x7f934882c8e8
No more matches
> !j9x 0x11623b00,20
0x11623B00 :  0000000099669966 00007f9345841c20 [ f.f..... ..E.... ]
0x11623B10 :  0000000011623de0 00000000020e0001 [ .=b............. ]

The 0x99669966 is an eye-catcher for j9class:

> !j9class 0x11623B00
J9Class at 0x11623b00 {
  Fields for J9Class:
	0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
	0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F9345841C20 <----- ROM class overwrite reserved space
	0x10: void** superclasses = !j9x 0x0000000011623DE0
	0x18: UDATA classDepthAndFlags = 0x00000000020E0001 (34471937)
	0x20: U32 classDepthWithFlags = 0x00000000 (0)
	0x24: U32 classFlags = 0x00000000 (0)
	0x28: struct J9ClassLoader* classLoader = !j9classloader 0x00007F93480518F0
	0x30: struct J9Object* classObject = !j9object 0x00000000E015ECE0 // java/lang/Class
	0x38: volatile UDATA initializeStatus = 0x0000000000000001 (1)
	0x40: struct J9Method* ramMethods = !j9method 0x0000000011623448 // WhiteBox.<init>()V
	0x48: UDATA* ramStatics = !j9x 0x0000000000000000
	0x50: struct J9Class* arrayClass = !j9class 0x0000000000000000
	0x58: UDATA totalInstanceSize = 0x000000000000002C (44)
	0x60: struct J9ITable* lastITable = !j9itable 0x000000000E36CD88
	0x68: UDATA* instanceDescription = !j9x 0x0000000000000FFD
	0x70: UDATA* instanceLeafDescription = !j9x 0x0000000000000001
	0x78: UDATA instanceHotFieldDescription = 0x0000000000000000 (0)
	0x80: UDATA selfReferencingField1 = 0x0000000000000000 (0)
	0x88: UDATA selfReferencingField2 = 0x0000000000000000 (0)
	0x90: struct J9Method* initializerCache = !j9method 0x0000000000000000
	0x98: UDATA romableAotITable = 0x00000000103DA000 (272474112)
	0xa0: UDATA packageID = 0x00007F93480518F0 (140270545213680)
	0xa8: struct J9Module* module = !j9module 0x0000000000000000
	0xb0: struct J9Class* subclassTraversalLink = !j9class 0x0000000011611C00 // java/net/URLClassLoader$3
	0xb8: struct J9Class* subclassTraversalReverseLink = !j9class 0x00000000115D1D00 // org/testng/ITestListener
	0xc0: void** iTable = !j9x 0x0000000000000000
	0xc8: UDATA castClassCache = 0x0000000011636401 (291726337)
	0xd0: void** jniIDs = !j9x 0x00007F934789C130
	0xd8: UDATA lockOffset = 0x0000000000000004 (4)
	0xe0: UDATA newInstanceCount = 0x00000000000003E8 (1000)
	0xe8: IDATA backfillOffset = 0x0000000000000030 (48)
	0xf0: struct J9Class* replacedClass = !j9class 0x0000000000000000
	0xf8: UDATA finalizeLinkOffset = 0x0000000000000000 (0)
	0x100: struct J9Class* nextClassInSegment = !j9class 0x0000000000000000
	0x108: UDATA* ramConstantPool = !j9x 0x0000000011622870
	0x110: struct J9Object** callSites = !j9x 0x0000000011623DA0
	0x118: struct J9Object** methodTypes = !j9x 0x0000000000000000
	0x120: struct J9Object** varHandleMethodTypes = !j9x 0x0000000011623E10
	0x128: struct J9VMCustomSpinOptions* customSpinOption = !j9vmcustomspinoptions 0x0000000000000000
	0x130: void** staticSplitMethodTable = !j9x 0x0000000000000000
	0x138: void** specialSplitMethodTable = !j9x 0x0000000000000000
	0x140: struct J9JITExceptionTable* jitMetaDataList = !j9jitexceptiontable 0x0000000000000000
	0x148: struct J9Class* gcLink = !j9class 0x0000000000000000
	0x150: struct J9Class* hostClass = !j9class 0x0000000011623B00 // WhiteBox
	0x158: struct J9Class* nestHost = !j9class 0x0000000000000000
	0x160: struct J9FlattenedClassCache* flattenedClassCache = !j9flattenedclasscache 0x0000000000000000
}
Class name: WhiteBox
To view static fields, use !j9statics 0x0000000011623B00
To view instance shape, use !j9classshape 0x0000000011623B00

and problematic pointer is a pointer to ROM class.

So the question is what is special in WhiteBox class that we failed to calculate expected ROM class size properly?

@DanHeidinga
Copy link
Member

TODO: determine if this issue has previously shipped or not

@pshipton
Copy link
Member

Given this was discovered Jun 3, the problem is in the 0.15 release which occurred in July. Moving to the next milestone.

@pshipton
Copy link
Member

This should now be fixed, the test can be unexcluded.

@theresa-m
Copy link
Contributor

I've opened this pr to reinclude the test adoptium/aqa-tests#1350

@smlambert
Copy link
Contributor

Added a note in PR 1350, is it also to be reincluded for jdk13? as its excluded in that version also, will merge PR once that is updated or clarified

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AQA Test failures from AdoptOpenJDK Quality Assurance test set comp:vm test failure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants