RocksJava EXCEPTION_ILLEGAL_INSTRUCTION (0xc000001d) on Windows Server 2016 when reopening existing database with more than 2 keys #11096

Closed
Hutmar opened this issue Jan 17, 2023 · 14 comments


@Hutmar

Hutmar commented Jan 17, 2023


Expected behavior

When I reopen a database with an arbitrary number of keys, I expect to be able to call RocksDB::get or RocksIterator::seekToFirst without crashing the VM.

Actual behavior

The VM crashes with EXCEPTION_ILLEGAL_INSTRUCTION (0xc000001d) when calling RocksDB::get or RocksIterator::seekToFirst:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ILLEGAL_INSTRUCTION (0xc000001d) at pc=0x00007fffee619a49, pid=2780, tid=9568
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, windows-amd64)
# Problematic frame:
# C  [librocksdbjni3639209328417945052.dll+0x529a49]
#
# Core dump will be written. Default location: C:\work\zpointcs\components\businesscomponents\businesscomponents\hs_err_pid2780.mdmp
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Dorg.gradle.internal.worker.tmpdir=C:\work\zpointcs\buildTemp\businesscomponents\tmp\test\work -Dorg.gradle.native=false --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xmx512m -Dfile.encoding=windows-1252 -Duser.country=AT -Duser.language=de -Duser.variant -ea worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Test Executor 110'

Host: Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz, 8 cores, 15G,  Windows Server 2016 , 64 bit Build 14393 (10.0.14393.0)
Time: Tue Jan 17 16:35:07 2023 Mitteleuropäische Zeit elapsed time: 3.350847 seconds (0d 0h 0m 3s)

---------------  T H R E A D  ---------------

Current thread (0x0000027e23939c10):  JavaThread "Test worker" [_thread_in_native, id=9568, stack(0x000000ea1c800000,0x000000ea1c900000)]

Stack: [0x000000ea1c800000,0x000000ea1c900000],  sp=0x000000ea1c8f7bb0,  free space=990k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni3639209328417945052.dll+0x529a49]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.rocksdb.RocksDB.get(J[BII)[B+0
j  org.rocksdb.RocksDB.get([B)[B+9

Steps to reproduce the behavior

  • create a database and put more than two values
  • close via RocksDB::close
  • stop java process
  • start new java process
  • open database again
  • call either RocksDB::get or RocksIterator::seekToFirst

Interesting fact: opening the database on a Windows 10 system works. It does not work on Windows Server 2016 or 2019.

Version: 7.8.3

hs_err_pid9732.log

@Hutmar
Author

Hutmar commented Jan 18, 2023

Tested on Windows Server 2016 and Windows Server 2019:

  • No problem with version 6.29.5
  • Problem occurs with all versions >= 7.0.4
  • Problem occurs with all tested Java versions (AdoptOpenJDK Java 8 and Temurin Java 17, 64-bit)

Test Java program attached (the crash always occurs when running the test program more than once).

RocksDbTest.zip

@adamretter
Collaborator

adamretter commented Jan 18, 2023

What version of the msvcpp runtime do you have installed on each system you tested with?

@stefan-zobel
Contributor

I can't reproduce this. I've run the test program on Windows Server 2016 (Version 1607) and Windows Server 2019 (Version 1809) using rocksdbjni-7.8.3-win64.jar and an Azul Zulu jdk17.0.5+8-Zulu17.38.21-CA JVM. Works on both systems.

@Hutmar
Author

Hutmar commented Jan 19, 2023

@stefan-zobel Did you run the test program at least twice? The error only occurs from the second run on.
I will do another test with the Azul Zulu JDK; maybe it has something to do with the JDK.

@adamretter msvcp versions on the Windows Server 2016 system (only x64 listed):
2008 9.0.30729.17
2010 10.0.30319
2013 12.0.30501
2015-2022 14.34.31931

@stefan-zobel
Contributor

> did you run the test program at least twice?

I did run each test twice

@adamretter
Collaborator

> msvcp versions on the Windows Server 2016 system (only x64 listed):

@Hutmar the version I have on my Windows Server 2012 system is:

  1. Microsoft Visual C++ 2015-2019 Redistributable (x64) - 14.29.30139

How does that compare to yours?

@Hutmar
Author

Hutmar commented Jan 19, 2023

I am not sure; I guess 2015-2022 14.34.31931. I do not know how to find out which version is used by rocksdbjni.

I also tested with the Azul Zulu JDK on Windows Server 2016. The VM also crashes; the hs_err_pid log is attached.

hs_err_pid8728.log

@stefan-zobel
Contributor

> I am not sure, i guess ...

Your hs_err_pid log shows (line 487, 498/499) that msvcp and vcruntime DLLs are loaded from zulu17.40.19-ca-jdk17.0.6-win_x64\bin (which is what I would expect)

@stefan-zobel
Contributor

stefan-zobel commented Jan 19, 2023

I can reproduce this on a very old laptop that still has a Sandy Bridge CPU (like the OP's Windows 2016 machine seems to have).
In the disassembly of the minidump I can see a bzhi instruction which appears to be the illegal instruction that causes the crash (which would make sense as bzhi was introduced in the Haswell architecture). Hope this helps. (Forgot to mention: I used 7.9.2 for this test)
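
Whether a given machine supports BMI2 (the extension that provides bzhi) can be read from CPUID: the flag is bit 8 of EBX for leaf 7, sub-leaf 0. The following is only a rough sketch and is not taken from RocksDB or snappy. It assumes an x86 or x86-64 build with GCC/Clang (`<cpuid.h>`) or MSVC (`<intrin.h>`), and the helper names `ebx_has_bmi2` and `cpu_has_bmi2` are made up for illustration.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define SKETCH_HAVE_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

// Hypothetical helper: decode the BMI2 flag from CPUID.(EAX=7,ECX=0):EBX.
// Bit 8 is BMI2 (bit 3 is BMI1, bit 5 is AVX2).
inline bool ebx_has_bmi2(std::uint32_t ebx) {
  return ((ebx >> 8) & 1u) != 0;
}

// Hypothetical helper: query the CPU the process is running on.
// Returns false on non-x86 builds or when CPUID leaf 7 is unavailable.
inline bool cpu_has_bmi2() {
#if SKETCH_HAVE_CPUID && defined(_MSC_VER)
  int regs[4] = {0, 0, 0, 0};
  __cpuid(regs, 0);                 // regs[0] = highest supported leaf
  if (regs[0] < 7) return false;
  __cpuidex(regs, 7, 0);            // regs[1] = EBX
  return ebx_has_bmi2(static_cast<std::uint32_t>(regs[1]));
#elif SKETCH_HAVE_CPUID
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
  return ebx_has_bmi2(ebx);
#else
  return false;
#endif
}
```

On a Sandy Bridge machine such as the i7-2700K from the crash report, `cpu_has_bmi2()` would be expected to return false, since BMI2 first appeared with Haswell.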

@stefan-zobel
Contributor

stefan-zobel commented Jan 21, 2023

FWIW, at least in my test environment that bzhi instruction seems to come from snappy (snappy.cc, line 965):

static inline uint32_t ExtractLowBytes(uint32_t v, int n) {
  assert(n >= 0);
  assert(n <= 4);
#if SNAPPY_HAVE_BMI2
  return _bzhi_u32(v, 8 * n);
#else
  // This needs to be wider than uint32_t otherwise `mask << 32` will be
  // undefined.
  uint64_t mask = 0xffffffff;
  return v & ~(mask << (8 * n));
#endif
}

The hypothesis would then be that SNAPPY_HAVE_BMI2 is defined to 1 in the RocksJava Windows build, which would be fatal on a Sandy Bridge CPU.
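
To illustrate why disabling the BMI2 path should not change results, here is a small sketch comparing the portable fallback from the snippet above with a software model of bzhi. The model is an assumption written from the instruction's documented behaviour (clear all bits at positions at or above the index), not code from snappy, and the function names are hypothetical.

```cpp
#include <cstdint>

// Portable fallback, mirroring the #else branch of the snappy snippet:
// keep the low n bytes of v (0 <= n <= 4).
inline std::uint32_t extract_low_bytes_portable(std::uint32_t v, int n) {
  // 64-bit mask so that shifting by 32 (n == 4) is well defined.
  std::uint64_t mask = 0xffffffff;
  return static_cast<std::uint32_t>(v & ~(mask << (8 * n)));
}

// Software model of 32-bit bzhi (assumption, not snappy code): the index is
// taken from the low 8 bits; bits at positions >= index are cleared, and
// the value passes through unchanged when the index is 32 or more.
inline std::uint32_t bzhi32_model(std::uint32_t v, std::uint32_t index) {
  std::uint32_t n = index & 0xffu;
  if (n >= 32) return v;
  return v & ((std::uint32_t{1} << n) - 1u);
}

// Check that both paths agree for every byte count the function accepts.
inline bool paths_agree(std::uint32_t v) {
  for (int n = 0; n <= 4; ++n) {
    std::uint32_t bits = static_cast<std::uint32_t>(8 * n);
    if (extract_low_bytes_portable(v, n) != bzhi32_model(v, bits)) {
      return false;
    }
  }
  return true;
}
```

For example, `extract_low_bytes_portable(0x12345678, 2)` yields `0x5678`. The two variants differ only in which instructions they compile to, not in what they compute.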

@stefan-zobel
Contributor

A quick test at least did not contradict the hypothesis. I rebuilt snappy-1.1.9 with -DSNAPPY_HAVE_BMI2=0 and RocksJava v7.9.2 with -DPORTABLE=1. In that configuration the test case runs successfully on this really antiquated CPU.

@Hutmar In case you want to try it out, I've attached a zipped jar (password is 123).

rocksdbjni-7.9.2-win64-snappy-patch.jar.zip

@Hutmar
Author

Hutmar commented Jan 26, 2023

Thank you very much. I can confirm it works on our Windows Server 2016 and Windows Server 2019 machines. I also did another test and changed the compression type in the options to CompressionType.LZ4_COMPRESSION, which also works with the unpatched rocksdbjni version 7.8.3.

What changed in version 7.x? In rocksdbjni versions < 7 it works without changing the compression type.

My last question: will future versions of rocksdbjni work again on systems with Sandy Bridge CPUs, or do we have to use versions < 7 or change the compression type?

@stefan-zobel
Contributor

> What changed in version 7.x? Because in rocksdbjni versions < 7 it also works without changing compression type?
>
> My last question is, will future versions of rocksdbjni work again on systems with Sandy Bridge CPUs or do we have to use versions < 7 / change compression type?

@Hutmar I don't know the answers to your questions. I'm not associated with the RocksDB team, just a random RocksJava user. Let's see what @adamretter has to say.

My personal take on pre-Haswell CPU support is quite radical:
The vast majority of x86_64 CPUs that have been sold in the past 9+ years have AVX2 support, even most low-end consumer models do (apart from some Celeron models that often don't).
Disabling AVX2 support in the RocksJava build taxes the overwhelming majority of CPUs out there in terms of foregone speed and increased energy consumption for no tangible reason.
So, I think the current build settings are backwards: the default should be to support the predominant CPU feature set, not the least common denominator governed by a vanishingly small number of outlier CPUs.

Just my 2c

@adamretter
Collaborator

@Hutmar I am afraid that I agree with @stefan-zobel. As RocksDB explicitly strives for high performance I think it is unlikely that we would work backwards. I was happy to see that @stefan-zobel was able to suggest a workable solution for you.
