build: Better support for pre-SSE4.2 CPUs #15589
Comments
@ansarizafar sorry to hear you're still hitting this error! As mentioned in #14443, the root cause is that our release binaries include RocksDB compiled with SSE4.2, which your machine seems not to support. I'm going to investigate making our default behaviour better on older hardware. In the meantime, building from source should produce a binary that will run on your machine. Could you try that?
I think there is something relatively straightforward that can be done here: very early in the lifecycle of the process we can detect if the CPU supports SSE4.2 and exit with a detailed error message if it does not. The code to detect whether the CPU supports SSE4.2 already exists in RocksDB
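A minimal sketch of the fail-fast idea above, assuming GCC or Clang on x86. The function names `CpuSupportsSSE42` and `StartupCheck` are hypothetical and are not RocksDB's or CockroachDB's actual code; `__builtin_cpu_init` and `__builtin_cpu_supports` are compiler builtins. Note that such a check only helps if the translation unit containing it is itself built without `-msse4.2`, otherwise the compiler may emit SSE4.2 instructions before the check runs.

```cpp
#include <cstdio>

// Hypothetical helper: reports whether the running CPU advertises SSE4.2.
// On compilers or architectures without the builtins, conservatively
// returns false.
bool CpuSupportsSSE42() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // Populate the compiler's CPU feature cache.
  return __builtin_cpu_supports("sse4.2") != 0;
#else
  return false;
#endif
}

// Hypothetical startup check: returns 0 when the binary can run, or 1
// after printing a detailed message to `err`. The caller is expected to
// exit with the returned code.
int StartupCheck(bool has_sse42, std::FILE* err) {
  if (has_sse42) {
    return 0;
  }
  std::fprintf(err,
               "error: this binary requires a CPU with SSE4.2 support, "
               "which this machine does not report.\n"
               "Building from source should produce a binary that runs "
               "on this machine.\n");
  return 1;
}
```

A caller would invoke `StartupCheck(CpuSupportsSSE42(), stderr)` as the first statement of `main` and exit with the returned code if it is nonzero.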
FWIW, if that RocksDB code worked as intended, we wouldn't have this problem, since we'd gracefully fall back to the slow, non-SSE4.2 CRC32 implementation. @tamird and I discussed at some point, and the conclusion was that compiling all of RocksDB with Would we prefer to run on these old CPUs, albeit with a slow checksum? If so, we could upstream the above fix, rather than failing fast.
Yes, running on old CPUs, albeit with a slow checksum would be preferable. I wonder if there is a performance hit to not compiling the rest of RocksDB with
See also facebook/rocksdb#2488 for an upstream issue. GCC has a The compiler is using SSE4.2 instructions elsewhere, so it must expect some benefit. Besides the CRC32 function, the rest of SSE4.2 appears to be optimized string-comparison instructions. It's unclear how much difference this makes (I also assume that Go is not doing this for the Go parts of our codebase). We could also try
Clang doesn't support
What I think we want is to apply the
Previously, RocksDB was compiled with `-msse4.2` across the board. The compiler, unsurprisingly, took the liberty of actually emitting SSE4.2 instructions, which could cause `SIGILL` crashes on pre-SSE4.2 CPUs (i.e., CPUs released before ca. November 2008). The only part of RocksDB that really benefits from SSE4.2 instructions, however, is the CRC32C checksum code, which can use the hardware CRC32C instructions provided by SSE4.2. This commit simply removes `-msse4.2` from the RocksDB compile flags and re-enables it for the FastCRC32 function only, via a GCC attribute. Since FastCRC32 is already properly guarded by a CPUID check, which falls back to a slower implementation at runtime if the CPU does not support SSE4.2, this should get us nearly all of the performance benefits of compiling with `-msse4.2` while still supporting pre-SSE4.2 CPUs. Fixes cockroachdb#15589.
Previously, RocksDB was compiled with `-msse4.2` across the board. The compiler, unsurprisingly, took the liberty of actually emitting SSE4.2 instructions, which could cause `SIGILL` crashes on pre-SSE4.2 CPUs (i.e., CPUs released before ca. November 2008). The only part of RocksDB that really benefits from SSE4.2 instructions, however, is the CRC32C checksum code, which can use the hardware CRC32C instructions provided by SSE4.2. This commit simply removes `-msse4.2` from the RocksDB compile flags and re-enables it for the FastCRC32 function only, via a GCC attribute. Since FastCRC32 is already properly guarded by a CPUID check, which falls back to a slower implementation at runtime if the CPU does not support SSE4.2, this should get us nearly all of the performance benefits of compiling with `-msse4.2` while still supporting pre-SSE4.2 CPUs. To ensure we don't lose the benefits of vectorization provided by SSE, SSE2, and SSE3, we compile all of RocksDB with `-msse3`, which is supported by most x86 CPUs released since 2004. Fixes cockroachdb#15589.
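The approach described in the commit message above can be sketched as follows, assuming GCC or a recent Clang on x86. This is an illustration only, not RocksDB's FastCRC32: the names `Crc32cSoftware`, `Crc32cHardware`, and `Crc32c` are hypothetical, and the byte-at-a-time loops are deliberately simple. Only the hardware function carries the `sse4.2` target attribute, so the rest of the file can be built without `-msse4.2`, and a runtime CPU check selects between the two paths.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>  // _mm_crc32_u8
#define SKETCH_X86_DISPATCH 1
#endif

// Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// Slow, but runs on any CPU.
std::uint32_t Crc32cSoftware(std::uint32_t crc, const std::uint8_t* data,
                             std::size_t n) {
  crc = ~crc;
  for (std::size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Subtracting from 0 turns the low bit into an all-ones or zero mask.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

#ifdef SKETCH_X86_DISPATCH
// Only this function is compiled with SSE4.2 enabled, via the target
// attribute. It must be called only after a runtime CPU check.
__attribute__((target("sse4.2")))
std::uint32_t Crc32cHardware(std::uint32_t crc, const std::uint8_t* data,
                             std::size_t n) {
  crc = ~crc;
  for (std::size_t i = 0; i < n; ++i) {
    crc = _mm_crc32_u8(crc, data[i]);
  }
  return ~crc;
}
#endif

// Runtime dispatch: use the hardware instruction when the CPU reports
// SSE4.2, otherwise fall back to the software implementation.
std::uint32_t Crc32c(const std::uint8_t* data, std::size_t n) {
#ifdef SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse4.2")) {
    return Crc32cHardware(0, data, n);
  }
#endif
  return Crc32cSoftware(0, data, n);
}
```

Both paths should agree on the standard CRC-32C check value, `0xE3069283`, for the nine ASCII bytes `123456789`. A production implementation would process eight bytes at a time and cache the CPU check rather than repeating it per call.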
Previously, the bincheck test started failing after upgrading the linux build system to a newer GLIBC and after we changed the way we stop the server. The linux bincheck test was relying on running the cockroach binary in a qemu-based VM, using a custom kernel. Now that we don't use rocksdb and don't have SSE issues (see cockroachdb#15589), we can simplify the test and run the binary without the VM layer. This PR also fixes the issue where the busybox `kill` command can only handle positional arguments. We try our best to kill the server, but in some cases the PID changes and the whole test fails. Ignore the kill exit code and let the build agent kill the process. Epic: none Fixes: RE-271 Release note: None
94975: release: fix bincheck failures r=rail a=rail Previously, the bincheck test started failing after upgrading the linux build system to a newer GLIBC and after we changed the way we stop the server. The linux bincheck test was relying on running the cockroach binary in a qemu-based VM, using a custom kernel. Now that we don't use rocksdb and don't have SSE issues (see #15589), we can simplify the test and run the binary without the VM layer. This PR also fixes the issue where the busybox `kill` command can only handle positional arguments. We try our best to kill the server, but in some cases the PID changes and the whole test fails. Ignore the kill exit code and let the build agent kill the process. Epic: none Fixes: RE-271 Release note: None Co-authored-by: Rail Aliiev <rail@iqchoice.com>
I am still getting the same error that was mentioned in issue #14443.