MacOS: botan built with Xcode-13 fails SHA-3 tests #2802
Comments
I can reproduce this on my Mac. Actually, it's not only the SHAKE tests but also others:
Same output for the current
If I compile in debug mode (without optimizations), the SHA-3 tests work (with the BMI2 module disabled). Results for other tests pending...
Edit: full test body (on
Edit2: without optimizations (
Edit3:
I concur.
What would happen if you omit
In my case, where I have a "standard"
It was my understanding that
From the Clang docs:
I'm getting the same results (enabling optimization
So, it's probably not a bug introduced by Xcode-13; the SHA-3 code is the likelier culprit. I suspect we should try
Dang! If I build with UBSan, the tests work okay:
Interesting. And there was no report from UBSan, as far as I could see... Although I had to build with Macports Clang-12, because Xcode Clang failed to link with the sanitizer (?!)...
Did you link with
Ah, sh**. Of course. No, I did not. :-( Rebuilding...
OK, I've rebuilt as you showed (and with optimization!) - and immediately got ASAN errors:
Here's my config:
Here's the build output: make-out.txt. And the test output (brief now ;): test-out.txt
Alright, I think I'm digging much deeper than I should. Nevertheless, some printf-debugging suggests that the optimization breaks either the for-loop in
I instrumented said for-loop like so:

// helper function
void print_array(uint64_t S[25])
{
for (unsigned int i = 0; i < 25; ++i)
{
std::cout << i << ": " << S[i] << '\n';
}
std::cout << '\n';
}
// loop in SHA_3::permute()
for(size_t i = 0; i != 24; i += 2)
{
SHA3_round(T, A, RC[i+0]);
std::cout << i << '\n';
print_array(T);
SHA3_round(A, T, RC[i+1]);
std::cout << i << '\n';
print_array(A);
}

Curiously, the first invocation of
While the second invocation of
Now, frankly, I find this rather strange.
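One way to pin down which intermediate value diverges, without dumping whole state arrays, is a differential check that recomputes the theta column parities through volatile reads (which the optimizer will not combine or vectorize) and compares them with the values the optimized code produced. This is only a sketch under that assumption; the helper names are made up and this code is not from Botan:

#include <cstdint>
#include <cstdio>

// Recompute one theta column parity via volatile loads, so the optimizer
// cannot vectorize or re-associate this XOR chain.
static uint64_t theta_parity_ref(const uint64_t A[25], unsigned col)
{
   volatile const uint64_t* p = A;
   return p[col] ^ p[col + 5] ^ p[col + 10] ^ p[col + 15] ^ p[col + 20];
}

// Compare the reference parities against the C0..C4 values observed in the
// optimized build (e.g. printed from inside SHA3_round).
void check_theta(const uint64_t A[25], const uint64_t C_observed[5])
{
   for (unsigned col = 0; col < 5; ++col)
   {
      const uint64_t ref = theta_parity_ref(A, col);
      if (ref != C_observed[col])
         std::printf("C%u mismatch: ref=%llu observed=%llu\n", col,
                     static_cast<unsigned long long>(ref),
                     static_cast<unsigned long long>(C_observed[col]));
   }
}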
Turns out: After moving
Okay, this is becoming a goose chase, I'm afraid. I compared the input values in
Enough for today...
Here's a minimal example (that doesn't depend on Botan), reproducing the discrepancy. With

#include <cstdint>
#include <cassert>
template<size_t ROT, typename T>
inline constexpr T rotl(T input)
{
static_assert(ROT > 0 && ROT < 8*sizeof(T), "Invalid rotation constant");
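// Since 0 < ROT < 8*sizeof(T) (enforced above), neither shift below is by
// the full bit width, so this rotate-left idiom has no undefined behaviour.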
return static_cast<T>((input << ROT) | (input >> (8*sizeof(T) - ROT)));
}
inline void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC)
{
const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20];
const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21];
// the calculation of C2 fails for -O3 or -O2 with clang 12
// FWIW: it would produce a value that doesn't fit into a _signed_ 64-bit int
const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];
const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23];
const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24];
const uint64_t D0 = rotl<1>(C0) ^ C3;
const uint64_t D1 = rotl<1>(C1) ^ C4;
const uint64_t D2 = rotl<1>(C2) ^ C0;
const uint64_t D3 = rotl<1>(C3) ^ C1;
const uint64_t D4 = rotl<1>(C4) ^ C2;
const uint64_t B00 = A[ 0] ^ D1;
const uint64_t B01 = rotl<44>(A[ 6] ^ D2);
const uint64_t B02 = rotl<43>(A[12] ^ D3);
const uint64_t B03 = rotl<21>(A[18] ^ D4);
const uint64_t B04 = rotl<14>(A[24] ^ D0);
T[ 0] = B00 ^ (~B01 & B02) ^ RC;
T[ 1] = B01 ^ (~B02 & B03);
T[ 2] = B02 ^ (~B03 & B04);
T[ 3] = B03 ^ (~B04 & B00);
T[ 4] = B04 ^ (~B00 & B01);
const uint64_t B05 = rotl<28>(A[ 3] ^ D4);
const uint64_t B06 = rotl<20>(A[ 9] ^ D0);
const uint64_t B07 = rotl< 3>(A[10] ^ D1);
const uint64_t B08 = rotl<45>(A[16] ^ D2);
const uint64_t B09 = rotl<61>(A[22] ^ D3);
T[ 5] = B05 ^ (~B06 & B07);
T[ 6] = B06 ^ (~B07 & B08);
T[ 7] = B07 ^ (~B08 & B09);
T[ 8] = B08 ^ (~B09 & B05);
T[ 9] = B09 ^ (~B05 & B06);
// --- instructions starting from here can be removed
// and the -O3 discrepancy is still triggered
const uint64_t B10 = rotl< 1>(A[ 1] ^ D2);
const uint64_t B11 = rotl< 6>(A[ 7] ^ D3);
const uint64_t B12 = rotl<25>(A[13] ^ D4);
const uint64_t B13 = rotl< 8>(A[19] ^ D0);
const uint64_t B14 = rotl<18>(A[20] ^ D1);
T[10] = B10 ^ (~B11 & B12);
T[11] = B11 ^ (~B12 & B13);
T[12] = B12 ^ (~B13 & B14);
T[13] = B13 ^ (~B14 & B10);
T[14] = B14 ^ (~B10 & B11);
const uint64_t B15 = rotl<27>(A[ 4] ^ D0);
const uint64_t B16 = rotl<36>(A[ 5] ^ D1);
const uint64_t B17 = rotl<10>(A[11] ^ D2);
const uint64_t B18 = rotl<15>(A[17] ^ D3);
const uint64_t B19 = rotl<56>(A[23] ^ D4);
T[15] = B15 ^ (~B16 & B17);
T[16] = B16 ^ (~B17 & B18);
T[17] = B17 ^ (~B18 & B19);
T[18] = B18 ^ (~B19 & B15);
T[19] = B19 ^ (~B15 & B16);
const uint64_t B20 = rotl<62>(A[ 2] ^ D3);
const uint64_t B21 = rotl<55>(A[ 8] ^ D4);
const uint64_t B22 = rotl<39>(A[14] ^ D0);
const uint64_t B23 = rotl<41>(A[15] ^ D1);
const uint64_t B24 = rotl< 2>(A[21] ^ D2);
T[20] = B20 ^ (~B21 & B22);
T[21] = B21 ^ (~B22 & B23);
T[22] = B22 ^ (~B23 & B24);
T[23] = B23 ^ (~B24 & B20);
T[24] = B24 ^ (~B20 & B21);
}
int main()
{
uint64_t T[25];
uint64_t A[25] = {
15515230172486u, 9751542238472685244u, 220181482233372672u,
2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u,
2305843009213693952u, 1056340403235818873u, 16396894922196123648u,
13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u,
14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
4611686018427387904u, 18041275058083100u };
SHA3_round(T, A, 0x0000000000008082);
assert(T[0] == 16394434931424703552u);
assert(T[1] == 10202638136074191489u);
assert(T[2] == 6432602484395933614u);
assert(T[3] == 10616058301262943899u);
assert(T[4] == 14391824303596635982u);
assert(T[5] == 5673590995284149638u);
assert(T[6] == 15681872423764765508u);
assert(T[7] == 11470206704342013341u);
assert(T[8] == 8508807405493883168u);
assert(T[9] == 9461805213344568570u);
assert(T[10] == 8792313850970105187u);
assert(T[11] == 13508586629627657374u);
assert(T[12] == 5157283382205130943u);
assert(T[13] == 375019647457809685u);
assert(T[14] == 9294608398083155963u);
assert(T[15] == 16923121173371064314u);
assert(T[16] == 4737739424553008030u);
assert(T[17] == 5823987023293412593u);
assert(T[18] == 13908063749137376267u);
assert(T[19] == 13781177305593198238u);
assert(T[20] == 9673833001659673401u);
assert(T[21] == 17282395057630454440u);
assert(T[22] == 12906624984756985556u);
assert(T[23] == 3081478361927354234u);
assert(T[24] == 93297594635310132u);
return 0;
}
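For completeness: the root cause turned out to be a compiler bug (see below), but a common stop-gap on the source side is an optimization barrier that keeps a value out of the SLP vectorizer's reach. The sketch below shows only that generic trick; the helper name is made up and this is not Botan's actual fix. Compiling the affected file with Clang's -fno-slp-vectorize would have a similar diagnostic effect.

#include <cstdint>

// Generic optimization barrier (hypothetical helper, not part of Botan):
// routing a value through an empty inline-asm statement forces it into a
// register and prevents the compiler from folding or vectorizing across it.
template<typename T>
inline T value_barrier(T v)
{
#if defined(__clang__) || defined(__GNUC__)
   asm volatile("" : "+r"(v));
#endif
   return v;
}

// Hypothetical use inside SHA3_round, keeping the affected XOR chain scalar:
//   const uint64_t C2 = value_barrier(A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22]);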
Great investigation! Also, note that I'm getting the same results with other Clang-12 compilers, not only Xcode's. But Clang-11 from Macports seems to compile and run your reproducer correctly.
FWIW I cannot reproduce the issue with Clang 12 on Linux.
🎉🎉
For reference, the bug report to llvm.org: https://bugs.llvm.org/show_bug.cgi?id=51957
I confirm that it works on macOS 11.6 with Xcode-13 and
Thank you! Let's merge, and back-port to
Interesting, I cannot reproduce this with Apple clang version 13.0.0 (clang-1300.0.29.3), nor with FreeBSD clang 13.0.0-rc3-8-g08642a395f23. I.e., none of the assertions trigger. (Note that Apple's versions don't exactly correspond to upstream LLVM versions.) Do you know of any non-Mac clang version that miscompiles the test case, and what exact compile flags are being used?
I suspect it's the combination of the "right" compiler and CPU (see below). I know that at least two independent developers here reproduced the problem, with Xcode-13 (Apple clang version 13.0.0 (clang-1300.0.29.3)) and with Macports clang version 12.0.1 (Macports clang version 11.1.0 is "immune", as were previous releases). I also know that LLVM developers confirmed the issue and bisected it to the SLP vectorizer. @reneme's reproducer "reproduces" the problem with practically no flags at all:
Note: without
The problem obviously is (also?) CPU-related, because "downgrading" CPU arch to, e.g.,
Ah yes, I failed to mention that I was compiling this on an M1 Mac, so targeting arm64. As a data point, that works correctly in any case, both for the minimized test case and when I do a full configure/make/make check.
Right, so what is interesting is which CPU it detects natively, and therefore which CPU extensions it enables. Can you run these same clang commands with
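(A hedged aside, not one of the commands asked about above: compiling and running a tiny program that inspects the compiler's predefined feature macros is another way to see which extensions a given -march/-mcpu setting actually enables. The macro names below are the standard Clang/GCC ones.)

#include <iostream>

// Print which x86 SIMD/BMI feature sets the compiler has enabled for the
// chosen target flags, based on its predefined macros.
int main()
{
#if defined(__SSE4_1__)
   std::cout << "SSE4.1 enabled\n";
#endif
#if defined(__AVX2__)
   std::cout << "AVX2 enabled\n";
#endif
#if defined(__BMI2__)
   std::cout << "BMI2 enabled\n";
#endif
   return 0;
}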
Did I mention that the problem was bisected to the SLP Vectorizer in LLVM? Regardless,
Of course, the above doesn't mean the problem is limited to
Yes, but I'm not aware of any particular LLVM commit that caused it; there isn't any further information in https://bugs.llvm.org/show_bug.cgi?id=51957, maybe you got that via other channels? The main thing is whether this is an Apple-specific change that broke something, or whether it is also reproducible in vanilla LLVM, if you specify e.g.
The problem disappears with
On Linux, Clang-12 appears to not set
Note that the original poster of this bug used
Yes, I did. ;-)
Yes, it was - but so was the machine that Linux runs on (although it is in a VM ;). Regardless, it looks like we now have the exact Intel extension (SSE4.1) for which LLVM Clang-12 fails to generate correct optimized code on
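(For readers unfamiliar with the SLP vectorizer: below is a rough illustration, written with SSE intrinsics, of the kind of transformation it applies to the theta step. This is an assumption about the general shape of the generated code, not a disassembly of what Clang 12 actually emits, and the function name is made up.)

#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// The SLP vectorizer packs adjacent 64-bit XOR chains (such as the C2 and C3
// column parities) into single 128-bit operations, roughly like this:
void theta_c2_c3_packed(const uint64_t A[25], uint64_t out[2])
{
   __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(A + 2));            // {A[2],  A[3]}
   v = _mm_xor_si128(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(A + 7)));  // ^ {A[7],  A[8]}
   v = _mm_xor_si128(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(A + 12))); // ^ {A[12], A[13]}
   v = _mm_xor_si128(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(A + 17))); // ^ {A[17], A[18]}
   v = _mm_xor_si128(v, _mm_loadu_si128(reinterpret_cast<const __m128i*>(A + 22))); // ^ {A[22], A[23]}
   _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);                            // out = {C2, C3}
}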
I merged a commit upstream on Monday that may fix this: https://reviews.llvm.org/D106613
For what it's worth: The
Fixed now in master and in 2.18.2. Thanks for reporting this, @mouse07410. @reneme + @hrantzsch, thanks for digging in and finding the problem. And also thank you to @vtjnash for fixing the Clang bug!
Apple released Xcode-13. When Botan is built with it, it fails tests. Both master and release-2, in the same way. On all my Macs.

Branch: release-2

Configuration
Configuration output:
Build output: make-out.txt.gz

Tests output
Full output: make-out.txt.gz

Branch: master

Configuration
Output:
Build output: make-out.txt.gz

Tests
Full output: test-out.txt.gz