Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation in Botan's SHA3 with optimization -O2 and -O3 #51299

Closed
llvmbot opened this issue Sep 24, 2021 · 15 comments
Closed

Miscompilation in Botan's SHA3 with optimization -O2 and -O3 #51299

llvmbot opened this issue Sep 24, 2021 · 15 comments
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category regression

Comments

@llvmbot
Copy link
Member

llvmbot commented Sep 24, 2021

Bugzilla Link 51957
Resolution FIXED
Resolved on Nov 10, 2021 09:52
Version 12.0
OS MacOS X
Blocks #50580 #51489
Reporter LLVM Bugzilla Contributor
CC @alexey-bataev,@anton-afanasyev,@davidbolvansky,@DimitryAndric,@RKSimon,@nikic,@zygoloid,@rotateright,@tstellar,@vtjnash,@wjristow
Fixed by commit(s) 173dd89 e27a6db 93edfb2 32bb956

Extended Description

Clang 12 and Apple Clang 13 seem to cause a miscompilation for optimization levels -O2 and -O3 in the C++ SHA3 implementation of the Botan library.

GitHub issue with detailed investigation steps and a minimal reproducer that doesn't depend on Botan:
randombit/botan#2802 (comment)

GitHub pull request with a workaround:
randombit/botan#2803

In the reproducing program, wrong results seem to surface in line 3 of SHA3_round():

const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];

for the given input values, the result in C2 would be too large to fit into a 64-bit signed integer. FWIW.

@slacka
Copy link
Mannequin

slacka mannequin commented Sep 24, 2021

Can we get a bisect on this?

@DimitryAndric
Copy link
Collaborator

The essential items to reproduce are that the target is x86_64, and -mavx is enabled. I did a bisect and found that the assertions are fixed by (or as a side-effect of) 0d74fd3 ("[SLP][COST][X86]Improve cost model for masked gather").

I had to slightly modify the reproducer to make it compile, adding :

// clang++ -std=gnu++17 -O3 -mavx pr51957.cpp -o pr51957 && ./pr51957

#include
#include
#include

template<size_t ROT, typename T>
inline constexpr T rotl(T input)
{
static_assert(ROT > 0 && ROT < 8sizeof(T), "Invalid rotation constant");
return static_cast((input << ROT) | (input >> (8
sizeof(T) - ROT)));
}

attribute((noinline))
void SHA3_round(uint64_t T[25], const uint64_t A[25], uint64_t RC)
{
const uint64_t C0 = A[0] ^ A[5] ^ A[10] ^ A[15] ^ A[20];
const uint64_t C1 = A[1] ^ A[6] ^ A[11] ^ A[16] ^ A[21];

// the calculation of C2 fails for -O3 or -O2 with clang 12
// FWIW: it would produce a value that doesn't fit into a signed 64-bit int
const uint64_t C2 = A[2] ^ A[7] ^ A[12] ^ A[17] ^ A[22];

const uint64_t C3 = A[3] ^ A[8] ^ A[13] ^ A[18] ^ A[23];
const uint64_t C4 = A[4] ^ A[9] ^ A[14] ^ A[19] ^ A[24];

const uint64_t D0 = rotl<1>(C0) ^ C3;
const uint64_t D1 = rotl<1>(C1) ^ C4;
const uint64_t D2 = rotl<1>(C2) ^ C0;
const uint64_t D3 = rotl<1>(C3) ^ C1;
const uint64_t D4 = rotl<1>(C4) ^ C2;

const uint64_t B00 = A[ 0] ^ D1;
const uint64_t B01 = rotl<44>(A[ 6] ^ D2);
const uint64_t B02 = rotl<43>(A[12] ^ D3);
const uint64_t B03 = rotl<21>(A[18] ^ D4);
const uint64_t B04 = rotl<14>(A[24] ^ D0);
T[ 0] = B00 ^ (~B01 & B02) ^ RC;
T[ 1] = B01 ^ (~B02 & B03);
T[ 2] = B02 ^ (~B03 & B04);
T[ 3] = B03 ^ (~B04 & B00);
T[ 4] = B04 ^ (~B00 & B01);

const uint64_t B05 = rotl<28>(A[ 3] ^ D4);
const uint64_t B06 = rotl<20>(A[ 9] ^ D0);
const uint64_t B07 = rotl< 3>(A[10] ^ D1);
const uint64_t B08 = rotl<45>(A[16] ^ D2);
const uint64_t B09 = rotl<61>(A[22] ^ D3);
T[ 5] = B05 ^ (~B06 & B07);
T[ 6] = B06 ^ (~B07 & B08);
T[ 7] = B07 ^ (~B08 & B09);
T[ 8] = B08 ^ (~B09 & B05);
T[ 9] = B09 ^ (~B05 & B06);

// --- instructions starting from here can be removed
// and the -O3 dicrepancy is still triggered

const uint64_t B10 = rotl< 1>(A[ 1] ^ D2);
const uint64_t B11 = rotl< 6>(A[ 7] ^ D3);
const uint64_t B12 = rotl<25>(A[13] ^ D4);
const uint64_t B13 = rotl< 8>(A[19] ^ D0);
const uint64_t B14 = rotl<18>(A[20] ^ D1);
T[10] = B10 ^ (~B11 & B12);
T[11] = B11 ^ (~B12 & B13);
T[12] = B12 ^ (~B13 & B14);
T[13] = B13 ^ (~B14 & B10);
T[14] = B14 ^ (~B10 & B11);

const uint64_t B15 = rotl<27>(A[ 4] ^ D0);
const uint64_t B16 = rotl<36>(A[ 5] ^ D1);
const uint64_t B17 = rotl<10>(A[11] ^ D2);
const uint64_t B18 = rotl<15>(A[17] ^ D3);
const uint64_t B19 = rotl<56>(A[23] ^ D4);
T[15] = B15 ^ (~B16 & B17);
T[16] = B16 ^ (~B17 & B18);
T[17] = B17 ^ (~B18 & B19);
T[18] = B18 ^ (~B19 & B15);
T[19] = B19 ^ (~B15 & B16);

const uint64_t B20 = rotl<62>(A[ 2] ^ D3);
const uint64_t B21 = rotl<55>(A[ 8] ^ D4);
const uint64_t B22 = rotl<39>(A[14] ^ D0);
const uint64_t B23 = rotl<41>(A[15] ^ D1);
const uint64_t B24 = rotl< 2>(A[21] ^ D2);
T[20] = B20 ^ (~B21 & B22);
T[21] = B21 ^ (~B22 & B23);
T[22] = B22 ^ (~B23 & B24);
T[23] = B23 ^ (~B24 & B20);
T[24] = B24 ^ (~B20 & B21);
}

int main()
{
uint64_t T[25];

uint64_t A[25] = {
    15515230172486u, 9751542238472685244u, 220181482233372672u,
    2303197730119u, 9537012007446913720u, 0u, 14782389640143539577u,
    2305843009213693952u, 1056340403235818873u, 16396894922196123648u,
    13438274300558u, 3440198220943040u, 0u, 3435902021559310u, 64u,
    14313837075027532897u, 32768u, 6880396441885696u, 14320469711924527201u,
    0u, 9814829303127743595u, 18014398509481984u, 14444556046857390455u,
    4611686018427387904u, 18041275058083100u };

SHA3_round(T, A, 0x0000000000008082);

assert(T[0]  == 16394434931424703552u);
assert(T[1]  == 10202638136074191489u);
assert(T[2]  == 6432602484395933614u);
assert(T[3]  == 10616058301262943899u);
assert(T[4]  == 14391824303596635982u);
assert(T[5]  == 5673590995284149638u);
assert(T[6]  == 15681872423764765508u);
assert(T[7]  == 11470206704342013341u);
assert(T[8]  == 8508807405493883168u);
assert(T[9]  == 9461805213344568570u);
assert(T[10] == 8792313850970105187u);
assert(T[11] == 13508586629627657374u);
assert(T[12] == 5157283382205130943u);
assert(T[13] == 375019647457809685u);
assert(T[14] == 9294608398083155963u);
assert(T[15] == 16923121173371064314u);
assert(T[16] == 4737739424553008030u);
assert(T[17] == 5823987023293412593u);
assert(T[18] == 13908063749137376267u);
assert(T[19] == 13781177305593198238u);
assert(T[20] == 9673833001659673401u);
assert(T[21] == 17282395057630454440u);
assert(T[22] == 12906624984756985556u);
assert(T[23] == 3081478361927354234u);
assert(T[24] == 93297594635310132u);

return 0;

}

But obviously this could be reduced much further. I'll also have a look at where this started going wrong, as e.g. clang 10.x didn't result in an assertion failure.

@RKSimon
Copy link
Collaborator

RKSimon commented Sep 25, 2021

I have a suspicion that https://reviews.llvm.org/D106613 will fix this

@DimitryAndric
Copy link
Collaborator

Bisecting from llvm 10 onwards, it turns out that the problem got introduced (or only exposed :) in fcad8d3 ("[SLP] Make SLPVectorizer to use llvm.masked.gather intrinsic").

@vtjnash
Copy link
Member

vtjnash commented Sep 25, 2021

Seems quite similar. The code that failed for me was for xoshiro256, which looks like they do similar operations. Probably only got exposed by that commit, since the bad ExternalUser calls look like they were already present in that diff.

@davidbolvansky
Copy link
Collaborator

LLVM 13 was really released without fix for this important security issue?

@DimitryAndric
Copy link
Collaborator

LLVM 13 was really released without fix for this important security issue?

As I noted in comment 2, only llvm 12 gave bad output. From 2021-07-08 onwards it was fixed in llvm 13, as a side effect of 0d74fd3 .

That said, e27a6db fixes another related issue, but it should not affect the Botan code.

@tstellar
Copy link
Collaborator

That said,
https://github.com/llvm/llvm-project/commit/
e27a6db fixes another related issue, but it
should not affect the Botan code.

Should we backport this to the release branch?

@DimitryAndric
Copy link
Collaborator

That said,
https://github.com/llvm/llvm-project/commit/
e27a6db fixes another related issue, but it
should not affect the Botan code.

Should we backport this to the release branch?

I think it should be backported, since it fixes a possible out-of-bounds store. Even if the test case problem from botan is papered over by a seemingly unrelated commit (e.g. "Improve cost model"), it could still occur in other situations.

I hope Simon and Jameson agree. :)

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 22, 2021

That said,
https://github.com/llvm/llvm-project/commit/
e27a6db fixes another related issue, but it
should not affect the Botan code.

Should we backport this to the release branch?

I think it should be backported, since it fixes a possible out-of-bounds
store. Even if the test case problem from botan is papered over by a
seemingly unrelated commit (e.g. "Improve cost model"), it could still occur
in other situations.

I hope Simon and Jameson agree. :)

+1 for rGe27a6db5298f6ba3c1dbc8bab25c769cfa761b2a to be merged to 13.x

@anton-afanasyev
Copy link
Contributor

This should be definitely backported to 13.0.1 since "Improve cost model..." commit unfortunately veiled issue only partially. For instance, the bug is still reproduced on botan for clang-13.0.0 on skylake without avx512: https://godbolt.org/z/fWTEEs5e7

@tstellar
Copy link
Collaborator

tstellar commented Nov 8, 2021

The fix does not apply cleanly, could someone backport this and push a branch to their local github fork?

@vtjnash
Copy link
Member

vtjnash commented Nov 8, 2021

Just checked and it seems like e27a6db will backport cleanly if you include 173dd89 first.

@tstellar
Copy link
Collaborator

Merged: 32bb956

@tstellar
Copy link
Collaborator

mentioned in issue #51489

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category regression
Projects
None yet
Development

No branches or pull requests

7 participants