Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Run ASan/LSan and reorganize sanitizer and Valgrind jobs #846

Merged
merged 4 commits into from
May 21, 2021

Conversation

real-or-random
Copy link
Contributor

@real-or-random real-or-random commented Nov 9, 2020

This enables asan (including lsan), and restructures the sanitizer builds. It also removes -fno-omit-frame-pointer again. This is debatable. The main reason for removing this is that GCC (but not clang) fails to build (runs out of registers) when combining asan, -fno-omit-frame-pointer, and the x86_64 asm. But I believe without -fno-omit-frame-pointer, the instrumented binary may be closer to the real binary, which is good. If we get unusable debugging output on CI without frame pointers, we can still reproduce the build locally.

  • ECMULTGENPRECISION=8 : We should disable this entirely, not only on CI, because it's slow and needs too much stack space

@real-or-random real-or-random force-pushed the 202010_ci_matrix branch 2 times, most recently from 5716133 to 9e55f58 Compare November 12, 2020 12:49
@real-or-random
Copy link
Contributor Author

real-or-random commented Nov 12, 2020

A few things are weird on MacOS, and I don't really understand what's going on:

  • jobs 49 and 50 fail because lsan does not work with Apple's clang, see config.log. That's expected according to Mac OS X Sierra: detect_leaks is not supported on this platform google/sanitizers#1026
  • jobs 67 succeeds but 68 fails, with the same error message as above. But why the difference? I have no idea, they should use the same toolchain etc. Is this related to gmp? But 67 seem to fail with the "native compiler", and we don't set $CC_FOR_BUILD correctly for gcc. I've added this now.
  • I'm not convinced that the "gcc" jobs (including 67, 68) really use gcc. It's again Apple's fake gcc according to the $CC --version in after_script. @elichai I added a $CC --version in the script, let's see if this makes a difference. (This will be obsolete anyway when we switch away from travis).

And job 32 times out... well that's not nice but at least I understand what's going on ...

@elichai
Copy link
Contributor

elichai commented Nov 12, 2020

  • I'm not convinced that the "gcc" jobs (including 67, 68) really use gcc. It's again Apple's fake gcc according to the $CC --version in after_script. @elichai I added a $CC --version in the script, let's see if this makes a difference. (This will be obsolete anyway when we switch away from travis).

IIRC that's why I did CC=gcc-9 because OSX has an alias for gcc=clang
https://github.com/bitcoin-core/secp256k1/pull/750/files#diff-5c41b2883cfdc97e6b8a07ebe6456643de0160099b7f15ed04afeb2fd627f630R14

if [ "$TRAVIS_OS_NAME" = "osx" ] && [ "$TRAVIS_COMPILER" = "gcc" ]
then
export CC="gcc-9"
fi

@real-or-random
Copy link
Contributor Author

IIRC that's why I did CC=gcc-9 because OSX has an alias for gcc=clang

Apparently that does not seem to work (anymore?).

I think we shouldn't try to fix this now but first move to another CI platform. For example, Cirrus seems to ship with GCC cirruslabs/cirrus-ci-docs#67 .

@real-or-random
Copy link
Contributor Author

Rebased to work with Cirrus. I also updated the initial comment.

@real-or-random real-or-random marked this pull request as ready for review May 14, 2021 11:57
@real-or-random real-or-random changed the title Enable more sanitizers on Travis ci: Run asan and add more sanitizer/valgrind jobs May 14, 2021
@real-or-random real-or-random force-pushed the 202010_ci_matrix branch 5 times, most recently from 4087f78 to 1191bb0 Compare May 17, 2021 18:42
@real-or-random real-or-random changed the title ci: Run asan and add more sanitizer/valgrind jobs ci: Run ASan/LSan and reorganize sanitizer and Valgrind jobs May 17, 2021
@sipa
Copy link
Contributor

sipa commented May 18, 2021

utACK 1191bb0

.cirrus.yml Outdated Show resolved Hide resolved
.cirrus.yml Outdated
WRAPPER_CMD: "valgrind --error-exitcode=42"
- name: "UBSan, ASan, LSan"
env:
# Use dummy wrapper command to run the tests outside of make check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me, what's the reason for not running this (and valgrind) in make check? If it was just about being able to set the test "count"/TEST_ITERS separately then we do have an env var for this now, and could remove it like this https://github.com/jonasnick/secp256k1/commits/202010_ci_matrix-jn (untested).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just that make check will try to run without the wrappers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean make check is run without a wrapper even when a wrapper is set? Afaics no, because we unset BUILD whenever we set WRAPPER_CMD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue (and I think this is what Pieter is partly saying) is that ${WRAPPER_CMD} make check typically won't work for all wrapper commands.

  • valgrind make check will memcheck make but will not memcheck our tests because they'll be child processes. (Now you could try valgrind --trace-children but then it will still additionally memcheck make which gives false positives and wastes CI time.)
  • wine make check just won't work at all because make is not a Windows executable.
  • I haven't tried QEMU but I expect the same as for Wine (because we cross-build).

What we could is to tell make to use the wrapper command. Now that I'm reading up on this again, I recall that it's not even more work to do this. There's the (obscurely named) LOG_COMPILER env variable, which does exactly this. You can even pass LOG_COMPILER="./libtool --mode=execute valgrind" etc, which we don't need now but maybe in the future. Running all tests using make check will have the benefit that the logs always end up in the same place in the CI output. Let me give it a shot.

And thanks for reminding me of SECP256K1_TEST_ITERS. When I was writing this, I didn't know that this exists (I reinvented the name TEST_ITERS). So we should simply set this variable instead, like we do for the benchmarks. This has the nice side effect that it would apply to the ordinary make check builds if we'd like to set it there for some reason.

@real-or-random
Copy link
Contributor Author

real-or-random commented May 21, 2021

updated as described above

edit: pushed again, I had forgotten the use_aligned=1.

@real-or-random real-or-random force-pushed the 202010_ci_matrix branch 2 times, most recently from 222a563 to 22e9804 Compare May 21, 2021 10:34
ci/cirrus.sh Outdated Show resolved Hide resolved
ci/cirrus.sh Outdated Show resolved Hide resolved
@real-or-random
Copy link
Contributor Author

Hm, it's still timing out. But everything looks correct including the test iterations (like in the revision yesterday, when the longest job took ~ 40 min.) If I'm not mistaken, the only difference to now is that we run the tests in make check and this means the only real difference is that they run concurrently with -j2 and we allocate only one CPU. I pushed a commit to see if this fixes the timeout.

@real-or-random
Copy link
Contributor Author

Ok, ignore my last comment. The tests were running twice because I screwed up when rebasing. I hope it's really ready now...

Copy link
Contributor

@jonasnick jonasnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

ACK 02dcea1 spot-checked ci output, checked that when valgrind ./tests crashes then LOG_COMPILER=valgrind make check also crashes.

@sipa
Copy link
Contributor

sipa commented May 21, 2021

utACK 02dcea1

@jonasnick jonasnick merged commit 3dc8c07 into bitcoin-core:master May 21, 2021
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 9, 2023
Summary:
```
This enables asan (including lsan), and restructures the sanitizer builds. It also removes -fno-omit-frame-pointer again. This is debatable. The main reason for removing this is that GCC (but not clang) fails to build (runs out of registers) when combining asan, -fno-omit-frame-pointer, and the x86_64 asm. But I believe without -fno-omit-frame-pointer, the instrumented binary may be closer to the real binary, which is good. If we get unusable debugging output on CI without frame pointers, we can still reproduce the build locally.
```

Backport of [[bitcoin-core/secp256k1#846 | secp256k1#846]].
Completes backport of [[bitcoin-core/secp256k1#969 | secp256k1#969]]:
bitcoin-core/secp256k1@3d2f492

Depends on D12957.

Test Plan:
  ninja check-secp256k1
Tested the cirrus build against my personal github forked repo.

Reviewers: #bitcoin_abc, sdulfari

Reviewed By: #bitcoin_abc, sdulfari

Subscribers: sdulfari

Differential Revision: https://reviews.bitcoinabc.org/D12960
deadalnix pushed a commit to Bitcoin-ABC/secp256k1 that referenced this pull request Jan 10, 2023
Summary:
```
This enables asan (including lsan), and restructures the sanitizer builds. It also removes -fno-omit-frame-pointer again. This is debatable. The main reason for removing this is that GCC (but not clang) fails to build (runs out of registers) when combining asan, -fno-omit-frame-pointer, and the x86_64 asm. But I believe without -fno-omit-frame-pointer, the instrumented binary may be closer to the real binary, which is good. If we get unusable debugging output on CI without frame pointers, we can still reproduce the build locally.
```

Backport of [[bitcoin-core/secp256k1#846 | secp256k1#846]].
Completes backport of [[bitcoin-core/secp256k1#969 | secp256k1#969]]:
bitcoin-core/secp256k1@3d2f492

Depends on D12957.

Test Plan:
  ninja check-secp256k1
Tested the cirrus build against my personal github forked repo.

Reviewers: #bitcoin_abc, sdulfari

Reviewed By: #bitcoin_abc, sdulfari

Subscribers: sdulfari

Differential Revision: https://reviews.bitcoinabc.org/D12960
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants