-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Thread Sanitizer #1824
Add Thread Sanitizer #1824
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1824 +/- ##
==========================================
- Coverage 75.68% 75.46% -0.22%
==========================================
Files 375 375
Lines 14565 14594 +29
Branches 2067 2070 +3
==========================================
- Hits 11024 11014 -10
- Misses 2910 2950 +40
+ Partials 631 630 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
.github/workflows/build-test.yml
Outdated
- name: Run Address Sanitizer | ||
run: ./tools/ci/build-test-macos-with-sanitizers.sh asan | ||
- name: Run Thread Sanitizer | ||
run: ./tools/ci/build-test-macos-with-sanitizers.sh tsan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibrhmkuru The current OS for Mac is using macOS 11.7 with Clang 13 as default toolchain.
On MacOS 12 we have Clang 14 installed: https://github.com/actions/runner-images/blob/main/images/macos/macos-12-Readme.md
Would you test that on MacOS 12? You just need to change this statement above: runs-on: macos-latest
to runs-on: macos-12
. And let's see about the differences.
@ibrhmkuru I did not yet look at the changes but would it be possible to split the asan/ubsan and tsan CI jobs? They can run in parallel and decrease the CI time. The clang based job runs extremely long on ubuntu. Have you done some research about what could be the reason. @elfenpiff since we might end up with additional three CI jobs, do you think it would be sufficient to run only one tsan job, e.g. clang on ubuntu? cc @dkroenke |
@elBoberido We can split the asan and tsan jobs. For long run time, I even saw more than 5 hours of run time (see : https://github.com/eclipse-iceoryx/iceoryx/actions/runs/3786509420) This job cancelled after time out of 5 hours expired :) This happened only for We realized that the test For thread sanitizer job, we can filter this test but I don't know how important this test and if it makes sense to filter this test for thread sanitizer. If you confirm I can filter it and I can split asan and tsan jobs and run thread sanitizer only with |
@ibrhmkuru I think splitting the jobs to asan and tsan makes sense. Same with skipping the test that causes the long run time. Regarding running only the clang based toolchain, @elfenpiff do you agree to run only clang on ubuntu for now? cc @dkroenke |
@elBoberido yes no problem. The only restriction I would have is that it runs on the newest clang compiler. |
Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
f40ff85
to
4b8d3c6
Compare
…le instead Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
@ibrhmkuru you mentioned that there is one tests which takes extremely long with the thread sanitizer enabled. Would it be possible to disable that test when the thread sanitizer runs? An intermediate solution would also be to run hoofs tests only and take care of posh in a follow up PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Only a minor change ... and maybe a major one regarding the long test run with tsan enabled.
tools/run_tests.sh
Outdated
@@ -65,6 +69,10 @@ for arg in "$@"; do | |||
ASAN_ONLY=true | |||
TEST_SCOPE="no_timing_test" | |||
;; | |||
"tsan-only") | |||
CONTINUE_ON_ERROR=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed? If I understood this correctly then the CI would hide detected errors, right? This should be done explicitly by the continue-on-error
flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elBoberido You're right. Done!
@elBoberido This was in In |
Well, this triples our CI time. I had a brief look at the tests and it seems some are quite fast, e.g. 1ms and others take 7s and more. My gut feeling is that the integration tests take quite long with the start of the RouDiEnvironment. Okay, the TheadSanitizer test run just finished after 52 minutes but the job somehow hangs. Let's see how long it takes to fully finish. |
Okay, my gut feelings were wrong. It seems to be related to DEATH_TESTS or error handler calls, e.g. |
.github/workflows/build-test.yml
Outdated
build-test-ubuntu-with-thread-sanitizer-clang-latest: | ||
# prevent stuck jobs consuming runners for 3 hours | ||
timeout-minutes: 180 | ||
runs-on: ubuntu-latest | ||
needs: pre-flight-check | ||
steps: | ||
- name: Checkout | ||
uses: actions/checkout@v3 | ||
- name: Install iceoryx dependencies and clang-tidy | ||
uses: ./.github/actions/install-iceoryx-deps-and-clang | ||
- name: Run Thread Sanitizer | ||
run: ./tools/ci/build-test-ubuntu-with-sanitizers.sh clang tsan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elBoberido @ibrhmkuru As mentioned in #1824 (comment) the CI run for the threadsanitizer takes about 50 mins. This is definitely too long for the Pull-Request Pipeline.
We have the following options:
- The execution time can be reduced on the job
- Move the job to this file: https://github.com/eclipse-iceoryx/iceoryx/blob/master/.github/workflows/lint_master.yml to make it only run on master
- Move the job to nigthly execution: https://github.com/eclipse-iceoryx/iceoryx/blob/master/.github/workflows/nightly.yml
I think option 2 is feasible for this job, would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- have a least the hoofs test run in the PR CI and the full job on master
- split the job into 4
- first job builds only
- the other three jobs run in parallel one of hoofs_moduletests, posh_moduletests or posh_integrationtests
I'm leaning towards 4 or 2 in the short term and 1 or 5 in the long term. Whichever is more feasible.
cc @ibrhmkuru
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can implement 5 :
- first job builds only
- the other three jobs run in parallel one of hoofs_moduletests, posh_moduletests or posh_integrationtests
As I checked, I need to use actions/upload-artifact
and actions/upload-artifact
to share test binaries between build job and test job.
cc @dkroenke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibrhmkuru would be great but since this PR is already in a good state I think we should go for option 2 in this PR and do 5 in a follow up. This has the advantage that someone can look at the warnings and check for false positives.
Btw. I did something like option 5 for iceoryx-rs
- https://github.com/eclipse-iceoryx/iceoryx-rs/tree/master/.github
- https://github.com/eclipse-iceoryx/iceoryx-rs/actions/runs/2728414816
Just in case you need some inspiration :)
cc @dkroenke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elBoberido Thanks for the example for inspiration :) I've done option 2 as suggested. It's ready for review.
cc @dkroenke
bd2ea9b
to
53d8d72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a small finding, everything else looks good, thanks!
.github/workflows/build-test.yml
Outdated
timeout-minutes: 60 | ||
runs-on: macos-latest | ||
runs-on: macos-12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use here macos-latest
, using version 12 was for testing purposes.
Within the next weeks the migration to MacOS 12 will be done: https://github.blog/changelog/2022-10-03-github-actions-jobs-running-on-macos-latest-are-now-running-on-macos-12/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkroenke Right. Done!
Do I need to squash the commits, or 5 commits for this PR is okay ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Usually we do not squash to 1 commit if there are meaningful intermediate steps and the commit messages look sensible. The only thing that could be improved are the two commit messages with Address review comments
. Usually we write what is done instead of referring just to review comments. Please keep this in mind for the next PR :)
…master branch Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
9f96934
to
f8e0f74
Compare
@elBoberido @dkroenke Thanks for information and feedback. I've squashed "Address review comments" commits. |
@ibrhmkuru One small hint: When the PR is ready and you want to ping the reviewers you can re-request the review by using the small symbol near to the reviewers name with the round arrows. |
@dkroenke Thanks! Do I need to do any additional things for merge? like assigning to marge-bot as in gitlab? |
No action point for you, the committer of the project merge the PR. |
Signed-off-by: Ibrahim Kuru ibrahim.kuru@apex.ai
Pre-Review Checklist for the PR Author
iox-123-this-is-a-branch
)iox-#123 commit text
)task-list-completed
)iceoryx_hoofs
are added to./clang-tidy-diff-scans.txt
Notes for Reviewer
What's been done :
ubuntu-clang
(not formacos
, nor forgcc
. They have been tried, but it takes too much time to run them)address-sanitizer
to make it run parallel and finish faster)run_tests.sh
withcontinue-on-error
. So, it will not exit immediately after first test failure, but run all the tests even it fails. This way we'll be able to see total number of thread sanitizer warnings.Checklist for the PR Reviewer
iceoryx_hoofs
have been added to./clang-tidy-diff-scans.txt
Post-review Checklist for the PR Author
References