Add miniapps as tests #1112

Merged
merged 39 commits into from
Dec 10, 2024

msimberg
Collaborator

@msimberg msimberg commented Mar 19, 2024

Adds the algorithm miniapps as tests. miniapp_laset is not added, since it doesn't test anything particularly useful and it interferes with the check-threads setup (which checks whether OS threads are being spawned that shouldn't be). The miniapps are run with 6 ranks, --grid-rows 3 --grid-cols 2, and otherwise default options.

This splits DLAF_addTest into two separate functions: DLAF_addTargetTest, which adds a test for an existing target and is used for the miniapps since they already have CMake targets, and DLAF_addTest, which creates an executable and then calls DLAF_addTargetTest. The behaviour of DLAF_addTest remains unchanged.

This also adds a CATEGORY parameter, which is attached to tests as a label alongside the RANK_* labels. The CATEGORY defaults to UNIT for "unit tests", and I set it to MINIAPP for the miniapps. I'm fine with making the choice explicit and/or renaming/adding categories. I then use the RANK and CATEGORY labels to generate CI jobs the same way RANK is currently used on master. Combinations that have no tests are not added as CI jobs.
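
The job generation described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual CI script: the test list and the `generate_jobs` helper are hypothetical, and only the `RANK_*` prefix and the `UNIT`/`MINIAPP` category names come from the description.

```python
from collections import defaultdict
from itertools import product

# Hypothetical test list: (test name, RANK label, CATEGORY label).
TESTS = [
    ("test_matrix", "RANK_1", "UNIT"),
    ("test_cholesky", "RANK_6", "UNIT"),
    ("miniapp_cholesky", "RANK_6", "MINIAPP"),
]


def generate_jobs(tests, ranks, categories):
    """Return {(rank, category): [test names]} for non-empty combinations only."""
    by_labels = defaultdict(list)
    for name, rank, category in tests:
        by_labels[(rank, category)].append(name)
    # Walk every possible combination, but keep only those that have tests.
    return {
        combo: sorted(by_labels[combo])
        for combo in product(ranks, categories)
        if combo in by_labels
    }
```

With the sample list, `generate_jobs(TESTS, ["RANK_1", "RANK_6"], ["UNIT", "MINIAPP"])` yields three jobs; the `("RANK_1", "MINIAPP")` combination has no tests and so produces no job.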

Finally, many miniapps were hanging on the CUDA configurations, where they run with only one worker thread and one MPI thread. I've replaced the waitLocalTiles calls that precede MPI_Barrier with pika::wait calls, to make sure that nothing is still running and that deadlocks are avoided. So far I've only made this change after the algorithm calls, but for consistency it might make sense to use pika::wait everywhere before MPI_Barrier is called?
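
To illustrate the idea of draining all outstanding work before entering a blocking collective, here is a toy model in plain Python. It is not pika or MPI: `TinyRuntime`, `run_rank` and `run_all` are hypothetical stand-ins, with `TinyRuntime.wait()` playing the role of pika::wait and `threading.Barrier` playing the role of MPI_Barrier. Each "rank" has a single worker thread, and tasks may schedule follow-up tasks, so waiting on the first task alone would not be enough.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TinyRuntime:
    """Toy task runtime with one worker thread (hypothetical, not pika)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = 0  # tasks that are queued or currently running
        self._cv = threading.Condition()

    def spawn(self, fn, *args):
        # Count the task before it is queued so wait() can never miss it.
        with self._cv:
            self._pending += 1
        self._pool.submit(self._run, fn, *args)

    def _run(self, fn, *args):
        try:
            fn(*args)
        finally:
            with self._cv:
                self._pending -= 1
                if self._pending == 0:
                    self._cv.notify_all()

    def wait(self):
        # Block until nothing is queued or running, follow-up tasks included.
        with self._cv:
            self._cv.wait_for(lambda: self._pending == 0)

    def shutdown(self):
        self._pool.shutdown()


def run_rank(barrier, log, steps=4):
    """One 'rank': run a chain of tasks, drain them, then enter the barrier."""
    runtime = TinyRuntime()

    def task(step):
        log.append(step)
        if step + 1 < steps:
            runtime.spawn(task, step + 1)  # a task scheduling follow-up work

    runtime.spawn(task, 0)
    runtime.wait()   # stand-in for pika::wait: drain everything first
    barrier.wait()   # stand-in for MPI_Barrier: blocking collective
    runtime.shutdown()


def run_all(num_ranks=2):
    """Run several toy ranks in threads and return each rank's task log."""
    barrier = threading.Barrier(num_ranks)
    logs = [[] for _ in range(num_ranks)]
    threads = [threading.Thread(target=run_rank, args=(barrier, log)) for log in logs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs
```

Calling `run_all()` returns one log per rank. Because each rank calls `wait()` before `barrier.wait()`, every log is complete by the time its rank enters the barrier.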

@msimberg msimberg self-assigned this Mar 19, 2024
@msimberg
Collaborator Author

cscs-ci run

@msimberg msimberg force-pushed the miniapps-as-tests branch from b395f89 to 9900e70 Compare March 20, 2024 08:35
@msimberg
Collaborator Author

cscs-ci run

@msimberg msimberg force-pushed the miniapps-as-tests branch from 7619183 to 7a85649 Compare March 20, 2024 11:18
@msimberg
Collaborator Author

cscs-ci run

@msimberg msimberg force-pushed the miniapps-as-tests branch 2 times, most recently from 7041343 to ed29663 Compare March 21, 2024 09:26
@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Apr 8, 2024

cscs-ci run

@msimberg msimberg force-pushed the miniapps-as-tests branch 3 times, most recently from 62b54e9 to f7d52d6 Compare April 9, 2024 08:03
@msimberg
Collaborator Author

msimberg commented Apr 9, 2024

cscs-ci run

@msimberg msimberg force-pushed the miniapps-as-tests branch 2 times, most recently from 8533a59 to 71eac9d Compare April 9, 2024 12:40
@msimberg
Collaborator Author

msimberg commented Apr 9, 2024

cscs-ci run

@msimberg msimberg requested review from rasolca, albestro and RMeli April 15, 2024 10:58
@msimberg msimberg marked this pull request as ready for review April 15, 2024 10:58
Review thread on ci/ctest_to_gitlab.sh (outdated, resolved)
@msimberg
Collaborator Author

cscs-ci run

Collaborator

@rasolca rasolca left a comment

Not happy about spreading pika::waits everywhere in the miniapps.
I'd prefer another solution.
Would scheduling and sync_waiting an Ibarrier work?

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg msimberg mentioned this pull request Nov 26, 2024
4 tasks
@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@albestro
Collaborator

albestro commented Dec 3, 2024

I started experiencing hangs in miniapp_eigensolver right after the new transform_mpi (#1125). I don't think that's the direct cause, but it seems to have exposed a problem.

I was able to reproduce the hang on both daint (with 8 nodes) and todi (with 4 and 8 nodes). After some basic investigation, the miniapp check turned out to be the deciding factor: it hangs with the check enabled and does not hang with it disabled. In fact, in one of my tests, disabling just the hermitian_multiplication communications was enough to stop it hanging.

As @msimberg suggested when we discussed this, I tried adding pika::wait() in dlaf::finalize(), and that resolved the hangs in the configurations that reproduce my problem. I also tried the wait_all_communicators changes from this PR on their own, and they too are enough to fix the hangs in my test case.

@msimberg
Collaborator Author

msimberg commented Dec 9, 2024

cscs-ci run

@rasolca rasolca merged commit e0d2619 into eth-cscs:master Dec 10, 2024
5 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 10, 2024
4 participants