unitrace crashes when using mpiexec #69

Open
flezaalv opened this issue Jun 19, 2024 · 2 comments

Comments

@flezaalv

I launched unitrace under mpiexec with this command:

mpiexec -n 12 -ppn 12 --pmi=pmix ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test --output /home/test/test.csv python bin/sr.py

This runs on a single node and creates 12 processes, but when they finish I get this error from one process and the entire mpiexec run fails:

hostname: rank 0 died from signal 15

I also got the error from #25 in unitrace. Is that error the cause of signal 15?

@Sarbojit2019
Contributor

  • Does the app pass without Unitrace?
  • Does it fail even with a smaller number of ranks?
  • Can you share the app and other details to help me reproduce the issue locally?

@flezaalv
Author

flezaalv commented Jun 26, 2024

  • Does the app pass without Unitrace?
    Yes, it does. Without Unitrace the app finishes with return status 0.

  • Does it fail even with a smaller number of ranks?
    I tested with mpiexec -n 2 -ppn 2 and got this error:

/run_mpi.sh: line 7: 169430 Segmentation fault      (core dumped) python bin/sr.py
[INFO] Log is stored in /home/test10/results.169391.0.csv
[INFO] Timeline is stored in /home/test10/run_mpi.sh.169391.0.json
hostname: rank 0 exited with code 139
hostname: rank 1 died from signal 15

run_mpi.sh contains the entire app command. This is the mpiexec command with unitrace included:

mpiexec -n 2 -ppn 2 ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test10/ --output /home/test10/results.csv ./run_mpi.sh

Sure, I will share more details with you.

Thanks!
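
A note for reading the two-rank log above: by shell convention, an exit code above 128 means the process was killed by signal number (code - 128). So exit code 139 corresponds to signal 11 (SIGSEGV), which matches the segmentation fault reported for python bin/sr.py, and signal 15 is SIGTERM. One plausible reading, not confirmed in this thread, is that the launcher terminated rank 1 after rank 0 crashed, which would make signal 15 a consequence rather than the root cause. Below is a minimal Python sketch for decoding such values; the helper name describe_exit is hypothetical and not part of unitrace or mpiexec.

```python
import signal


def describe_exit(code: int) -> str:
    """Explain a shell-style exit code.

    Assumes the common shell convention that a code above 128
    means "terminated by signal (code - 128)".
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = "unknown signal"
        return f"exit code {code}: terminated by signal {signum} ({name})"
    return f"exit code {code}: normal exit"


if __name__ == "__main__":
    # Values taken from the log in this thread
    print(describe_exit(139))          # rank 0 exited with code 139
    print(signal.Signals(15).name)     # rank 1 died from signal 15
```

If this reading holds, the segmentation fault in rank 0 is the failure to investigate first.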
