Only permit rank0 to mkdir when -d flag specified #955

Merged
Yurlungur merged 2 commits into develop from pdmullen/fix-mkdir on Oct 13, 2023

Conversation

pdmullen
Collaborator

@pdmullen pdmullen commented Oct 11, 2023

PR Summary

Evidently, @bprather and I are among the few who actually use the -d flag. This flag enables users to send all output data to a specified directory, e.g.,

mpiexec -n 99999999999 ./my_exe -i my_input -d /my_scratch/my_run

I have found on several occasions that when firing up a multi-rank job, I get a crash with an accompanying error message indicating that there was an error in creating the directory.

I think this issue is caused by a race condition in which multiple ranks try to create the same directory at once. The changes in this PR permit only rank 0 to create the directory. All ranks then wait at an MPI_Barrier, which guards against the unlikely case that a rank tries to chdir into a directory that has not yet been created.
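For readers who want to see the pattern in isolation, here is a minimal sketch of "rank 0 creates, everyone waits, everyone enters". It is not the diff from this PR: the helper name `EnterOutputDirectory`, the use of `std::filesystem` (C++17), and the `MPI_PARALLEL` guard macro are all assumptions made for illustration. Without that macro defined, the barrier is compiled out and the code runs serially.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

#ifdef MPI_PARALLEL  // assumed build macro; define it when compiling with MPI
#include <mpi.h>
#endif

// Hypothetical helper, for illustration only.
// Rank 0 creates `dir`; all ranks synchronize; then every rank changes into it.
void EnterOutputDirectory(const std::string &dir, int my_rank) {
  std::error_code ec;  // stays clear on every rank except possibly rank 0
  if (my_rank == 0) {
    // Only one rank touches the filesystem here, so there is no mkdir race.
    // An already existing directory is not reported as an error.
    std::filesystem::create_directories(dir, ec);
  }

#ifdef MPI_PARALLEL
  // No rank proceeds to chdir until rank 0 has finished its mkdir attempt.
  // Rank 0 defers its own error handling until after the barrier so that the
  // other ranks are never left waiting on a rank that has already thrown.
  MPI_Barrier(MPI_COMM_WORLD);
#endif

  if (ec) {
    throw std::runtime_error("could not create directory " + dir + ": " +
                             ec.message());
  }

  // Equivalent of chdir; throws std::filesystem::filesystem_error on failure.
  std::filesystem::current_path(dir);
}
```

In an MPI build, `my_rank` would come from `MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)` after `MPI_Init`. If rank 0 fails to create the directory, the other ranks fail at the chdir step instead of hanging.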

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

Collaborator

@bprather bprather left a comment

The code looks great. I should be able to pull tomorrow and test that it fixes the crash for me.

Collaborator

@Yurlungur Yurlungur left a comment

Nice catch!

Collaborator

@pgrete pgrete left a comment

Looks like it was the bug we suspected.

@Yurlungur Yurlungur merged commit 7395ed5 into develop Oct 13, 2023
49 checks passed
@pdmullen pdmullen deleted the pdmullen/fix-mkdir branch October 19, 2023 20:34
4 participants