Bad pe and aprun configurations for PET test case on Titan #1401
Comments
@philipwjones , I was unable to duplicate this error. I added a unit test to aprun:
I also tried to run PET_Ln9.T62_oQU240.GMPAS-NYF.titan_pgi manually on Titan, and I never made it to the single-thread case because the threaded case failed.
@jgfouca Hmmm... mine definitely makes it through the first threaded test, then fails with the error below and a clearly wrong aprun. (Note that the error message itself doesn't show up in the log; that is reported in a different issue, but I did manage to track it down to a bad resource request.) The env_mach_pes.xml looks correct in that directory. Is there a chance the second case is inadvertently carrying over part of the environment from the threaded case (the -S 4 was appropriate for the 2-thread run)? Let me know if there is something I can do to help debug. We're trying to get some threading mods into MPAS; the tests complete fine on Edison, but we would like to verify on Titan since that's where some thread bugs showed up before.
Exception during run:
@philipwjones yes, given what you've said, there's a good chance something is carrying over from the threaded case into the serial case. I'll continue to look at it.
Fixed in ESMCI with this: ESMCI/cime#1414
…ewcase
Add input-dir to create_newcase
Test suite: code-checker
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1303
User interface changes?: None
Code review: @jedwards4b
On Titan:
./create_test PET_Ln9.T62_oQU240.GMPAS-NYF.titan_pgi
creates both a 2-threaded case and a single-threaded case.
In the single-thread case directory, env_mach_pes.xml is correct for a 64-task MPI run with 1 thread/task, but the aprun command generated is:
aprun -S 4 -n 64 -N 8 -d 1
which has a bad -S flag (it should be -S 8 for a 4-node run), causing the single-thread case to fail.
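For reference, here is a minimal sketch of how the expected flags could be derived for a fully packed node. It is not CIME's actual logic: `aprun_flags` is a hypothetical helper, and the layout of 16 cores per node split across 2 NUMA nodes is an assumption inferred from the "4-node (64-core)" figure in this report.

```python
import math


def aprun_flags(ntasks, nthreads, cores_per_node=16, numa_per_node=2):
    """Hypothetical helper (not part of CIME): build an aprun command line.

    Assumes fully packed nodes with 16 cores and 2 NUMA nodes each, which is
    an assumption about Titan based on the 4-node / 64-core figure above.
    """
    # Each MPI task occupies `nthreads` cores, which limits tasks per node.
    tasks_per_node = min(ntasks, cores_per_node // nthreads)
    # -S is the number of tasks placed on each NUMA node.
    tasks_per_numa = max(1, math.ceil(tasks_per_node / numa_per_node))
    return "aprun -S {} -n {} -N {} -d {}".format(
        tasks_per_numa, ntasks, tasks_per_node, nthreads
    )


print(aprun_flags(64, 1))  # single-thread case: 64 tasks, 1 thread each
print(aprun_flags(32, 2))  # 2-thread case: 32 tasks, 2 threads each
```

Under these assumptions the 2-thread layout yields -S 4 and -N 8, the same values that appear in the bad single-thread command, which would be consistent with settings carrying over from the threaded case. The sketch also gives -N 16 for the single-thread case, which goes beyond what is reported here and depends entirely on the assumed node layout.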
In the 2-threaded case, the env_mach_pes.xml contains bad NTASKS values. It should be creating a 4-node (64-core) case with 32 MPI tasks and 2 threads per task, but env_mach_pes.xml has NTASKS set to 64 for all components. This can be manually changed and reset, but the defaults are wrong for this case.
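The expected NTASKS value follows from simple arithmetic on the numbers above. The sketch below only illustrates that arithmetic; `expected_ntasks` is a hypothetical name, and the 16 cores per node is inferred from "4-node (64-core)".

```python
def expected_ntasks(nodes, cores_per_node, nthreads):
    """Hypothetical helper: MPI tasks that fill `nodes` with `nthreads` each."""
    total_cores = nodes * cores_per_node
    if total_cores % nthreads != 0:
        raise ValueError("thread count does not divide the core count evenly")
    # Every task uses `nthreads` cores, so tasks = cores / threads.
    return total_cores // nthreads


print(expected_ntasks(4, 16, 2))  # 2-thread case
print(expected_ntasks(4, 16, 1))  # single-thread case
```

With 2 threads per task this gives 32 tasks rather than the 64 currently written to env_mach_pes.xml; 64 is only right for the single-thread case.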