Simple mpi-serial case on Casper failing in setup #143

Open
ekluzek opened this issue Jan 13, 2024 · 1 comment
Labels
bug Something isn't working

Comments


ekluzek commented Jan 13, 2024

Hello, I'm trying to run this test:

SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc

With these externals in ctsm5.1.dev158:

diff --git a/Externals.cfg b/Externals.cfg
index a17f8e2ec..b29af5c64 100644
--- a/Externals.cfg
+++ b/Externals.cfg
@@ -34,7 +34,7 @@ hash = 34723c2
 required = True
 
 [ccs_config]
-tag = ccs_config_cesm0.0.84
+tag = ccs_config_cesm0.0.87
 protocol = git
 repo_url = https://github.com/ESMCI/ccs_config_cesm.git
 local_path = ccs_config
@@ -44,11 +44,11 @@ required = True
 local_path = cime
 protocol = git
 repo_url = https://github.com/ESMCI/cime
-tag = cime6.0.175
+tag = cime6.0.198
 required = True
 
 [cmeps]
-tag = cmeps0.14.43
+tag = cmeps0.14.47
 protocol = git
 repo_url = https://github.com/ESCOMP/CMEPS.git
 local_path = components/cmeps

This fails for me as follows:

qcmd -- ./create_test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc -r . 
Waiting on job launch; 9351378.casper-pbs with qsub arguments:
    qsub  -l select=1:ncpus=1:mem=10GB -A P93300606 -q casper@casper-pbs -l walltime=01:00:00

Warning: no access to tty (Inappropriate ioctl for device).
Thus no job control in this shell.
Testnames: ['SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc']
Using project from .cesm_proj: P93300041
create_test will do up to 1 tasks simultaneously
create_test will use up to 45 cores simultaneously
Creating test directory /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
RUNNING TESTS:
SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc
Starting CREATE_NEWCASE for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished CREATE_NEWCASE for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 186.715961 seconds (PASS)
Starting XML for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished XML for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 119.385811 seconds (PASS)
Starting SETUP for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished SETUP for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 1.602533 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
Errors were:
ERROR: module command /glade/u/apps/casper/23.10/spack/opt/spack/lmod/8.7.24/gcc/7.5.0/m4jx/lmod/lmod/libexec/lmod python load ncarenv/23.10 cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 ncarcompilers/0.5.0 parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0 failed with message:
Lmod has detected the following error: The following module(s) are unknown:
"ncarcompilers/0.5.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "ncarcompilers/0.5.0"

Also make sure that all modulefiles written in TCL start with the string
#%Module

Waiting for tests to finish
FAIL SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc (phase SETUP)
Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 380.3022334575653 seconds
casper-login1 cime/scripts> cd SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi/
Directory: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
casper-login1 scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi> cat TestStatus
PASS SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc CREATE_NEWCASE
PASS SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc XML
FAIL SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc SETUP
casper-login1 scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi> ./case.setup 
ERROR: module command /glade/u/apps/casper/23.10/spack/opt/spack/lmod/8.7.24/gcc/7.5.0/m4jx/lmod/lmod/libexec/lmod python load ncarenv/23.10 cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 ncarcompilers/0.5.0 parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0 failed with message:
Lmod has detected the following error: The following module(s) are unknown: "ncarcompilers/0.5.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "ncarcompilers/0.5.0"

Also make sure that all modulefiles written in TCL start with the string #%Module
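
One detail worth noting in the error above: the `module load` command names `ncarcompilers` twice, once as `ncarcompilers/0.5.0` and once as `ncarcompilers/1.0.0`, and Lmod reports only the first as unknown. A minimal sketch for spotting this kind of repeated module in a CIME error line follows. The function name `conflicting_modules` and the regular expression are illustrative assumptions, not part of CIME or Lmod, and the sketch does not establish where the extra entry comes from.

```python
import re
from collections import defaultdict


def conflicting_modules(error_line):
    """Return {module name: [versions]} for modules requested more than once
    with different versions in a 'python load ... failed with message' line.

    Hypothetical helper for illustration; not part of CIME or Lmod.
    """
    # Capture the module list between "python load" and "failed with message".
    match = re.search(r"python load\s+(.*?)\s+failed with message", error_line)
    if match is None:
        return {}

    versions = defaultdict(list)
    for spec in match.group(1).split():
        # Each spec looks like "name/version".
        name, _, version = spec.partition("/")
        if version not in versions[name]:
            versions[name].append(version)

    # Keep only modules that appear with more than one distinct version.
    return {name: v for name, v in versions.items() if len(v) > 1}


# Error line copied from the log above.
error_line = (
    "ERROR: module command /glade/u/apps/casper/23.10/spack/opt/spack/lmod/"
    "8.7.24/gcc/7.5.0/m4jx/lmod/lmod/libexec/lmod python load ncarenv/23.10 "
    "cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 ncarcompilers/0.5.0 "
    "parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0 failed with message:"
)

print(conflicting_modules(error_line))
```

Run against the line from this issue, the helper reports `ncarcompilers` with versions `0.5.0` and `1.0.0`, which suggests looking at how the module list for this machine, compiler, and `mpi-serial` combination is assembled.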

ekluzek commented Jan 13, 2024

We also saw this earlier with older externals as documented in CTSM:

ESCOMP/CTSM#2293

ekluzek changed the title from "Simple case on Casper failing in setup" to "Simple mpi-serial case on Casper failing in setup" on Jan 13, 2024