crash: unused buoyancy fluxes #195

Open
aekiss opened this issue Jul 22, 2024 · 11 comments

Comments

@aekiss
Contributor

aekiss commented Jul 22, 2024

I tried a 1-year run of 1deg_jra55do_ryf with

DIABATIC_FIRST = False
DT = 1800.0
DT_THERM = 10800.0      ! 6*DT
THERMO_SPANS_COUPLING = True
DTBT_RESET_PERIOD = 10800.0

This timestepped through the full 12 months, then crashed with

FATAL from PE     0: ocean_model_restart was called with unused buoyancy fluxes.  For conservation, the ocean restart files can only be created after the buoyancy forcing is applied.

This occurs in the NUOPC cap.

I'm using

     ocn_cpl_dt =  3600
     stop_n = 1
     stop_option = nyears
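
For reference, the timestep ratios implied by the settings above can be worked out directly. This is only an illustrative sketch: `timestep_ratios` is a hypothetical helper, not part of MOM6 or the NUOPC cap, and it assumes `ocn_cpl_dt` is the ocean coupling timestep in seconds.

```python
from fractions import Fraction

# Values copied from the settings above (all in seconds).
DT = 1800
DT_THERM = 10800
OCN_CPL_DT = 3600  # assumed to be the ocean coupling timestep


def timestep_ratios(dt, dt_therm, dt_cpl):
    """Return the three pairwise timestep ratios as exact fractions."""
    return {
        "dt_therm/dt": Fraction(dt_therm, dt),
        "dt_cpl/dt": Fraction(dt_cpl, dt),
        "dt_therm/dt_cpl": Fraction(dt_therm, dt_cpl),
    }


ratios = timestep_ratios(DT, DT_THERM, OCN_CPL_DT)
for name, value in ratios.items():
    print(f"{name} = {value}")
```

With these values the thermodynamic step spans three coupling intervals, i.e. dt:dt_therm:dt_cpl = 1:6:2.
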
@minghangli-uni
Contributor

minghangli-uni commented Jul 22, 2024

Thanks @aekiss. I might need to check the source for details. Before that, I've done some quick tests with various combinations of dt, dt_therm, and dt_cpl under DIABATIC_FIRST = False and THERMO_SPANS_COUPLING = True. All the tests (1deg ryf) ran for 3 model days.

| dt (s) | dt_therm (s) | dt_cpl (s) | Result |
|---|---|---|---|
| 1800 | 10800 (6*dt) | 1800 (1*dt) | Worked |
| 1800 | 10800 (6*dt) | 3600 (2*dt) | Crashed |
| 1800 | 10800 (6*dt) | 5400 (3*dt) | Worked |

| dt (s) | dt_therm (s) | dt_cpl (s) | Result |
|---|---|---|---|
| 1800 | 3600 (2*dt) | 3600 (2*dt) | Worked |
| 1800 | 7200 (4*dt) | 3600 (2*dt) | Worked |
| 1800 | 14400 (8*dt) | 3600 (2*dt) | Worked |

| dt (s) | dt_therm (s) | dt_cpl (s) | Result | Ref |
|---|---|---|---|---|
| 1350 | 8100 (6*dt) | 1350 (1*dt) | Worked | Current 0.25deg test |
| 1350 | 8100 (6*dt) | 2700 (2*dt) | Crashed | |
| 900 | 7200 (8*dt) | 3600 (4*dt) | Worked | GFDL OM5&OM4 0.25deg |

It appears that the model only crashes when the ratio dt:dt_therm:dt_cpl is 1:6:2.

NB:

     ocn_cpl_dt =  3600 
     stop_n = 1
     stop_option = nyears
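
The 1:6:2 observation can be checked mechanically against the test results listed in this comment. The following is a minimal sketch, not anything from the model source: `normalized_ratio` is a hypothetical helper, and the tuples simply transcribe the reported timesteps and outcomes.

```python
from functools import reduce
from math import gcd

# (dt, dt_therm, dt_cpl, crashed), timesteps in seconds,
# transcribed from the test results reported in this comment.
TESTS = [
    (1800, 10800, 1800, False),
    (1800, 10800, 3600, True),
    (1800, 10800, 5400, False),
    (1800, 3600, 3600, False),
    (1800, 7200, 3600, False),
    (1800, 14400, 3600, False),
    (1350, 8100, 1350, False),
    (1350, 8100, 2700, True),
    (900, 7200, 3600, False),
]


def normalized_ratio(dt, dt_therm, dt_cpl):
    """Reduce dt:dt_therm:dt_cpl to lowest integer terms."""
    divisor = reduce(gcd, (dt, dt_therm, dt_cpl))
    return (dt // divisor, dt_therm // divisor, dt_cpl // divisor)


crashing = {normalized_ratio(a, b, c) for a, b, c, crashed in TESTS if crashed}
passing = {normalized_ratio(a, b, c) for a, b, c, crashed in TESTS if not crashed}

print("crashing ratios:", sorted(crashing))
print("passing ratios: ", sorted(passing))
```

Both crashing runs reduce to the same ratio, and that ratio does not appear among the passing runs.
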

@aekiss
Contributor Author

aekiss commented Jul 23, 2024

Great, thanks @minghangli-uni for all the tests! So I got an unlucky combination. My run is here FYI:
/home/156/aek156/payu/MOM6-CICE6-1deg_jra55do_ryf.iss138

@aekiss
Contributor Author

aekiss commented Jul 23, 2024

Perhaps dt_therm/dt_cpl needs to be even?
If you have a moment, could you try dt:dt_therm:dt_cpl = 1:10:2 to see if that also crashes with ocean_model_restart was called with unused buoyancy fluxes?

@aekiss
Contributor Author

aekiss commented Jul 23, 2024

or dt:dt_therm:dt_cpl = 1:5:1, for that matter

@minghangli-uni
Contributor

> Perhaps dt_therm/dt_cpl needs to be even? 1:10:2
> or dt:dt_therm:dt_cpl = 1:5:1, for that matter

Thank you for the suggestion, but both ratios worked and didn't crash. I am looking into this issue.

@minghangli-uni
Contributor

minghangli-uni commented Jul 23, 2024

When the ratio of the tracer timestep to the coupling timestep is 3:1, the model crashes with the error mentioned above.

When the ratio is not 3:1, the log output shows the expected execution flow. When the ratio is 3:1, the sequence of operations is out of order. Some synchronization mechanism or other factor must be causing this, but it seems quite unusual to me.

@minghangli-uni
Contributor

I've submitted an issue to NCAR/MOM6 regarding this problem. NCAR/MOM6#290

@aekiss
Contributor Author

aekiss commented Jul 24, 2024

Thanks @minghangli-uni !

@aekiss
Contributor Author

aekiss commented Jul 24, 2024

Hi @minghangli-uni, did all your tests above have a cold start, or did some use a restart?

@minghangli-uni
Contributor

They all started from a cold start. When starting from a restart file instead, as Gustavo suggested, the error disappears.

@minghangli-uni
Contributor

minghangli-uni commented Aug 21, 2024

In addition to the findings noted here, another test with dt_therm = 16200 s and dt_cpl = 1800 s (a 9:1 ratio) results in the same error. I am revisiting this problem to explore the reasons for the failure. Interestingly, a dt_therm:dt_cpl ratio of 6:1 works fine, but 3:1 and 9:1 do not.
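
To see every outcome reported in this thread side by side, the results can be grouped by dt_therm/dt_cpl. This is a rough sketch for bookkeeping only: `outcomes_by_ratio` is a hypothetical helper, the two tests that were given only as ratios (1:10:2 and 1:5:1) are entered in units of dt, and all runs are the cold-start ones described above. It summarizes what was observed and does not explain the cause.

```python
from collections import defaultdict
from fractions import Fraction

# (dt_therm, dt_cpl, crashed) for each cold-start test reported in this thread.
REPORTS = [
    (10800, 1800, False),
    (10800, 3600, True),
    (10800, 5400, False),
    (3600, 3600, False),
    (7200, 3600, False),
    (14400, 3600, False),
    (8100, 1350, False),
    (8100, 2700, True),
    (7200, 3600, False),
    (10, 2, False),   # reported only as dt:dt_therm:dt_cpl = 1:10:2
    (5, 1, False),    # reported only as dt:dt_therm:dt_cpl = 1:5:1
    (16200, 1800, True),
]


def outcomes_by_ratio(reports):
    """Map each dt_therm/dt_cpl ratio to the set of outcomes seen for it."""
    grouped = defaultdict(set)
    for dt_therm, dt_cpl, crashed in reports:
        outcome = "crashed" if crashed else "worked"
        grouped[Fraction(dt_therm, dt_cpl)].add(outcome)
    return dict(grouped)


grouped = outcomes_by_ratio(REPORTS)
for ratio, outcomes in sorted(grouped.items()):
    print(f"dt_therm/dt_cpl = {ratio}: {', '.join(sorted(outcomes))}")

crashing_ratios = sorted(r for r, o in grouped.items() if "crashed" in o)
```

In these reports each ratio has a single consistent outcome, and the crashing ratios are exactly 3 and 9.
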
