Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update izumi and implement intel20 workaround #460

Merged
merged 1 commit into from
Jun 7, 2020

Conversation

apcraig
Copy link
Contributor

@apcraig apcraig commented Jun 6, 2020

PR checklist

  • Short (1 sentence) summary of your PR:
    Update izumi and implement intel20 workaround

  • Developer(s):
    apcraig

  • Suggest PR reviewers from list in the column to the right.

  • Please copy the PR test results link or provide a summary of testing completed below.

    Full test suite run on izumi

    • izumi had a recent upgrade
    • izumi only supports one version of each compiler right now
    • half the intel tests fail without the INTEL20_WORKAROUND fix
      - believe this is a compiler bug
      - recommend this be reviewed further and hopefully removed at a later time
    • pgi tests are not bit-for-bit and currently do not restart exactly
      - have verified results are reproducible run-to-run
      - have reviewed restart read/write diagnostic, they look fine
      - requires more debugging
    • https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#4df9a8f90aa7f1e992e2c83d61caf96aa8bfc4cd shows results both with and without the INTEL20_WORKAROUND fix
  • How much do the PR code changes differ from the unmodified code?

    • bit for bit except izumi pgi
    • different at roundoff level
    • more substantial
  • Does this PR create or have dependencies on Icepack or any other models?

    • Yes, Icepack is updated which includes bug fix for debug tests
    • No
  • Does this PR add any new test cases?

    • Yes
    • No
  • Is the documentation being updated? ("Documentation" includes information on the wiki or in the .rst files from doc/source/, which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/. A test build of the technical docs will be performed as part of the PR testing.)

    • Yes
    • No, does the documentation need to be updated at a later time?
      • Yes
      • No
  • Please provide any additional information or relevant details below:

  • Updated izumi due to upgrade, old compilers no longer available

  • Added feature to turn on machine/compiler dependent cpps via the machine files in the scripts.

  • Fixed nt_zbgc_frac initialization error (thought we fixed this before)

  • Updated icepack, includes a fix from Izumi updates Icepack#321 needed to pass some debug mode tests.

    Full test suite run on izumi

    • izumi had a recent upgrade
    • izumi only supports one version of each compiler right now and they are the latest compilers which we have little experience with
    • half the intel tests fail without the INTEL20_WORKAROUND fix
      - believe this is a compiler bug
      - recommend this be reviewed further and hopefully removed at a later time
    • pgi tests are not bit-for-bit and currently do not restart exactly
      - have verified results are reproducible run-to-run
      - have reviewed restart read/write diagnostic, they look fine
      - requires more debugging

Copy link
Contributor

@eclare108213 eclare108213 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving this but it needs to be put into an issue for further work.

ice_transport_remap seems to have persistent seg fault issues, but they appear in different places; there's a comment about one of the omp directives seg faulting, and in the past, I've had to unroll a loop in the transport (I no longer remember which one) in order for optimization to not create a seg fault. Is there a particular (set of) variable(s) that need to be allocated, or is it really all of them? Is there a reason to not allocate here all the time? Would it help to move to a vector version of the transport (e.g. the new unstructured-grid code in MPAS)?

@eclare108213
Copy link
Contributor

@CICE-Consortium/devteam (or anyone reading this):
Do you have access to a machine with the intel20 compiler, to test this PR and see whether the problem is izumi-specific or if it's a real bug, either in the code or in the compiler?

@apcraig
Copy link
Contributor Author

apcraig commented Jun 7, 2020

I can't say for sure whether all the array need to be allocated to workaround this issue. I'm pretty sure that's not the case, but I did try to pin down whether it was one or a few arrays that were causing the problem and couldn't find any array alone that created the error. My sense is that it's a memory issue, not a problem with a particular array declaration or usage. I played around a lot with the argument declarations, local variable declarations, and source code. I even did things like turning off all the code in the subroutine while leaving all the declarations on, I degraded the interface down to no arguments, and many other things to try to isolate the issue. I also tried a number of compiler flags to try to increase the memory management. The error was surprising robust and came down to these local arrays.

What I'll do is merge this PR and start testing across a wider set of machines. I'll also try to find another intel20 compiler on another machine to see if the problem is seen on another machine. I think that's a very good idea.

Once we have a bit more information, we can decide what the next steps should be.

@apcraig apcraig merged commit 2b1727d into CICE-Consortium:master Jun 7, 2020
@apcraig apcraig deleted the izumiB branch August 17, 2022 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants