Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive clone fails on Orion #826

Closed
CatherineThomas-NOAA opened this issue May 18, 2023 · 16 comments · Fixed by #827
Closed

Recursive clone fails on Orion #826

CatherineThomas-NOAA opened this issue May 18, 2023 · 16 comments · Fixed by #827
Assignees
Labels
bug Something isn't working

Comments

@CatherineThomas-NOAA
Copy link
Collaborator

I am unable to clone the UFS_UTILS module recursively on Orion. I used the command as specified in the wiki:
git clone --recurse-submodules git@github.com:ufs-community/UFS_UTILS.git

The error occurs here:

Cloning into 'physics/rte-rrtmgp'...
fatal: reference is not a tree: 7f01618c92409658bddd3afa9acb004c608f6a0d
Unable to checkout '7f01618c92409658bddd3afa9acb004c608f6a0d' in submodule path 'ccpp-physics/physics/rte-rrtmgp'
Failed to recurse into submodule path 'ccpp-physics'

I get the same error when running checkout.sh as part of the global-workflow. The log for the global-workflow checkout can be found here:
/work2/noaa/da/cthomas/git/global-workflow/develop.20230518.c/sorc/logs/checkout_ufs_utils.log

No errors occur when I clone on Hera.

@GeorgeGayno-NOAA
Copy link
Collaborator

@CatherineThomas-NOAA Thanks for letting me know. I did not realize that failed on Orion. I usually do a regular clone, then git submodule init and git submodule update. Also, the ufs_utils build script was updated to automatically check out any needed submodules. So, try a regular clone. Then, invoke build_all.sh.

I will update the wiki page accordingly.

@CatherineThomas-NOAA
Copy link
Collaborator Author

Thanks @GeorgeGayno-NOAA. I gave it a try and yes doing it that way worked for me. I was also able to build successfully.

I noticed that the ccpp-physics/physics/rte-rrtmgp/ directory remained empty though and was never cloned. Is that submodule necessary?

@GeorgeGayno-NOAA
Copy link
Collaborator

Thanks @GeorgeGayno-NOAA. I gave it a try and yes doing it that way worked for me. I was also able to build successfully.

I noticed that the ccpp-physics/physics/rte-rrtmgp/ directory remained empty though and was never cloned. Is that submodule necessary?

No. Only the ./ccpp-physics/physics main directory is needed.

@CatherineThomas-NOAA
Copy link
Collaborator Author

Great, good to know.

The problem now is that it still fails in the global workflow. I tried it step by step and this is the line in sorc/checkout.sh that fails:
git submodule update --init --recursive >> "${logfile}" 2>&1

It's specifically the "recursive" part since this works:
git submodule update --init >> "${logfile}" 2>&1

The checkout script is generic for all repos, so I'm not sure what's the best way to handle it. Even if the failure isn't actually a problem, many users (like me) will see an error on checkout and spend time on fixing something unnecessary. Any ideas? Should we loop in the workflow team at this point?

@GeorgeGayno-NOAA
Copy link
Collaborator

Great, good to know.

The problem now is that it still fails in the global workflow. I tried it step by step and this is the line in sorc/checkout.sh that fails: git submodule update --init --recursive >> "${logfile}" 2>&1

It's specifically the "recursive" part since this works: git submodule update --init >> "${logfile}" 2>&1

The checkout script is generic for all repos, so I'm not sure what's the best way to handle it. Even if the failure isn't actually a problem, many users (like me) will see an error on checkout and spend time on fixing something unnecessary. Any ideas? Should we loop in the workflow team at this point?

Tagging in @KateFriedman-NOAA. Recall, I made a related fix to the build script - #819.

@KateFriedman-NOAA
Copy link
Collaborator

Thanks for tagging me @GeorgeGayno-NOAA ...taking a look...

@KateFriedman-NOAA
Copy link
Collaborator

I am unable to reproduce the checkout error on Orion.

I cloned global-workflow develop and ran ./checkout.sh -g. No errors with the UFS_UTILS checkout:

Orion-login-2[9] /work/noaa/global/kfriedma/git/develop/sorc$ ./checkout.sh -g
Received -g flag for optional checkout of GSI-based DA
...

Performing checkout of ufs_utils
|-- Cloning from https://github.com/ufs-community/UFS_UTILS.git into ufs_utils.fd
|-- Checking out 72a0471
|-- Updating submodules (if any)

...

Checkout log:

Cloning into 'ufs_utils.fd'...
Note: switching to '72a0471'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work/noaa/global/kfriedma/git/develop/sorc/ufs_utils.fd/ccpp-physics'...
Submodule path 'ccpp-physics': checked out '12c115e992d3a265eaaa67d72fcbdb3a6f21195f'
Submodule 'physics/rte-rrtmgp' (https://github.com/earth-system-radiation/rte-rrtmgp) registered for path 'ccpp-physics/physics/rte-rrtmgp'
Cloning into '/work/noaa/global/kfriedma/git/develop/sorc/ufs_utils.fd/ccpp-physics/physics/rte-rrtmgp'...
From https://github.com/earth-system-radiation/rte-rrtmgp
 * branch            7f01618c92409658bddd3afa9acb004c608f6a0d -> FETCH_HEAD
Submodule path 'ccpp-physics/physics/rte-rrtmgp': checked out '7f01618c92409658bddd3afa9acb004c608f6a0d'

I also cloned global-workflow develop and ran ./checkout.sh -g in my /work2 space on Orion. Also no errors.
See log: /work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/logs_0/checkout_ufs_utils.log

Additionally, we've started trialing automated CI tests in global-workflow in the past couple weeks and I don't recall them flagging checkout errors for UFS_UTILS on Orion (same hash of UFS_UTILS).

@CatherineThomas-NOAA for your case, to see if a modified checkout option would help, I edited a copy of checkout.sh to do a submodule update for UFS_UTILS without the --recursive flag but retain the recursive checkouts for the other components. Each component has a checkout command within checkout.sh to set the various checkout flags, including recursive=YES/NO.

Edited copy: /work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/checkout.sh

My edited copy doesn't get the physics/rte-rrtmgp submodule:

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics'...

Original checkout:

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics'...
Submodule path 'ccpp-physics': checked out '12c115e992d3a265eaaa67d72fcbdb3a6f21195f'
Submodule 'physics/rte-rrtmgp' (https://github.com/earth-system-radiation/rte-rrtmgp) registered for path 'ccpp-physics/physics/rte-rrtmgp'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics/physics/rte-rrtmgp'...
From https://github.com/earth-system-radiation/rte-rrtmgp
 * branch            7f01618c92409658bddd3afa9acb004c608f6a0d -> FETCH_HEAD
Submodule path 'ccpp-physics/physics/rte-rrtmgp': checked out '7f01618c92409658bddd3afa9acb004c608f6a0d'

Not sure if we want to disable the recursive flag but I'm interested if this works for you. Since I can't reproduce the checkout error and it's only happening on Orion I don't want to suggest modifying the UFS_UTILS checkout in g-w right now. My next thought is something different with your account environment or a transient issue on Orion during checkout? Does the checkout error keep happening with fresh clones of g-w develop?

@GeorgeGayno-NOAA did you say we don't need the physics/rte-rrtmgp submodule? If so, perhaps removing the --recursive flag for UFS_UTILS won't break things? Still don't want to remove it yet though, just wondering.

@GeorgeGayno-NOAA
Copy link
Collaborator

I am unable to reproduce the checkout error on Orion.

I cloned global-workflow develop and ran ./checkout.sh -g. No errors with the UFS_UTILS checkout:

Orion-login-2[9] /work/noaa/global/kfriedma/git/develop/sorc$ ./checkout.sh -g
Received -g flag for optional checkout of GSI-based DA
...

Performing checkout of ufs_utils
|-- Cloning from https://github.com/ufs-community/UFS_UTILS.git into ufs_utils.fd
|-- Checking out 72a0471
|-- Updating submodules (if any)

...

Checkout log:

Cloning into 'ufs_utils.fd'...
Note: switching to '72a0471'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work/noaa/global/kfriedma/git/develop/sorc/ufs_utils.fd/ccpp-physics'...
Submodule path 'ccpp-physics': checked out '12c115e992d3a265eaaa67d72fcbdb3a6f21195f'
Submodule 'physics/rte-rrtmgp' (https://github.com/earth-system-radiation/rte-rrtmgp) registered for path 'ccpp-physics/physics/rte-rrtmgp'
Cloning into '/work/noaa/global/kfriedma/git/develop/sorc/ufs_utils.fd/ccpp-physics/physics/rte-rrtmgp'...
From https://github.com/earth-system-radiation/rte-rrtmgp
 * branch            7f01618c92409658bddd3afa9acb004c608f6a0d -> FETCH_HEAD
Submodule path 'ccpp-physics/physics/rte-rrtmgp': checked out '7f01618c92409658bddd3afa9acb004c608f6a0d'

I also cloned global-workflow develop and ran ./checkout.sh -g in my /work2 space on Orion. Also no errors. See log: /work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/logs_0/checkout_ufs_utils.log

Additionally, we've started trialing automated CI tests in global-workflow in the past couple weeks and I don't recall them flagging checkout errors for UFS_UTILS on Orion (same hash of UFS_UTILS).

@CatherineThomas-NOAA for your case, to see if a modified checkout option would help, I edited a copy of checkout.sh to do a submodule update for UFS_UTILS without the --recursive flag but retain the recursive checkouts for the other components. Each component has a checkout command within checkout.sh to set the various checkout flags, including recursive=YES/NO.

Edited copy: /work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/checkout.sh

My edited copy doesn't get the physics/rte-rrtmgp submodule:

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics'...

Original checkout:

HEAD is now at 72a04716 Update GDAS INIT utility for GFS COM reorganization (#820)
Submodule 'ccpp-physics' (https://github.com/NCAR/ccpp-physics.git) registered for path 'ccpp-physics'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics'...
Submodule path 'ccpp-physics': checked out '12c115e992d3a265eaaa67d72fcbdb3a6f21195f'
Submodule 'physics/rte-rrtmgp' (https://github.com/earth-system-radiation/rte-rrtmgp) registered for path 'ccpp-physics/physics/rte-rrtmgp'
Cloning into '/work2/noaa/global/kfriedma/git/develop_ufs-utils_recursive/sorc/ufs_utils.fd/ccpp-physics/physics/rte-rrtmgp'...
From https://github.com/earth-system-radiation/rte-rrtmgp
 * branch            7f01618c92409658bddd3afa9acb004c608f6a0d -> FETCH_HEAD
Submodule path 'ccpp-physics/physics/rte-rrtmgp': checked out '7f01618c92409658bddd3afa9acb004c608f6a0d'

Not sure if we want to disable the recursive flag but I'm interested if this works for you. Since I can't reproduce the checkout error and it's only happening on Orion I don't want to suggest modifying the UFS_UTILS checkout in g-w right now. My next thought is something different with your account environment or a transient issue on Orion during checkout? Does the checkout error keep happening with fresh clones of g-w develop?

@GeorgeGayno-NOAA did you say we don't need the physics/rte-rrtmgp submodule? If so, perhaps removing the --recursive flag for UFS_UTILS won't break things? Still don't want to remove it yet though, just wondering.

Correct. That submodule is not needed. If you don't do a recursive clone, you only need to do git submodule init and git submodule update. Those steps are done for you if you use my new build script.

@BinLiu-NOAA
Copy link
Collaborator

BinLiu-NOAA commented May 19, 2023

@GeorgeGayno-NOAA, I believe we got similar issues for HAFS GitHub repository on Orion. The issue we found is that the default git version on Orion is pretty old. And if you module load git/2.21.0 on Orion, then the git clone --recursive will work fine.

@GeorgeGayno-NOAA
Copy link
Collaborator

Thanks @BinLiu-NOAA. I get the same result. When I tried the latest version of git on Orion (2.28.0), the command git clone --recurse-submodules worked fine.

@CatherineThomas-NOAA
Copy link
Collaborator Author

Thanks @BinLiu-NOAA! Yes, this also worked for me. Everything checked out fine, including rte-rrtmgp, once I explicitly loaded git/2.21.0.

Does the checkout error keep happening with fresh clones of g-w develop?

@KateFriedman-NOAA Yes, it happened very reliably, both in the global workflow and then in the ufs_utils repo as a standalone. But the module load seems to solve it for me. What do you think is the best course from here? Add a note in the RTD about loading? Load the module directly from checkout.sh?

@KateFriedman-NOAA
Copy link
Collaborator

Ah, this isn't the first time that the stock/older git version on Orion has bitten us. We recommend users use git/2.28.0 on Orion, see this section of our RTD: https://global-workflow.readthedocs.io/en/latest/hpc.html#version

I'll talk to the others about maybe loading the git module from checkout.sh but we generally avoid doing that.

@CatherineThomas-NOAA
Copy link
Collaborator Author

Thanks for pointing out that part of the documentation @KateFriedman-NOAA. Sounds to me like this issue can be closed now.

@GeorgeGayno-NOAA
Copy link
Collaborator

@DusanJovic-NOAA just noted the same problem. His comments below:

I think this happens because the ccpp-physics that UFS_UTILS is pointing to is an old commit, which uses non existing rte-rrtmgp branch.

Current UFS_UTILS points to ccpp-physics @ 12c115e, which in turn points to rte-rrtmgp @ 7f01618, which does not belong to any branch

In the ufs-weather-model (i.e. in fv3atm) we point to ccpp ufs/dev branch in ccpp-physics fork under ufs-community, not to NCAR's repository. Maybe you can also point to the same fork/branch for consistency and to avoid this kind of clone error when people are not using the latest version of git.

@GeorgeGayno-NOAA
Copy link
Collaborator

Will reopen this issue and try @DusanJovic-NOAA suggestion.

@GeorgeGayno-NOAA GeorgeGayno-NOAA self-assigned this May 22, 2023
@GeorgeGayno-NOAA GeorgeGayno-NOAA added the bug Something isn't working label May 22, 2023
@GeorgeGayno-NOAA
Copy link
Collaborator

Created a new branch at 0ddbac3

GeorgeGayno-NOAA added a commit to GeorgeGayno-NOAA/UFS_UTILS that referenced this issue May 23, 2023
GeorgeGayno-NOAA added a commit that referenced this issue May 24, 2023
Point to the ufs-community fork instead of the NCAR fork, which 
will eliminate problems when cloning with old versions of git.

Fixes #826.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants