This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

fix: remove saving of metadata for training ckpt #190

Merged: 2 commits into develop from feature/remove-ckpt-metadata on Dec 6, 2024

pinnstorm

This PR removes the metadata from the training checkpoint; it is still saved in the inference checkpoint. I am not sure whether there is a reason we need it in both checkpoints.

This PR came out of the AIFSv03 training, where we had quite a bit of pain restarting runs because of the zip bug that affects checkpoints over a certain size. We had to use the workaround `zip -d last.ckpt archive/anemoi-metadata/ai-models.json` many times, which this PR would make unnecessary. Very happy to be told we need this metadata in the training checkpoints! But if not, this would avoid quite a bit of pain when resuming runs. 🙏
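
For checkpoints written before this change, the `zip -d` workaround could also be scripted in Python. The following is a minimal sketch, not code from this PR: `strip_member` is a hypothetical helper, and it assumes the checkpoint is an ordinary zip archive containing the entry name quoted in the command above. Python's `zipfile` has no in-place delete, so the sketch rewrites the archive without the unwanted entry. It is not verified here that the rewritten file still loads when resuming a run, or that `zipfile` is unaffected by the same size-related bug.

```python
import os
import tempfile
import zipfile


def strip_member(ckpt_path: str, member: str) -> int:
    """Rewrite the zip archive at ckpt_path without `member`.

    Hypothetical helper for illustration. Returns the number of entries dropped.
    """
    dropped = 0
    directory = os.path.dirname(os.path.abspath(ckpt_path))
    # Write the copy next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(ckpt_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            for info in src.infolist():
                if info.filename == member:
                    dropped += 1
                    continue
                # Each entry is read fully into memory; a streaming copy would
                # be preferable for multi-gigabyte checkpoints.
                dst.writestr(info, src.read(info.filename))
        os.replace(tmp_path, ckpt_path)
    except BaseException:
        # Leave the original checkpoint untouched if anything goes wrong.
        os.unlink(tmp_path)
        raise
    return dropped
```

Usage would be `strip_member("last.ckpt", "archive/anemoi-metadata/ai-models.json")`. Copying to a temporary file and swapping it in at the end means an interrupted run does not leave a half-written checkpoint behind.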

mchantry commented Dec 6, 2024

Fixes #57

mchantry previously approved these changes Dec 6, 2024

mchantry left a comment

Thanks for this.

mchantry self-requested a review December 6, 2024 12:42

mchantry commented Dec 6, 2024

@pinnstorm, can you add an entry to the changelog, please?

mchantry left a comment

Great, thanks Ewan!

mchantry marked this pull request as ready for review December 6, 2024 14:07
mchantry merged commit 2179a59 into develop Dec 6, 2024
119 checks passed
mchantry deleted the feature/remove-ckpt-metadata branch December 6, 2024 14:10
JPXKQX mentioned this pull request Dec 6, 2024