Update file exists checkpointing error messages to be more helpful #2668
The parameter name to the
@dakinggg Added a better error message at checkpoint uploading and renamed the checkpoint saver param to match the top-level trainer param.
@irenedea Is this PR still being worked on, or will it be closed?
@mvpatel2000 Making some updates right now.
except FileExistsError as e:
    raise FileExistsError(
        f'Uploading checkpoint failed with error: {e}. overwrite was set to {self.overwrite}. To overwrite checkpoints with Trainer, set save_overwrite to True.'
    )
@irenedea I think you are missing a `raise ... from e` here. As is, this will discard the stack trace. Can you open a new PR adding in the trace?
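For reference, a minimal sketch of what the requested `raise ... from e` could look like. The `upload` helper and the `upload_with_cause` wrapper are hypothetical stand-ins, not the actual checkpoint saver code; only the exception handling pattern and the message text come from the snippet under review. Note that in Python 3 an exception raised inside an `except` block still carries the original as `__context__`; adding `from e` also sets `__cause__`, so the traceback presents the original error as the direct cause.

```python
def upload(overwrite: bool) -> None:
    """Hypothetical stand-in for the real upload call."""
    if not overwrite:
        raise FileExistsError('checkpoint.pt already exists')


def upload_with_cause(overwrite: bool = False) -> None:
    """Re-raise with a more helpful message, chaining the original error."""
    try:
        upload(overwrite)
    except FileExistsError as e:
        # `from e` sets __cause__, so the original traceback is shown
        # as the direct cause of the new exception.
        raise FileExistsError(
            f'Uploading checkpoint failed with error: {e}. '
            f'overwrite was set to {overwrite}. '
            'To overwrite checkpoints with Trainer, set save_overwrite to True.'
        ) from e


if __name__ == '__main__':
    try:
        upload_with_cause(overwrite=False)
    except FileExistsError as err:
        # The original exception is available on the new one.
        print(repr(err.__cause__))
```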
What does this PR do?
The parameter in Trainer is called `save_overwrite`, so I'm assuming these error messages should be updated to reflect that. There are no other usages of `allow_overwrite` as parameters, variables, or error messages.
I ran into this issue because I launched a run via foundry/mcli yaml, received one of these error messages, added `allow_overwrite: true`, and my run still failed with a warning that `allow_overwrite` was unused 😄
What issue(s) does this change relate to?
Before submitting
`pre-commit` on your change? (see the `pre-commit` section of prerequisites)