Add train finished run event #2714
The code looks good! But I am curious whether we want to log when the actual training finished, or when everything is finished and about to shut down. As the code stands now, `train_finish_time` wouldn't get logged until the last checkpoint was successfully uploaded to the cloud, which can take an extra 30 minutes to 1 hour after the last `batch_end`, depending on the size of the checkpoint.
Perhaps running this log event on the last
Yeah @eracah, I think we want to log this before waiting for the checkpoint upload. Is the simplest fix to make sure the Mosaic logger runs before the RUD?
Yup, I think so! And before any other RUDs that callbacks create (but I think those will always be added to the end of the list of callbacks).
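To illustrate why ordering matters: if end-of-training hooks run in list order and the uploader's hook blocks until the final checkpoint upload finishes, a logger placed after it records the finish time late. The sketch below is a minimal simulation under that assumption; `RunEventLogger`, `BlockingUploader`, `run_fit_end`, and `logged_delay` are hypothetical stand-ins, not Composer's actual classes or API, and the 0.2 s sleep stands in for a long upload.

```python
import time
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Destination(Protocol):
    """Hypothetical interface: anything with an end-of-training hook."""

    def fit_end(self) -> None: ...


@dataclass
class RunEventLogger:
    """Hypothetical logger that records when training finished."""

    events: List[Tuple[str, float]] = field(default_factory=list)

    def fit_end(self) -> None:
        # Timestamp is taken at the moment this hook actually runs.
        self.events.append(("train_finished", time.monotonic()))


@dataclass
class BlockingUploader:
    """Hypothetical uploader whose hook waits for pending checkpoint uploads."""

    upload_seconds: float

    def fit_end(self) -> None:
        # Simulate blocking on the final checkpoint upload.
        time.sleep(self.upload_seconds)


def run_fit_end(destinations: List[Destination]) -> None:
    # Hooks run sequentially in list order, so a blocking hook
    # delays every hook placed after it.
    for dest in destinations:
        dest.fit_end()


def logged_delay(logger_first: bool) -> float:
    """Seconds between the start of fit_end and the logged finish time."""
    logger = RunEventLogger()
    uploader = BlockingUploader(upload_seconds=0.2)
    dests: List[Destination] = [logger, uploader] if logger_first else [uploader, logger]
    start = time.monotonic()
    run_fit_end(dests)
    return logger.events[0][1] - start


if __name__ == "__main__":
    print(f"logger before uploader: {logged_delay(True):.3f}s")
    print(f"logger after uploader:  {logged_delay(False):.3f}s")
```

With the logger first, the finish time is recorded almost immediately; with the uploader first, it is delayed by the full simulated upload.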
LGTM. Thanks, @jjanezhang!
Maybe add a comment describing why the RUD has to be last?
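One way to enforce the "RUD last" ordering and document the reason in place is a stable partition that moves uploader destinations to the end of the callback list. This is a minimal sketch; `CallbackStub`, `UploaderStub`, and `order_callbacks` are hypothetical names for illustration, not part of Composer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallbackStub:
    """Hypothetical stand-in for any callback or logger destination."""

    name: str


@dataclass
class UploaderStub(CallbackStub):
    """Hypothetical stand-in for a remote uploader/downloader (RUD)."""


def order_callbacks(callbacks: List[CallbackStub]) -> List[CallbackStub]:
    """Return the callbacks with every uploader moved to the end.

    Uploaders must run last: their end-of-training hook blocks until pending
    checkpoint uploads finish, so any callback placed after them (for example,
    a logger recording when training finished) would only run once the upload
    completes.
    """
    # Stable partition: relative order within each group is preserved.
    others = [cb for cb in callbacks if not isinstance(cb, UploaderStub)]
    uploaders = [cb for cb in callbacks if isinstance(cb, UploaderStub)]
    return others + uploaders


if __name__ == "__main__":
    cbs = [UploaderStub("rud"), CallbackStub("logger"), UploaderStub("rud2"), CallbackStub("other")]
    print([cb.name for cb in order_callbacks(cbs)])
```

A stable partition keeps any intentional ordering among the non-uploader callbacks intact while guaranteeing that uploaders, including ones added later by other callbacks, end up after them.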
Add train finished run event
`TRAIN_FINISHED` event in MAPI
Testing