From 6dfa60ec2170bdf74cd9545d28b2dabdeb367079 Mon Sep 17 00:00:00 2001
From: Hariharan Devarajan
Date: Mon, 8 Jan 2024 11:44:32 -0800
Subject: [PATCH] documentation for the checkpointing.

---
 docs/source/config.rst | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/source/config.rst b/docs/source/config.rst
index 9d245ca2..db7a5f98 100644
--- a/docs/source/config.rst
+++ b/docs/source/config.rst
@@ -300,7 +300,20 @@ checkpoint
      - performing one checkpointing per certain number of steps specified
    * - model_size
      - 10240
-     - the size of the model in bytes
+     - the size of the model parameters in bytes
+   * - optimization_groups
+     - []
+     - List of optimization group tensors. Use Array notation for yaml.
+   * - num_layers
+     - 1
+     - Number of layers to checkpoint. Each layer would be checkpointed separately.
+   * - layer_parameters
+     - []
+     - List of parameters per layer. This is used to perform I/O per layer.
+   * - type
+     - rank_zero
+     - Which rank performs this checkpoint. All ranks (all_ranks) or Rank 0 (rank_zero).
+
 
 .. note::