Your question
Hi, I am running a toy experiment training a model.

I set `TRAIN_SAMPLES=100` in my `train.sh`, and there are only 100 data points in my training dataset:

```shell
TRAIN_SAMPLES=100      # 300B tokens / 4096
LR_WARMUP_SAMPLES=0
LR_DECAY_SAMPLES=100   # TRAIN_SAMPLES - LR_WARMUP_SAMPLES

options=" \
    ... \
    --train-samples ${TRAIN_SAMPLES} \
    --lr-warmup-samples ${LR_WARMUP_SAMPLES} \
    --lr-decay-samples ${LR_DECAY_SAMPLES} \
    ... \
    --split 99,1,0"

torchrun --nproc_per_node 1 pretrain_model.py ${options}
```
But the log shows `total number of epochs: 165`, despite my setting `TRAIN_SAMPLES=100`.

Why does this happen when I am using the `--train-samples` flag instead of `--train-iters`?
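For reference, my expectation was roughly the following back-of-the-envelope calculation (a minimal sketch of what I assumed, not Megatron-LM's actual dataset-building logic):

```python
import math

# Assumed mental model: with sample-based training, the epoch count
# should be about ceil(train_samples / train_split_size).
train_samples = 100                                  # --train-samples
dataset_size = 100                                   # data points in my dataset
train_fraction = 0.99                                # from --split 99,1,0
train_split_size = int(dataset_size * train_fraction)  # 99 samples in the train split

expected_epochs = math.ceil(train_samples / train_split_size)
print(expected_epochs)  # 2 -- nowhere near the reported 165
```

So even accounting for the 99/1/0 split, I would expect at most 2 epochs, not 165.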