
Speed up the training with Mistral 7B #10

Open
Bohemianc opened this issue Jul 18, 2024 · 5 comments
@Bohemianc

Hello,

I am currently training ArCHer with Mistral 7B on Twenty Questions using 32GB V100 GPUs, but it's taking longer than expected. Could you share any advice on parameter settings that might speed up the training, even at the expense of accuracy? Also, I am interested in the type of GPUs used by the authors.

@YifeiZhou02
Owner

Hi,

Thanks for your interest in our work. Our experiments on Mistral 7B are carried out using 2x80GB A100.

Could you identify which part of the training is the speed bottleneck? If collecting online trajectories is the bottleneck, one possibility is to check whether trajectories can be collected with multiple threads in parallel. In the current implementation, I believe data collection runs in a single thread in the main process.
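To make the suggestion concrete, here is a rough sketch of what threaded collection could look like. Note that `collect_trajectory`, `make_env`, and the `agent` interface below are hypothetical placeholders, not the actual ArCHer API:

```python
# Hypothetical sketch of parallel trajectory collection with a thread pool.
# collect_trajectory, make_env, and agent are placeholders for the
# corresponding pieces of the training code, not the real ArCHer API.
from concurrent.futures import ThreadPoolExecutor

def collect_trajectory(env, agent):
    """Roll out one episode and return its list of transitions."""
    obs = env.reset()
    trajectory, done = [], False
    while not done:
        action = agent.get_action(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return trajectory

def collect_parallel(agent, make_env, num_trajectories, num_workers=8):
    """Collect several trajectories concurrently, one environment per task."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(collect_trajectory, make_env(), agent)
                   for _ in range(num_trajectories)]
        return [f.result() for f in futures]
```

Threads can overlap waiting time even under the GIL, since most of the per-step work is GPU generation or remote API calls; batching prompts from several environments into a single generate call is another option.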

@Bohemianc
Author

Hi there,

Thanks for the suggestion on parallel data collection.

By the way, I've noticed a potential issue with the timeout parameter here. It seems the timeout isn't being set correctly, so it can fall back to the default of about 10 minutes and cause errors on slower hardware.

Here's the corrected line:

```python
accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=1800))])
```

This should set the NCCL timeout as intended. Hope this helps.
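For completeness, here is a self-contained version of that change with the imports it needs (only the standard `datetime` and `accelerate` imports are assumed; the rest of the script is unchanged):

```python
from datetime import timedelta

from accelerate import Accelerator, InitProcessGroupKwargs

# Pass the timeout as a timedelta so the 30-minute limit is actually applied
# when the distributed (NCCL) process group is initialized.
accelerator = Accelerator(
    kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=1800))]
)
```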

@ggbondcxl

> Our experiments on Mistral 7B are carried out using 2x80GB A100.

Can I ask how long it would take to complete the training with two A100 80GB GPUs in the current setup?

@ggbondcxl

Also, after I downloaded the code yesterday, it reported a problem with the tokenizer. I wonder if it's due to a problem with AutoTokenizer. [Screenshot attached: 2024-07-25 15:35:02]

@YifeiZhou02
Owner

Thanks for your interest. The result in the paper was obtained with 2 days of training on 2x A100 80GB. Could you share the error message from AutoTokenizer? It was working fine when I reproduced the results 6 months ago.
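If it helps to isolate it, a standalone check of the tokenizer load would show whether the error comes from AutoTokenizer itself or from the surrounding training code (this assumes the Hugging Face `mistralai/Mistral-7B-v0.1` checkpoint; adjust if you are using a different one):

```python
from transformers import AutoTokenizer

# Load the tokenizer on its own; if this already fails, the problem lies with
# the checkpoint/tokenizer files rather than the training script.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(type(tokenizer).__name__, tokenizer("Is it an animal?"))
```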

Also, thanks for correcting the code that sets the process group timeout!
