Speed up the training with Mistral 7B #10
Comments
Hi, thanks for your interest in our work. Our experiments on Mistral 7B were carried out on 2x 80GB A100 GPUs. Could you identify which part of training is the speed bottleneck? If collecting online trajectories is the bottleneck, one possibility is to check whether trajectories can be collected with multiple threads in parallel. In the current implementation, I believe data collection is done by a single thread in the main process alone.
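The parallel-collection idea above could be sketched as follows. This is a minimal illustration, not the repo's actual code: `collect_trajectory` is a hypothetical stand-in for whatever single-rollout routine the training loop calls, and a thread pool is one simple way to overlap several rollouts (reasonable when each rollout mostly waits on model inference or environment I/O).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the repo's single-rollout routine:
# plays one episode in the environment and returns the trajectory.
def collect_trajectory(env_seed):
    # ... run the agent in the environment here ...
    return {"seed": env_seed, "steps": []}  # placeholder trajectory

def collect_trajectories_parallel(num_trajectories, num_workers=4):
    """Collect rollouts concurrently instead of one-by-one in the main process."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(collect_trajectory, range(num_trajectories)))

trajectories = collect_trajectories_parallel(8)
```

Note that for CPU-bound environments, `ProcessPoolExecutor` (or one process per GPU) may parallelize better than threads, since Python threads share the GIL.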
Hi there, thanks for the suggestion on parallel data collection. By the way, I've noticed a potential issue with the timeout parameter here: it isn't being set correctly, so it falls back to the default (10 minutes) and can cause timeout errors on slower hardware. Here's the corrected line, with the imports it needs:

```python
from datetime import timedelta
from accelerate import Accelerator, InitProcessGroupKwargs

accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=1800))])
```

This sets the NCCL timeout to 30 minutes as intended. Hope this helps.
Can I ask how long training would take to complete with two A100 80GB GPUs in the current setup?
Thanks for your interest. The result in the paper was obtained with 2 days of training on 2x A100 80GB. Could you share the error message from AutoTokenizer? It was working fine when I reproduced the result 6 months ago. Also, thanks for correcting the code that sets the process-group timeout!
Hello,
I am currently training ArCHer with Mistral 7B on Twenty Questions using 32GB V100 GPUs, but training is taking longer than expected. Could you share any advice on parameter settings that might speed it up, even at the expense of accuracy? I'm also curious which GPUs the authors used.