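About reproducing the model training #359

I'd like to reproduce your model's training process, but your training code is written for distributed training. I only have one computer, with one CPU and one GPU, and when I train with your code I get the following error. How can I train with your code? Also, how long did your original training take?

80 GPU hours.

Same question here. On a single GPU, running `python3 -m torch.distributed.launch --nproc_per_node=1 train.py --world_size=1` always fails. My setup is an RTX 3070 on Ubuntu 22.04. Has anyone managed to train the model on a single GPU?

You may have to try removing everything related to distributed training 🤦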
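The suggestion above is to remove the distributed-related code. Instead of deleting it outright, you could make it conditional, so the same script runs either as one plain process or under a launcher. Below is a minimal sketch. It assumes that `train.py` calls `torch.distributed.init_process_group`, wraps the model in `DistributedDataParallel` and uses a `DistributedSampler`. The helper names (`distributed_settings`, `setup`, `maybe_wrap`, `make_sampler`) are hypothetical and do not come from this repository.

```python
import os

# Hypothetical helpers (not from this repository): decide at runtime whether to
# use torch.distributed, so the same train.py can run on one GPU without a launcher.


def distributed_settings(env=None):
    """Read the variables a launcher such as torchrun sets for each process.

    Returns (use_distributed, rank, world_size, local_rank). Without a launcher,
    the defaults describe a single process, so distributed mode is off.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return world_size > 1, rank, world_size, local_rank


def setup(env=None):
    """Initialise the process group only when more than one process is running."""
    use_dist, rank, world_size, local_rank = distributed_settings(env)
    if use_dist:
        import torch.distributed as dist  # imported lazily: not needed on one GPU

        # The default "env://" init method reads MASTER_ADDR / MASTER_PORT.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return use_dist, rank, world_size, local_rank


def maybe_wrap(model, use_dist, local_rank):
    """Wrap the model in DistributedDataParallel only in distributed mode."""
    if not use_dist:
        return model  # single process: train the plain model directly
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes the model has already been moved to cuda:local_rank.
    return DDP(model, device_ids=[local_rank])


def make_sampler(dataset, use_dist):
    """Return a DistributedSampler in distributed mode, else None."""
    if not use_dist:
        return None  # the caller should then pass shuffle=True to DataLoader
    from torch.utils.data.distributed import DistributedSampler

    return DistributedSampler(dataset)
```

With a guard like this, `python3 train.py` runs as a single ordinary process. A launcher that sets `WORLD_SIZE` greater than 1 (for example `torchrun` with several processes) still takes the distributed path. With `--nproc_per_node=1` the world size is 1, so the sketch also skips distributed setup in that case. Any other distributed calls in the script, such as `dist.barrier()`, `dist.get_rank()` or all-reduce operations, would need the same `use_dist` guard.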