Based on the PyTorch ImageNet example: https://github.com/pytorch/examples/tree/main/imagenet
easytrain -c configs/resnet50_8x_cfg.py
easytrain -c configs/mobilenet_v3_large_8x_cfg.py
In configs/resnet50_16x_cfg.py, set CFG.DIST_INIT_METHOD='tcp://{ip_of_node_0}:{free_port}', e.g.
CFG.DIST_INIT_METHOD='tcp://192.168.1.2:55555'
- Node 0:
easytrain -c configs/resnet50_16x_cfg.py
- Node 1:
easytrain -c configs/resnet50_16x_cfg.py --node-rank 1
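If you are unsure which port to use for `{free_port}`, one option is to ask the operating system for an unused one on node 0. The following is a minimal sketch using only the Python standard library; `find_free_port` and `build_init_method` are hypothetical helper names for illustration and are not part of easytrain. The port is only known to be free at the moment of the check, so another process could in principle claim it before training starts.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


def build_init_method(node0_ip: str, port: int) -> str:
    """Format a tcp:// URL in the shape used by CFG.DIST_INIT_METHOD."""
    return f"tcp://{node0_ip}:{port}"


if __name__ == "__main__":
    # 192.168.1.2 is the example address used above; replace it with node 0's real IP.
    print(build_init_method("192.168.1.2", find_free_port()))
```

Run this on node 0, copy the printed value into CFG.DIST_INIT_METHOD, and use the same config on every node so that all of them connect to the same address.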
To train other models or change hyperparameters, write your own config file.
# evaluate the last checkpoint
python validate.py -c configs/resnet50_8x_cfg.py --devices 0
# evaluate the best checkpoint
python validate.py -c configs/resnet50_8x_cfg.py --devices 0 --ckpt /path/to/ckpt_dir/resnet50_best_val_acc@1.pt
python validate.py -c configs/mobilenet_v3_large_8x_cfg.py --devices 0