# Verification of dygraph MKLDNN accuracy convergence #25872
**Update:** Because some PRs are already merged, please see the updated info in the "Required" subsection of "Which PRs have to be merged to run Dygraph OneDNN training" below.
### Instructions how to run Dygraph OneDNN training

#### ResNet
```diff
diff --git a/dygraph/resnet/train.py b/dygraph/resnet/train.py
index 6bf86f9..f53c5a2 100644
--- a/dygraph/resnet/train.py
+++ b/dygraph/resnet/train.py
@@ -239,9 +239,9 @@ class BottleneckBlock(fluid.dygraph.Layer):
         else:
             short = self.short(inputs)
         y = fluid.layers.elementwise_add(x=short, y=conv2)
-        layer_helper = LayerHelper(self.full_name(), act='relu')
+        layer_helper = LayerHelper(self.full_name(), act='relu', use_mkldnn=True)
         return layer_helper.append_activation(y)
```
mobilenet-v1.log
Since you haven't replied for more than a year, we have closed this issue/PR.
Quick Note: OneDNN was previously named DNNL and MKLDNN.
### Instructions how to run Dygraph OneDNN training
Once you merge the required pull requests, you can run training of a few dygraph models with OneDNN kernels. Some modifications to the models are still required. The models whose training we are starting to support now are Mnist, ResNet, MobileNetV1, and MobileNetV2.
You can prepend `DNNL_VERBOSE=1` to the command to see which primitives the OneDNN library creates, so you can verify which ops are using OneDNN primitives.
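For example, combined with one of the training commands below (a minimal sketch; the verbose output format depends on your OneDNN version):

```bash
# Prepend DNNL_VERBOSE=1 so OneDNN prints each primitive it creates during training.
DNNL_VERBOSE=1 FLAGS_use_mkldnn=true python train.py
```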
#### All models info

For some training scripts you have to add a switch such as `--use_gpu` and disable it to run on CPU, because the GPU is used by default (e.g. in ResNet). Then add the switch to the command.

#### Mnist
```bash
FLAGS_use_mkldnn=true python train.py
```
#### Mobilenet
```bash
FLAGS_use_mkldnn=true python train.py --use_gpu=False --batch_size=64 --total_images=1281167 --class_dim=1000 --image_shape=3,224,224 --model_save_dir=output/ --lr_strategy=cosine_decay --lr=0.1 --num_epochs=240 --data_dir=/data/ILSVRC2012 --l2_decay=4e-5 --model=MobileNetV2
```
(Use `--model=MobileNetV1` for MobileNetV1.)

You also have to add `use_mkldnn=True` to ops which are not imported from dygraph.

#### ResNet
```bash
FLAGS_use_mkldnn=true python train.py
```
You also have to add `use_mkldnn=True` to ops which are not imported from dygraph, as in the sketch below.
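A minimal sketch of that change (an assumption based on the ResNet diff earlier in this issue; the helper name `add_and_relu` is hypothetical and only illustrates the pattern):

```python
import paddle.fluid as fluid
from paddle.fluid.layer_helper import LayerHelper


def add_and_relu(block, short, conv2):
    # Elementwise add of the shortcut branch and the last convolution output.
    y = fluid.layers.elementwise_add(x=short, y=conv2)
    # LayerHelper is not a dygraph sublayer, so pass use_mkldnn=True explicitly
    # to make the appended ReLU activation use the OneDNN kernel.
    layer_helper = LayerHelper(block.full_name(), act='relu', use_mkldnn=True)
    return layer_helper.append_activation(y)
```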
### Which PRs have to be merged to run Dygraph OneDNN training

#### Required
#### Related PRs
### Request for verifying training accuracy convergence
Could you please provide us with a verification of proper accuracy convergence? With our limited resources it is hard for us to do: we don't have a procedure for that in place, and, for example, MobileNet might take many days to train.
We have only been able to run a full test with the `--ce` flag for Mnist and ResNet (Flowers).
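A hedged sketch of such a run (assuming the script accepts `--ce` as a plain flag; the exact arguments differ per model):

```bash
# Run the training script with OneDNN kernels and the continuous-evaluation flag.
FLAGS_use_mkldnn=true python train.py --ce
```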
- Mnist training
- ResNet Flowers training
### Final note
Since we are just starting to support OneDNN training in PaddlePaddle, there may still be some training bugs that impact accuracy.