
enable inference benchmark #5933

Merged: 6 commits into PaddlePaddle:develop on Dec 1, 2017

Conversation

tensor-tang (Contributor):

fix #5911

--use_gpu=False \
--trainer_count=$thread \
--log_period=10 \
--config_args="batch_size=${bs},layer_num=${layer_num},is_test=True" \
@luotao1 (Contributor) commented Nov 29, 2017:

The is_test=True parameter does not exist in any of the three networks. The inference network differs from the training network, so please adjust them accordingly.


@luotao1 (Contributor):

At inference time there is no cost layer, so all the networks need to be adjusted.
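
For reference, a flag-level sketch of how the two jobs might differ, reconstructed from the paddle_trainer command line visible in the log later in this thread; invoking the binary directly like this, rather than through the benchmark script's wrapper, is an assumption:

# Training job: cost layer present, parameters saved for later reuse.
paddle_trainer --job=train \
  --config=vgg.py \
  --use_gpu=False \
  --trainer_count=$thread \
  --save_dir="models/${topology}-${layer_num}" \
  --config_args="batch_size=${bs},layer_num=${layer_num}"

# Inference job: no cost layer; the config is expected to branch on
# is_infer=True, and the saved parameters are loaded explicitly.
paddle_trainer --job=test \
  --config=vgg.py \
  --use_gpu=False \
  --trainer_count=$thread \
  --config_args="batch_size=${bs},layer_num=${layer_num},is_infer=True" \
  --init_model_path="models/${topology}-${layer_num}/pass-00000/"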

--save_dir="models/${topology}-${layer_num}" \
--config_args="batch_size=128,layer_num=${layer_num}" \
> /dev/null 2>&1
echo "Done"
@luotao1 (Contributor) commented Nov 29, 2017:

  • Do not run inference right after training; measuring inference performance that way is too slow.
  • The network used for inference does not need to be trained very well; since we only measure performance, a model saved from any batch will do.

@tensor-tang (Contributor, Author):

Currently, training is triggered only when no trained model is found locally, just to produce a model for inference.

That model is trained for only one num_pass; since the dummy data contains just 1024 images, training is not time-consuming, and it happens only once. Later inference runs of the same network all reuse the same model, so the overall impact is small.
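
A minimal sketch of the reuse logic described above, assuming the model directory layout shown elsewhere in this PR; train is the script's existing training function, and the exact existence check is an assumption:

# Train a throwaway model only if none exists for this configuration.
model_dir="models/${topology}-${layer_num}"
if [ ! -d "${model_dir}/pass-00000" ]; then
  # One pass over the 1024-image dummy set is enough: the model only
  # needs to exist, not to be accurate, since we measure speed only.
  train
fi
# Every later inference run of this network loads ${model_dir}/pass-00000/.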

@luotao1 (Contributor) left a comment:

Could the inference be written in a separate script?

@@ -30,13 +30,74 @@ function train() {
2>&1 | tee ${log}
}

if [ ! -d "train.list" ]; then
function test() {
@luotao1 (Contributor):

test->inference
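
Under the requested rename, the function might look roughly like the sketch below, assembled from the paddle_trainer command line shown in the log further down; the log filename and the argument order are assumptions, not the PR's actual code:

function inference() {
  topology=$1
  layer_num=$2
  bs=$3
  # Hypothetical log path; the real script may name its logs differently.
  log="logs/infer-${topology}-${layer_num}-bs${bs}.log"
  paddle_trainer --job=test \
    --config="${topology}.py" \
    --use_mkldnn=True \
    --use_gpu=False \
    --trainer_count=1 \
    --log_period=32 \
    --config_args="batch_size=${bs},layer_num=${layer_num},is_infer=True" \
    --init_model_path="models/${topology}-${layer_num}/pass-00000/" \
    2>&1 | tee ${log}
}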

@tensor-tang (Contributor, Author):

OK, no problem.

@tensor-tang (Contributor, Author):

Done

@luotao1 (Contributor) commented Nov 30, 2017:

I pulled the script down and ran it; the printed log is as follows:

Training model vgg_19
Done
I1130 09:54:24.775768   634 Util.cpp:166] commandline: /usr/local/bin/paddle_trainer --job=test --config=vgg.py --use_mkldnn=True --use_gpu=False --trainer_count=1 --log_period=32 --config_args=batch_size=1,layer_num=19,is_infer=True --init_model_path=models/vgg-19/pass-00000/ 
[INFO 2017-11-30 09:54:24,892 layers.py:2670] output for __conv_0__: c = 64, h = 224, w = 224, size = 3211264
[INFO 2017-11-30 09:54:24,893 layers.py:2670] output for __conv_1__: c = 64, h = 224, w = 224, size = 3211264
[INFO 2017-11-30 09:54:24,894 layers.py:2803] output for __pool_0__: c = 64, h = 112, w = 112, size = 802816
[INFO 2017-11-30 09:54:24,894 layers.py:2670] output for __conv_2__: c = 128, h = 112, w = 112, size = 1605632
[INFO 2017-11-30 09:54:24,895 layers.py:2670] output for __conv_3__: c = 128, h = 112, w = 112, size = 1605632
[INFO 2017-11-30 09:54:24,895 layers.py:2803] output for __pool_1__: c = 128, h = 56, w = 56, size = 401408
[INFO 2017-11-30 09:54:24,896 layers.py:2670] output for __conv_4__: c = 256, h = 56, w = 56, size = 802816
[INFO 2017-11-30 09:54:24,897 layers.py:2670] output for __conv_5__: c = 256, h = 56, w = 56, size = 802816
[INFO 2017-11-30 09:54:24,897 layers.py:2670] output for __conv_6__: c = 256, h = 56, w = 56, size = 802816
[INFO 2017-11-30 09:54:24,898 layers.py:2670] output for __conv_7__: c = 256, h = 56, w = 56, size = 802816
[INFO 2017-11-30 09:54:24,899 layers.py:2803] output for __pool_2__: c = 256, h = 28, w = 28, size = 200704
[INFO 2017-11-30 09:54:24,899 layers.py:2670] output for __conv_8__: c = 512, h = 28, w = 28, size = 401408
[INFO 2017-11-30 09:54:24,900 layers.py:2670] output for __conv_9__: c = 512, h = 28, w = 28, size = 401408
[INFO 2017-11-30 09:54:24,900 layers.py:2670] output for __conv_10__: c = 512, h = 28, w = 28, size = 401408
[INFO 2017-11-30 09:54:24,901 layers.py:2670] output for __conv_11__: c = 512, h = 28, w = 28, size = 401408
[INFO 2017-11-30 09:54:24,902 layers.py:2803] output for __pool_3__: c = 512, h = 14, w = 14, size = 100352
[INFO 2017-11-30 09:54:24,902 layers.py:2670] output for __conv_12__: c = 512, h = 14, w = 14, size = 100352
[INFO 2017-11-30 09:54:24,903 layers.py:2670] output for __conv_13__: c = 512, h = 14, w = 14, size = 100352
[INFO 2017-11-30 09:54:24,903 layers.py:2670] output for __conv_14__: c = 512, h = 14, w = 14, size = 100352
[INFO 2017-11-30 09:54:24,904 layers.py:2670] output for __conv_15__: c = 512, h = 14, w = 14, size = 100352
[INFO 2017-11-30 09:54:24,905 layers.py:2803] output for __pool_4__: c = 512, h = 7, w = 7, size = 25088
[INFO 2017-11-30 09:54:24,906 networks.py:1724] The input order is [image]
[INFO 2017-11-30 09:54:24,906 networks.py:1730] The output order is [__fc_layer_2__]
I1130 09:54:24.910554   634 Trainer.cpp:145] trainer: in testing mode
I1130 09:54:24.910567   634 Trainer.cpp:152] trainer mode: Testing
I1130 09:54:25.223145   634 PyDataProvider2.cpp:243] loading dataprovider provider::process
I1130 09:54:25.223551   634 GradientMachine.cpp:83] Loading parameters from models/vgg-19/pass-00000/
I1130 09:54:30.908305   634 Tester.cpp:143]  Batch=32 samples=32 AvgCost=1
I1130 09:54:31.867846   634 Tester.cpp:143]  Batch=64 samples=64 AvgCost=1
I1130 09:54:32.345593   634 Tester.cpp:143]  Batch=96 samples=96 AvgCost=1
I1130 09:54:32.801933   634 Tester.cpp:143]  Batch=128 samples=128 AvgCost=1
I1130 09:54:33.265005   634 Tester.cpp:143]  Batch=160 samples=160 AvgCost=1
I1130 09:54:33.743268   634 Tester.cpp:143]  Batch=192 samples=192 AvgCost=1
I1130 09:54:34.210225   634 Tester.cpp:143]  Batch=224 samples=224 AvgCost=1
I1130 09:54:34.666363   634 Tester.cpp:143]  Batch=256 samples=256 AvgCost=1
I1130 09:54:35.130930   634 Tester.cpp:143]  Batch=288 samples=288 AvgCost=1
I1130 09:54:35.564308   634 Tester.cpp:143]  Batch=320 samples=320 AvgCost=1
I1130 09:54:35.989562   634 Tester.cpp:143]  Batch=352 samples=352 AvgCost=1
I1130 09:54:36.428697   634 Tester.cpp:143]  Batch=384 samples=384 AvgCost=1
I1130 09:54:36.858831   634 Tester.cpp:143]  Batch=416 samples=416 AvgCost=1
I1130 09:54:37.290719   634 Tester.cpp:143]  Batch=448 samples=448 AvgCost=1
I1130 09:54:37.718937   634 Tester.cpp:143]  Batch=480 samples=480 AvgCost=1
I1130 09:54:38.143246   634 Tester.cpp:143]  Batch=512 samples=512 AvgCost=1
I1130 09:54:38.567258   634 Tester.cpp:143]  Batch=544 samples=544 AvgCost=1
I1130 09:54:38.992146   634 Tester.cpp:143]  Batch=576 samples=576 AvgCost=1
I1130 09:54:39.428045   634 Tester.cpp:143]  Batch=608 samples=608 AvgCost=1
I1130 09:54:39.852782   634 Tester.cpp:143]  Batch=640 samples=640 AvgCost=1
I1130 09:54:40.277169   634 Tester.cpp:143]  Batch=672 samples=672 AvgCost=1
I1130 09:54:40.701373   634 Tester.cpp:143]  Batch=704 samples=704 AvgCost=1
I1130 09:54:41.140897   634 Tester.cpp:143]  Batch=736 samples=736 AvgCost=1
I1130 09:54:41.642194   634 Tester.cpp:143]  Batch=768 samples=768 AvgCost=1
I1130 09:54:42.160156   634 Tester.cpp:143]  Batch=800 samples=800 AvgCost=1
I1130 09:54:42.675549   634 Tester.cpp:143]  Batch=832 samples=832 AvgCost=1
I1130 09:54:43.191017   634 Tester.cpp:143]  Batch=864 samples=864 AvgCost=1
I1130 09:54:43.706593   634 Tester.cpp:143]  Batch=896 samples=896 AvgCost=1
I1130 09:54:44.222618   634 Tester.cpp:143]  Batch=928 samples=928 AvgCost=1
I1130 09:54:44.743834   634 Tester.cpp:143]  Batch=960 samples=960 AvgCost=1
I1130 09:54:45.259887   634 Tester.cpp:143]  Batch=992 samples=992 AvgCost=1
I1130 09:54:45.780547   634 Tester.cpp:143]  Batch=1024 samples=1024 AvgCost=1
I1130 09:54:45.780632   634 Tester.cpp:245]  Pass=0 samples=1024 AvgCost=1 Eval: 

The timing statistics are missing; training used to print them:

I1122 10:19:50.843370   167 Stat.cpp:102] ======= StatSet: [GlobalStatInfo] status ======
I1122 10:19:50.843457   167 Stat.cpp:105] Stat=FwdBwd                         TID=167    total=48332.6    avg=483.325    max=636.896    min=456.854    count=100  

--log_period=32 could be increased a bit; 100 would work.

@tensor-tang (Contributor, Author):

Yes, there are statistics during training because they are hard-coded.

If needed, I can find a way to compute the timing when inference finishes, though the first few batches would have to be dropped as burn-in time.

The log period can be adjusted, but it has to follow the batch-size setting; I can change it together with the point above.
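
One hedged way to couple the log period to the batch size so that every case prints the same number of lines (the follow-up below settles on 10 per case); this formula is an assumption, not necessarily the PR's actual change:

# log_period counts batches, so divide the per-case batch count by the
# desired number of progress lines.
log_period=$(( num_samples / bs / 10 ))
# e.g. 2560 samples at batch_size=1 gives log_period=256, which matches
# the Batch=256,512,...,2560 lines in the log below.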

@tensor-tang (Contributor, Author):

Done.

The final result will look like the following, with each case printing only 10 log lines:

I1201 00:26:19.730397 146749 GradientMachine.cpp:83] Loading parameters from models/googlenet-v1/pass-00000/
I1201 00:26:26.132701 146749 Tester.cpp:143] Batch=256 samples=256 AvgCost=1
I1201 00:26:27.385949 146749 Tester.cpp:143] Batch=512 samples=512 AvgCost=1
I1201 00:26:28.974287 146749 Tester.cpp:143] Batch=768 samples=768 AvgCost=1
I1201 00:26:30.627349 146749 Tester.cpp:143] Batch=1024 samples=1024 AvgCost=1
I1201 00:26:32.402112 146749 Tester.cpp:143] Batch=1280 samples=1280 AvgCost=1
I1201 00:26:33.676046 146749 Tester.cpp:143] Batch=1536 samples=1536 AvgCost=1
I1201 00:26:34.764509 146749 Tester.cpp:143] Batch=1792 samples=1792 AvgCost=1
I1201 00:26:35.846726 146749 Tester.cpp:143] Batch=2048 samples=2048 AvgCost=1
I1201 00:26:36.928786 146749 Tester.cpp:143] Batch=2304 samples=2304 AvgCost=1
I1201 00:26:38.007845 146749 Tester.cpp:143] Batch=2560 samples=2560 AvgCost=1
I1201 00:26:38.007884 146749 Tester.cpp:245] Pass=0 samples=2560 AvgCost=1 Eval:
Last 1280 samples start: 00:26:32.402112 (1592.402112 sec), end: 00:26:38.007845 (1598.007845 sec)
FPS: 228.33 images/sec

An FPS value is printed at the end.
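
The arithmetic checks out: the last 1280 of 2560 samples span 1598.007845 - 1592.402112 = 5.605733 seconds, and 1280 / 5.605733 is about 228.33 images/sec. A small sketch of that computation, with the timestamps hard-coded for illustration (the PR's real implementation presumably extracts them from the log):

start=1592.402112   # seconds, from the Batch=1280 log line above
end=1598.007845     # seconds, from the Batch=2560 log line above
samples=1280        # second half of the run; the first half is burn-in
fps=$(echo "scale=2; ${samples} / (${end} - ${start})" | bc)
echo "FPS: ${fps} images/sec"   # prints: FPS: 228.33 images/sec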

@luotao1 (Contributor) left a comment:

Looks nice!

@luotao1 merged commit 000c1f7 into PaddlePaddle:develop on Dec 1, 2017.
@tensor-tang deleted the inference branch on December 1, 2017 at 12:05.