reduce time of test_TrainerOnePass #3296

luotao1 · 2017-08-07T05:31:50Z

partly fix #3259
The unit tests in test_TrainerOnePass fall into two groups:

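TEST(checkRemoteUpdater, XXX): there are 10 of these, each taking about 4 s when run locally. They call checkRemoteParameterUpdaterTest, where most of the time is spent starting the pserver; that part was not changed. The timing of one of them is shown below.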

64: [ RUN      ] checkRemoteUpdater.gpu2TrainerOldUpdater
64: I0807 12:06:49.846887  2287 test_TrainerOnePass.cpp:235]  useGpu=1 trainerCount=2 configFile=trainer/tests/sample_trainer_config.conf
64: I0807 12:06:49.846958  7112 LightNetwork.cpp:269] tcp server start 
64: [INFO 2017-08-07 12:06:49,858 networks.py:1491] The input order is [input, label]
64: [INFO 2017-08-07 12:06:49,858 networks.py:1497] The output order is [__cost_0__]
64: I0807 12:06:49.862642  2287 Trainer.cpp:165] trainer mode: Normal
64: I0807 12:06:49.862903  2287 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=2 numDevices=4
64: I0807 12:06:49.865258  2287 DataProvider.cpp:388] load data file trainer/tests/sample_data.txt
64: I0807 12:06:49.865303  2287 DataProvider.cpp:391] read done, num of instance=10 data size=30
64: I0807 12:06:49.865449  2287 DataProvider.cpp:388] load data file trainer/tests/sample_data.txt
64: I0807 12:06:49.865481  2287 DataProvider.cpp:391] read done, num of instance=10 data size=30
64: I0807 12:06:49.865561  2287 GradientMachine.cpp:85] Initing parameters..
64: I0807 12:06:49.867841  2287 GradientMachine.cpp:92] Init parameters done.
64: I0807 12:06:49.871497  2287 ParameterClient2.cpp:114] pserver 0 127.0.0.1:38110
64: I0807 12:06:49.871615  7121 LightNetwork.cpp:322] worker started, peer = 127.0.0.1
64: I0807 12:06:51.872315  7121 ParameterServer2.cpp:256] pserver: setParameter
64: I0807 12:06:51.872351  7121 ParameterServer2.cpp:302] pserver: new cpuvector: size=16384
64: I0807 12:06:51.872582  8344 ParameterClient2.cpp:114] pserver 0 127.0.0.1:38110
64: I0807 12:06:51.872742  8345 LightNetwork.cpp:322] worker started, peer = 127.0.0.1
64: I0807 12:06:53.879446  2287 test_TrainerOnePass.cpp:214] ___fc_layer_0__.w0  diff=0              
64: I0807 12:06:53.879509  2287 test_TrainerOnePass.cpp:214] ___fc_layer_1__.w0  diff=0              
64: I0807 12:06:53.879547  2287 test_TrainerOnePass.cpp:214] ___fc_layer_2__.w0  diff=0              
64: I0807 12:06:53.879582  2287 test_TrainerOnePass.cpp:214] sharew              diff=0              
64: I0807 12:06:53.879611  2287 test_TrainerOnePass.cpp:214] ___fc_layer_4__.w0  diff=0              
64: I0807 12:06:53.879647  2287 test_TrainerOnePass.cpp:214] ___fc_layer_5__.w0  diff=0              
64: I0807 12:06:53.879675  2287 test_TrainerOnePass.cpp:214] ___fc_layer_6__.w0  diff=0              
64: I0807 12:06:53.879709  2287 test_TrainerOnePass.cpp:214] ___fc_layer_7__.w0  diff=0              
64: I0807 12:06:53.879737  2287 test_TrainerOnePass.cpp:214] ___fc_layer_7__.wbiasdiff=0              
64: I0807 12:06:53.879771  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w0     diff=0              
64: I0807 12:06:53.879806  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w1     diff=0              
64: I0807 12:06:53.879842  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w2     diff=0              
64: I0807 12:06:53.879873  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w4     diff=0              
64: I0807 12:06:53.879906  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w5     diff=0              
64: I0807 12:06:53.879935  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w6     diff=0              
64: I0807 12:06:53.879969  2287 test_TrainerOnePass.cpp:214] ___mixed_0__.w7     diff=0              
64: I0807 12:06:53.880100  8344 SocketChannel.cpp:42] destory connection in socket channel, peer = 127.0.0.1
64: I0807 12:06:53.880110  8345 LightNetwork.cpp:339] worker begin to finish, peer = 127.0.0.1
64: I0807 12:06:53.880129  7121 ParameterServer2.cpp:564] pserver: getParameter
64: I0807 12:06:53.880147  8345 SocketChannel.cpp:42] destory connection in socket channel, peer = 127.0.0.1
64: I0807 12:06:53.880996  7121 LightNetwork.cpp:339] worker begin to finish, peer = 127.0.0.1
64: I0807 12:06:53.880998  2287 SocketChannel.cpp:42] destory connection in socket channel, peer = 127.0.0.1
64: I0807 12:06:53.881031  7121 SocketChannel.cpp:42] destory connection in socket channel, peer = 127.0.0.1
64: I0807 12:06:53.881778  7112 LightNetwork.cpp:215] pserver accept thread finish, addr= port=38110
64: I0807 12:06:53.881824  2287 SocketChannel.cpp:42] destory connection in socket channel, peer = 127.0.0.1
64: [       OK ] checkRemoteUpdater.gpu2TrainerOldUpdater (4035 ms)
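Almost all of the 4035 ms in this log sits in two jumps of about 2 s each: one before `pserver: setParameter` and one before the first `diff=0` check, consistent with the note that the time goes to the pserver side. A small helper like the following can flag such gaps in test output. It is a hypothetical script, not part of the PaddlePaddle repo; it assumes the standard glog prefix (`I0807 12:06:49.846887`) and ignores lines without it. Midnight rollover is not handled.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, space, HH:MM:SS.microseconds
GLOG_TS = re.compile(r"\b[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def find_gaps(lines, threshold_s=1.0):
    """Return (delta_seconds, previous_line, line) for consecutive glog lines
    whose timestamps differ by more than threshold_s seconds."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        m = GLOG_TS.search(line)
        if not m:
            continue  # skip gtest output and Python [INFO ...] lines
        ts = datetime.strptime(m.group(2), "%H:%M:%S.%f")
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


# A few lines copied from the log above.
sample = """\
64: I0807 12:06:49.871497  2287 ParameterClient2.cpp:114] pserver 0 127.0.0.1:38110
64: I0807 12:06:49.871615  7121 LightNetwork.cpp:322] worker started, peer = 127.0.0.1
64: I0807 12:06:51.872315  7121 ParameterServer2.cpp:256] pserver: setParameter
64: I0807 12:06:51.872742  8345 LightNetwork.cpp:322] worker started, peer = 127.0.0.1
64: I0807 12:06:53.879446  2287 test_TrainerOnePass.cpp:214] ___fc_layer_0__.w0  diff=0
""".splitlines()

for delta, before, after in find_gaps(sample):
    # Print the gap length and the message of the line that ended the wait.
    print(f"{delta:.3f}s gap before: {after.split('] ', 1)[1]}")
```

Piping the full ctest output through `find_gaps` is a quick way to see whether a slow test is waiting on I/O or doing real work.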

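The remaining tests call trainerOnePassTest. By using somewhat smaller data_size and num_pass values, their running time dropped from tens of seconds to a few seconds.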

lcy-seso · 2017-08-07T05:42:04Z

paddle/trainer/tests/simple_sparse_neural_network.py

@@ -12,7 +12,7 @@

 embedding = embedding_layer(
    input=data_layer(
-        name="word_ids", size=65536),
+        name="word_ids", size=8192),


I'm a bit curious: the number 8192 appears in all of our book demos. Does it have any special meaning?

No special meaning. Here I simply divided by 8.

The prime closest to 8192 is 8191.
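To check suggested values like this, a trial-division helper is enough at these sizes. This is a sketch written for illustration, not code from the PR; `is_prime` and `nearest_prime` are hypothetical names.

```python
from itertools import count


def is_prime(n: int) -> bool:
    """Trial division; fine for sizes in the tens of thousands."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def nearest_prime(n: int) -> int:
    """Return the prime closest to n; on a tie, return the smaller one."""
    for offset in count():
        if is_prime(n - offset):
            return n - offset
        if is_prime(n + offset):
            return n + offset


for size in (65536, 8192, 128, 16):
    print(size, "->", nearest_prime(size))
```

`nearest_prime(8192)` returns 8191 and `nearest_prime(16)` returns 17, matching the review comments.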

luotao1 · 2017-08-07T05:49:43Z

Time on TeamCity: down to 28 s.

[13:45:56]	67/133 Test #68: test_TrainerOnePass ....................... Passed 28.45 sec

wangkuiyi

Nice speedup!

wangkuiyi · 2017-08-07T06:44:03Z

paddle/trainer/tests/simple_sparse_neural_network.py

@@ -1,6 +1,6 @@
 from paddle.trainer_config_helpers import *

-settings(batch_size=128, learning_method=AdaGradOptimizer(), learning_rate=1e-4)
+settings(batch_size=16, learning_method=AdaGradOptimizer(), learning_rate=1e-4)


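I usually pick prime numbers for parameters like this, because powers of 2 are often less likely than primes to expose bugs. The prime closest to 16 is 17.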
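One way to read this advice: code that splits work evenly, for example across the `trainerCount=2` trainers seen in the log above, can hide remainder-handling bugs when every size is a power of two. The functions below are hypothetical and not PaddlePaddle code; they only show how a batch of 17 exposes a bug that a batch of 16 hides.

```python
def split_batch(batch, num_threads):
    """Hypothetical buggy splitter: each thread gets len(batch) // num_threads
    samples, so any remainder is silently dropped."""
    per_thread = len(batch) // num_threads
    return [batch[i * per_thread:(i + 1) * per_thread] for i in range(num_threads)]


def split_batch_fixed(batch, num_threads):
    """Give one extra sample to each of the first len(batch) % num_threads threads."""
    base, extra = divmod(len(batch), num_threads)
    shards, start = [], 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def samples_seen(shards):
    """Total number of samples that actually reach some thread."""
    return sum(len(s) for s in shards)


for batch_size in (16, 17):
    batch = list(range(batch_size))
    print(batch_size,
          samples_seen(split_batch(batch, 2)),
          samples_seen(split_batch_fixed(batch, 2)))
```

With a batch of 16, both splitters process all 16 samples, so a test comparing against a single-thread run passes. With 17, the buggy splitter drops one sample and the same test fails.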

wangkuiyi · 2017-08-07T06:45:07Z

paddle/trainer/tests/simple_sparse_neural_network.py

@@ -12,7 +12,7 @@

 embedding = embedding_layer(
    input=data_layer(
-        name="word_ids", size=65536),
+        name="word_ids", size=8192),


The prime closest to 8192 is 8191.

lcy-seso reviewed Aug 7, 2017


luotao1 requested a review from wangkuiyi August 7, 2017 05:50

wangkuiyi reviewed Aug 7, 2017


wangkuiyi approved these changes Aug 7, 2017


luotao1 closed this Aug 7, 2017

luotao1 reopened this Aug 7, 2017

reduce time of test_TrainerOnePass

16b70f3

luotao1 merged commit dda4217 into PaddlePaddle:develop Aug 7, 2017

luotao1 deleted the test_TrainerOnePass branch August 7, 2017 10:52
