fix ut error of test_recognize_digits, test=develop #27791

Merged
merged 1 commit into from
Oct 10, 2020

@qili93 qili93 commented Oct 9, 2020

PR types

Others

PR changes

Others

Describe

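Problem description:
During inference, the test_recognize_digits unit test randomly fails with an error reporting that the model parameter file is broken. The error log is: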

2020-10-05 17:42:19 InvalidArgumentError: The number of variables to be loaded is 0, expect it to be greater than 0.
2020-10-05 17:42:19   [Hint: Expected out_var_names.size() > 0UL, but received out_var_names.size():0 <= 0UL:0.] (at /paddle/paddle/fluid/operators/load_combine_op.h:42)
2020-10-05 17:42:19   [operator < load_combine > error]

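Root cause analysis:
The failure is related to https://github.com/PaddlePaddle/Paddle/pull/27475, the PR that fixed test_train_recognize_digits_mlp and test_train_recognize_digits_conv.
After the fix in PR27475, the paddle_build.sh script runs the unit tests with the following commands: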

export CTEST_PARALLEL_LEVEL=2
ctest -R test_train_recognize_digits_mlp &
ctest -R test_train_recognize_digits_conv &

These two unit tests run in parallel. Because PR27475 made both of them depend on test_recognize_digits, test_recognize_digits is run twice at the same time, as the log shows:

2020-10-05 17:41:39     ============================================
2020-10-05 17:41:39     Generating TestCases Count ... 
2020-10-05 17:41:39     ============================================
2020-10-05 17:41:39 1 card TestCases count is 425
2020-10-05 17:41:39 Test project /paddle/build
2020-10-05 17:41:39 Test project /paddle/build
2020-10-05 17:41:39         Start 1251: test_recognize_digits
2020-10-05 17:41:39         Start 1251: test_recognize_digits
2020-10-05 17:41:39         Start    2: system_allocator_test
2020-10-05 17:41:39         Start    1: malloc_test

When two instances of test_recognize_digits run in parallel, they read and write the same param file in the same directory, so the param file that the infer func in test_recognize_digits loads can be corrupted.
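The interleaving described above can be reproduced deterministically in a few lines. This is only a simplified illustration and not Paddle's actual save/load code: the directory, the file name `params`, and the byte payloads are hypothetical stand-ins, and the two "runs" are simulated in a single process.

```python
import os
import tempfile

# Hypothetical shared location standing in for the directory that both
# concurrent test runs write to.
shared_dir = tempfile.mkdtemp()
param_path = os.path.join(shared_dir, "params")

# Run A finishes saving its parameters.
with open(param_path, "wb") as f:
    f.write(b"weights-from-run-A")

# Run B starts saving to the same path. Opening with "wb" truncates the
# file immediately, before run B has written anything.
run_b_file = open(param_path, "wb")

# Run A now reaches its infer step and loads the file mid-write.
with open(param_path, "rb") as f:
    seen_by_run_a = f.read()

# Run B completes its write only afterwards.
run_b_file.write(b"weights-from-run-B")
run_b_file.close()

with open(param_path, "rb") as f:
    final_contents = f.read()

print("bytes seen by run A during infer:", len(seen_by_run_a))
```

Run A reads an empty file even though both runs wrote valid data, which is the kind of intermittent "broken file" symptom reported in the log.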

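Solution:
Merge test_train_recognize_digits_mlp and test_train_recognize_digits_conv into a single unit test case. Run sequentially, the tests no longer hit the problem above.
I also tried changing the set_tests_properties dependencies in the cmake file. However, when paddle_build.sh runs the two unit tests in parallel in the background, test_recognize_digits is still executed twice at the same time, even if the two tests are set to depend directly on each other. The only way to avoid that is to drop the dependency on test_recognize_digits from one of the tests, which in turn makes test_train_recognize_digits_mlp or test_train_recognize_digits_conv fail.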
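As a rough analogy for why merging helps, the sketch below puts both variants in one test case so that each save/load cycle completes before the next begins. It is not the code changed in this PR: `train_and_infer`, `TestRecognizeDigitsMerged`, and the payloads are hypothetical names used only for illustration, and the real change merges the two ctest cases in the build script.

```python
import os
import tempfile
import unittest


def train_and_infer(nn_type, save_dir):
    """Hypothetical stand-in for one train -> save -> load -> infer cycle."""
    path = os.path.join(save_dir, "params")
    payload = ("weights-" + nn_type).encode()
    with open(path, "wb") as f:
        f.write(payload)
    # The load happens only after the write above has fully completed.
    with open(path, "rb") as f:
        return f.read() == payload


class TestRecognizeDigitsMerged(unittest.TestCase):
    def test_mlp_then_conv(self):
        # Both variants share one directory, but they run one after another
        # inside a single test case, so their file accesses never overlap.
        save_dir = tempfile.mkdtemp()
        for nn_type in ("mlp", "conv"):
            self.assertTrue(train_and_infer(nn_type, save_dir))


def run_merged_case():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestRecognizeDigitsMerged)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_merged_case()` runs the single merged case. Because there is only one test entry, a parallel runner has nothing to schedule concurrently against the shared file.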

paddle-bot-old bot commented Oct 9, 2020

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@luotao1 luotao1 left a comment

LGTM

@hong19860320 hong19860320 merged commit b8d2a02 into PaddlePaddle:develop Oct 10, 2020
@qili93 qili93 deleted the fix_recog_digits_ut branch October 10, 2020 04:48
chen-zhiyu pushed a commit to chen-zhiyu/Paddle that referenced this pull request Oct 15, 2020