test_LayerGrad fails with a certain chance #8480

Closed
yu239-zz opened this issue Feb 22, 2018 · 4 comments

@yu239-zz

When I run CTest in Paddle, the test case 'test_LayerGrad' fails intermittently: different runs give inconsistent results. Sometimes it passes, and sometimes it fails with a segfault. Has anyone observed the same behavior?

45: I0221 18:51:27.907557 7077 LayerGradUtil.cpp:684] layer_type=pool useGpu=0
45: I0221 18:51:27.919493 7077 LayerGradUtil.cpp:724] cost 39190.1
45: I0221 18:51:27.938329 7077 LayerGradUtil.cpp:43] pool layer_0 step=3.93599e-06 cost1=39190.3 cost2=39190 true_delta=0.390625 analytic_delta=0.391901 diff=-0.0032571
45: I0221 18:51:27.938544 7077 LayerGradUtil.cpp:684] layer_type=pool useGpu=0
45: I0221 18:51:27.949440 7077 LayerGradUtil.cpp:724] cost 35550
45: I0221 18:51:27.967047 7077 LayerGradUtil.cpp:43] pool layer_0 step=4.38988e-06 cost1=35550.1 cost2=35549.8 true_delta=0.355469 analytic_delta=0.3555 diff=-8.67664e-05
45: I0221 18:51:27.967440 7077 LayerGradUtil.cpp:684] layer_type=pool useGpu=0
45: I0221 18:51:27.973249 7077 LayerGradUtil.cpp:724] cost 46682.2
45: I0221 18:51:27.990454 7077 LayerGradUtil.cpp:43] pool layer_0 step=1e-06 cost1=46682.5 cost2=46681.9 true_delta=0.597656 analytic_delta=0.596637 diff=0.00170841
45: I0221 18:51:27.990739 7077 LayerGradUtil.cpp:684] layer_type=pool useGpu=0
45: I0221 18:51:27.997279 7077 LayerGradUtil.cpp:724] cost 46688.5
45: *** Aborted at 1519267887 (unix time) try "date -d @1519267887" if you are using GNU date ***
45: PC: @ 0x0 (unknown)
45: *** SIGSEGV (@0x8) received by PID 7077 (TID 0x7fbc54fe38c0) from PID 8; stack trace: ***
45: @ 0x7fbc54be8340 (unknown)
45: @ 0x8af57a paddle::CpuMatrix::maxPoolBackward()
45: @ 0x6c2154 paddle::MaxPoolWithMaskLayer::backward()
45: @ 0x637e95 paddle::testLayerGradKernel()
45: @ 0x638ed0 paddle::testLayerGrad()
45: @ 0x619976 testPoolLayer()
45: @ 0x61a04c Layer_PoolLayer_Test::TestBody()
45: @ 0xb7d823 testing::internal::HandleExceptionsInMethodIfSupported<>()
45: @ 0xb70587 testing::Test::Run()
45: @ 0xb7062e testing::TestInfo::Run()
45: @ 0xb70e6d testing::TestCase::Run()
45: @ 0xb74c65 testing::internal::UnitTestImpl::RunAllTests()
45: @ 0xb74f10 testing::UnitTest::Run()
45: @ 0x5f06a4 main
45: @ 0x7fbc49f28ec5 __libc_start_main
45: @ 0x60cfd0 (unknown)
45: @ 0x0 (unknown)
1/1 Test #45: test_LayerGrad ...................***Exception: SegFault 9.94 sec

@helinwang
Contributor

helinwang commented Feb 23, 2018

Able to reproduce now with: for i in $(seq 1 20); do ctest -V -R test_LayerGrad; done
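To put a number on how often the test fails, the same loop can be scripted so that it counts failing runs. The following is only a sketch: `count_failures` is a hypothetical helper name, and it assumes the command exits with a non-zero status whenever the test fails or crashes.

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    `cmd` is a list of arguments, e.g. ["ctest", "-V", "-R", "test_LayerGrad"].
    Output is discarded; only the exit status is inspected.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example use (run from the build directory, where ctest can find the tests):
#   failed = count_failures(["ctest", "-V", "-R", "test_LayerGrad"], 20)
#   print(f"{failed}/20 runs failed")
```

Twenty runs give only a rough estimate of the failure rate; more runs would be needed to compare thresholds with any confidence.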

Could whoever is on duty today follow up? Thanks! I think we need to set the threshold so that the chance of a test failing is really low. Currently it's probably too high.

47: [ RUN      ] Layer.ClipLayer
47: I0222 16:50:27.369216 52278 LayerGradUtil.cpp:684]  layer_type=clip useGpu=0
47: I0222 16:50:27.370950 52278 LayerGradUtil.cpp:724]  cost 40769.9
47: I0222 16:50:27.373468 52278 LayerGradUtil.cpp:43] clip                 input               step=0.00111127     cost1=40770     cost2=40769.9   true_delta=0.0742188      analytic_delta=0.407699       diff=-0.817957 ***
47: /home/helin/repo/Paddle/paddle/gserver/tests/LayerGradUtil.cpp:773: Failure
47: Expected: (fabs(maxDiff)) <= (epsilon), actual: 0.817957 vs 0.02
47: I0222 16:50:27.373544 52278 LayerGradUtil.cpp:684]  layer_type=clip useGpu=1
47: I0222 16:50:27.375077 52278 LayerGradUtil.cpp:724]  cost 40769.9
47: I0222 16:50:27.376992 52278 LayerGradUtil.cpp:43] clip                 input               step=0.00105146     cost1=40770     cost2=40769.9   true_delta=0.078125       analytic_delta=0.407699       diff=-0.808376 ***
47: /home/helin/repo/Paddle/paddle/gserver/tests/LayerGradUtil.cpp:773: Failure
47: Expected: (fabs(maxDiff)) <= (epsilon), actual: 0.808376 vs 0.02
47: [  FAILED  ] Layer.ClipLayer (8 ms)
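For reference, the `diff` values in these logs appear to be the relative error `true_delta / analytic_delta - 1`, which is then compared against `epsilon` (0.02 in the failure message). This formula is inferred from the logged numbers, not read from `LayerGradUtil.cpp`, so treat the sketch below as an assumption; the function names are hypothetical.

```python
def relative_diff(true_delta, analytic_delta):
    """Relative error between the measured and the predicted cost change.

    Assumed formula, inferred from the log lines: it reproduces the
    logged `diff` values to within the rounding of the printed numbers.
    """
    return true_delta / analytic_delta - 1.0


def passes(true_delta, analytic_delta, epsilon=0.02):
    """Mirror the logged check: fabs(maxDiff) <= epsilon."""
    return abs(relative_diff(true_delta, analytic_delta)) <= epsilon


# Values copied from the logs in this thread.
clip_cpu = relative_diff(0.0742188, 0.407699)  # logged diff=-0.817957
clip_gpu = relative_diff(0.078125, 0.407699)   # logged diff=-0.808376
pool_ok = relative_diff(0.390625, 0.391901)    # logged diff=-0.0032571

print(passes(0.0742188, 0.407699))  # clip layer, useGpu=0
print(passes(0.390625, 0.391901))   # pool layer, useGpu=0
```

With these inputs the clip layer is off by roughly 80%, far beyond the 2% tolerance, so in this run the failure is not a borderline case that a slightly looser threshold would hide.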

@pzelazko-intel
Contributor

This happens to me too. I had to create dummy commits twice to re-run the tests.
BTW, is there a way to re-run tests without pushing changes to the branch?

@luotao1
Contributor

luotao1 commented Mar 6, 2018

You can click "Details" and then click "Run again".

@luotao1
Contributor

luotao1 commented May 28, 2018

Closing, since test_LayerGrad is no longer part of the CI tests.

@luotao1 luotao1 closed this as completed May 28, 2018