Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poll if process is still alive in Test.py #2290

Closed
wants to merge 2 commits into from

Conversation

MarkusTeufelberger
Copy link
Collaborator

In some cases this script did not detect properly that the subprocess is already terminated. The "append" loop afterwards caused memory to fill up and the script to fail eventually.

Example for this behaviour: https://travis-ci.org/MarkusTeufelberger/rippled-distrotest/jobs/310261175

In some cases this script did not detect properly that the subprocess is already terminated. The "append" loop afterwards caused memory to fill up and the script to fail eventually.
@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented Dec 2, 2017

Jenkins Build Summary

Built from this commit

Built at 20171206 - 22:13:36

Test Results

Build Type Result Status
Author Check MarkusTeufelberger is not a collaborator! FAIL 🔴

@codecov-io
Copy link

codecov-io commented Dec 2, 2017

Codecov Report

Merging #2290 into develop will decrease coverage by 0.06%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2290      +/-   ##
===========================================
- Coverage    70.96%   70.89%   -0.07%     
===========================================
  Files          691      691              
  Lines        51663    51120     -543     
===========================================
- Hits         36661    36241     -420     
+ Misses       15002    14879     -123
Impacted Files Coverage Δ
src/ripple/beast/clock/chrono_util.h 82.6% <0%> (-8.7%) ⬇️
src/ripple/app/ledger/PendingSaves.h 93.54% <0%> (-6.46%) ⬇️
src/ripple/app/ledger/TransactionStateSF.cpp 50% <0%> (-3.85%) ⬇️
src/ripple/core/SociDB.h 60% <0%> (-3.64%) ⬇️
src/ripple/app/ledger/AcceptedLedger.cpp 66.66% <0%> (-2.09%) ⬇️
src/ripple/beast/utility/WrappedSink.h 54.54% <0%> (-1.98%) ⬇️
src/ripple/nodestore/impl/DatabaseRotatingImp.h 60.52% <0%> (-1.98%) ⬇️
src/ripple/conditions/impl/error.cpp 43.24% <0%> (-1.5%) ⬇️
src/ripple/server/impl/Port.cpp 68.9% <0%> (-1.41%) ⬇️
src/ripple/rpc/impl/Handler.cpp 97.61% <0%> (-1.37%) ⬇️
... and 202 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 090d813...3481cee. Read the comment docs.

@ximinez
Copy link
Collaborator

ximinez commented Dec 4, 2017

Hi @MarkusTeufelberger. Thanks for the submission.

Can you explain a bit more about what's happening when the subprocess termination is not detected? I can't really tell from that Travis link. Is there an infinite loop situation happening? Perhaps the Test.py --verbose option would be helpful.

Which "append" loop are you talking about? lines.append(decoded) on line 268/269?

Does this happen often? Are you seeing any data that shows this change actually fixes your issue?

@MarkusTeufelberger
Copy link
Collaborator Author

This is already running in verbose mode I'm afraid...

So far this only seems to happen mainly on ArchLinux - my suspicion is that breaking the iteration at '' is not enough to reliably detect that the child process has terminated.

Yes, I'm talking about the lines.append(decoded) part. I'll check again soonish if this definitely fixes the issue (my Linux partition recently started to act up so I'll have to do some incantations to get the GPU driver to run again), in general though I've seen already several builds stall/fail because Test.py seems to not produce any output any more which I suspect is due to insufficient termination checks in this loop.

See the travis builds in https://github.com/MarkusTeufelberger/rippled-distrotest for details, especially the archlinux ones.

@MarkusTeufelberger
Copy link
Collaborator Author

Looked at it again and realized I made a stupid mistake (https://docs.python.org/3.7/library/subprocess.html#subprocess.Popen.poll returns None if the process is still alive, not terminated), it is now fixed.

And yes, the build actually starts and works with this fix on Archlinux.

@MarkusTeufelberger
Copy link
Collaborator Author

Anything you still need? I can't really add or suggest reviewers unfortunately, since there's no public overview of who is responsible for what in the rippled team.

@nbougalis nbougalis mentioned this pull request Jan 10, 2018
@seelabs
Copy link
Collaborator

seelabs commented Jan 16, 2018

In 0.90.0-b3

@seelabs seelabs closed this Jan 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants