test-osuosl-aix61-ppc64_be build failures #494

Closed
Trott opened this issue Sep 19, 2016 · 16 comments

Comments

@Trott
Member

Trott commented Sep 19, 2016

test-osuosl-aix61-ppc64_be-2 seems to be reliably failing to build for the last 24 hours or so.

Example failure: https://ci.nodejs.org/job/node-test-commit-aix/936/nodes=aix61-ppc64/console

@jbergstroem said in IRC:

a lot of hanging test failures:
this: /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-child-process-fork-dgram.js child
..and this: /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/sequential/test-child-process-pass-fd.js child
there seem to be a few 'wait' processes. I wonder if that syntax behaves differently on AIX
been active for four days :( I'm on the phone for a couple of hours still -- could you perhaps file an issue and ping me/mhdawson?
(we should ping gibfahn too -- he had a few tests that didn't exit either)

@mhdawson @gibfahn

@gibfahn
Member

gibfahn commented Sep 20, 2016

I'll take a look. Also cc @gireeshpunathil

@jbergstroem
Member

Just a note: I killed all lingering processes on both machines since the test runner was stalling.

@mhdawson
Member

I don't see any failures today since @jbergstroem's comment, and I also had to restart the Jenkins agent. Since then it seems to be running OK. We should probably check tomorrow to see whether any jobs are stacking up in the background.

@mhdawson
Member

Looked today. There were a small number of processes hanging around:

    iojs  8388860        1   0 18:59:37      -  0:00 /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-child-process-fork-dgram.js child
    iojs 10289232        1   0   Sep 20      -  0:00 /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-child-process-fork-dgram.js child
    iojs 10748052        1   0 16:24:13      -  0:00 /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-child-process-fork-dgram.js child
    iojs 11141252        1   0 04:16:07      -  0:00 /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/out/Release/node /home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-child-process-fork-dgram.js child


Which are related to this issue:

https://github.com/nodejs/node/issues/8271
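All four lingering processes in the listing above have a PPID of 1, meaning their original parent exited and they were re-parented to init. A minimal Python sketch of filtering `ps -ef` output for such orphans; `find_orphans` and the substring match are illustrative assumptions rather than part of the Node.js CI tooling, and the sample paths are shortened from the listing above.

```python
def find_orphans(ps_output, pattern):
    """Return (pid, rest_of_line) for processes whose PPID is 1 and whose
    line contains `pattern`.

    Assumes the standard `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    STIME may be one token ("18:59:37") or two ("Sep 20"), so only the first
    three columns are split out and the rest of the line is kept whole.
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # skip the header row and blank or malformed lines
        _uid, pid, ppid, rest = fields
        if int(ppid) == 1 and pattern in rest:
            orphans.append((int(pid), rest))
    return orphans


# Shortened copy of the listings in this thread (paths trimmed).
SAMPLE = """\
    iojs  8388860        1   0 18:59:37      -  0:00 out/Release/node test/parallel/test-child-process-fork-dgram.js child
    iojs 10289232        1   0   Sep 20      -  0:00 out/Release/node test/parallel/test-child-process-fork-dgram.js child
    iojs  2818506  2818606   0   Sep 21      -  0:00 gmake test-ci
"""

if __name__ == "__main__":
    for pid, _rest in find_orphans(SAMPLE, "test-child-process"):
        print(pid)
```

On a live machine the input would come from running `ps -ef` itself; matching on the test file name keeps unrelated PPID-1 daemons out of the result.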

@gibfahn
Member

gibfahn commented Sep 21, 2016

@mhdawson That's interesting, I hadn't realised that the child processes weren't being killed in that test failure. So if the test times out, I assume the test runner kills the parent but not any children?
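One common way to stop a timed-out test from leaving its own children behind is to start each test in a new session (and therefore a new process group) and signal the whole group on timeout. This is a minimal, POSIX-only Python sketch of that technique; `run_with_timeout` is a hypothetical helper, and this thread does not say whether `tools/test.py` works this way.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd`; on timeout, kill its whole process group.

    Returns the exit status, or None if the command timed out. POSIX only.
    """
    # start_new_session=True makes the child call setsid(), so it leads a
    # new process group whose group ID equals its PID. Anything it forks
    # (e.g. a test's `child` process) inherits that group.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Signal every member of the group, not just the direct child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group disappeared between the timeout and the kill
        proc.wait()  # reap the direct child so it is not left <defunct>
        return None


if __name__ == "__main__":
    # A shell that backgrounds a long sleep and waits on it: killing only the
    # shell would orphan the sleep; killing the group removes both.
    print(run_with_timeout(["sh", "-c", "sleep 30 & wait"], timeout=1))
```

A child that calls `setsid()` itself, or a test that exits normally while leaving grandchildren running, would still escape this cleanup.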

@jbergstroem
Member

@mhdawson if you check my paste above (well, Rich's), test/sequential/test-child-process-pass-fd.js seems to be affected too.

@mhdawson
Member

mhdawson commented Sep 21, 2016

I've been looking at the fork-dgram failure and am just about to test a proposed fix on AIX now. The other one must be much less frequent.

@mhdawson
Member

PR to address the fork-dgram failures: nodejs/node#8697

@jbergstroem changed the title from "test-osuosl-aix61-ppc64_be-2 build failures" to "test-osuosl-aix61-ppc64_be build failures" on Sep 23, 2016
@jbergstroem
Member

@mhdawson, @gibfahn: log into -1 now and have a look; lots of stuff is stalling. Here's a job that has also been running for two hours (probably because multiple test runners are still active): https://ci.nodejs.org/job/node-test-commit-aix/1040/nodes=aix61-ppc64/console

@gibfahn
Member

gibfahn commented Sep 23, 2016

@jbergstroem I'm not sure whether it's because someone's cleaned up the machine (or because ps -ef isn't the right command), but I'm only seeing a couple of test-dgram processes on that machine.

[screenshot]

Also when I first clicked your link I saw the still-running job (and I still have it open in a tab):

[screenshots of the still-running job]

But now when I click through I don't:

[screenshots]

@jbergstroem
Member

@gibfahn you also have a few python processes:

 ps -ef | grep python
    iojs 4784192 2818506   0   Sep 21      -  0:04 /usr/bin/python tools/test.py -p tap --logfile test.tap --mode=release --flaky-tests=dontcare addons doctool inspector known_issues message parallel pseudo-tty sequential
    iojs 2490818 1901104   0 17:54:10      -  0:04 /usr/bin/python tools/test.py -p tap --logfile test.tap --mode=release --flaky-tests=dontcare addons doctool inspector known_issues message parallel pseudo-tty sequential
    iojs 3408274 3211782   0   Sep 20      -  0:04 /usr/bin/python tools/test.py -p tap --logfile test.tap --mode=release --flaky-tests=dontcare addons doctool inspector known_issues message parallel pseudo-tty sequential

..and a few gmake processes:

 ps -ef | grep make
    iojs 4128770       1   0 17:43:33      -  0:00 gmake run-ci -j 5
    iojs 2621908       1   0   Sep 20      -  0:00 gmake run-ci -j 5
    iojs 2818506 2818606   0   Sep 21      -  0:00 gmake test-ci
    iojs 1901104 4128770   0 17:53:15      -  0:00 gmake test-ci
    iojs 2818606       1   0   Sep 21      -  0:00 gmake run-ci -j 5
    iojs 3211782 2621908   0   Sep 20      -  0:00 gmake test-ci

This is probably not good, seeing how each test.py assumes it will be run exclusively (tmpdir, whatnot).
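The PID and PPID columns in the two listings above are enough to show that each test.py belongs to its own `gmake run-ci` tree whose root has been re-parented to PID 1. A small Python sketch that walks a process's parent chain; the PID-to-PPID mapping is copied from those listings, and `ancestry` is a hypothetical helper name.

```python
# PID -> PPID, copied from the `ps -ef | grep python` and
# `ps -ef | grep make` listings above.
PARENT = {
    4784192: 2818506,  # tools/test.py (started Sep 21)
    2490818: 1901104,  # tools/test.py (started 17:54:10)
    3408274: 3211782,  # tools/test.py (started Sep 20)
    2818506: 2818606,  # gmake test-ci
    1901104: 4128770,  # gmake test-ci
    3211782: 2621908,  # gmake test-ci
    2818606: 1,        # gmake run-ci -j 5
    4128770: 1,        # gmake run-ci -j 5
    2621908: 1,        # gmake run-ci -j 5
}


def ancestry(pid, parent=PARENT):
    """Return [pid, parent, grandparent, ...], stopping at PID 1 or at a
    PID whose parent is not in the mapping."""
    chain = [pid]
    while pid != 1 and pid in parent:
        pid = parent[pid]
        if pid in chain:
            break  # guard against a malformed (cyclic) mapping
        chain.append(pid)
    return chain


if __name__ == "__main__":
    for test_py in (4784192, 2490818, 3408274):
        print(" -> ".join(str(p) for p in ancestry(test_py)))
```

Each chain ends in a different orphaned `gmake run-ci -j 5`, which is consistent with three separate test runs still active in the same workspace.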

@jbergstroem
Member

You can also see the wait processes that were invoked, as well as a few defunct processes.

@gibfahn
Member

gibfahn commented Sep 23, 2016

@jbergstroem I'm not seeing those on test-osuosl-aix61-ppc64_be-1 (a.k.a. power8-nodejs2.osuosl.org). Are you looking at -2?

EDIT: I do see them on -2

@jbergstroem
Member

@gibfahn if I am, then there's a naming mix-up. I'm talking about 140.211.9.101.

@gibfahn
Member

gibfahn commented Sep 23, 2016

I might have them the wrong way round in my .ssh/config. I have:

[screenshot of .ssh/config]

Anyway, I'll take a look at why the processes are being left behind.

@mhdawson
Member

OK, the test that was leaving processes running after the test run has been fixed, so I think we can close this. Please re-open if you feel differently.


4 participants