Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

too many open files flake #10246

Closed
bparees opened this issue Aug 5, 2016 · 5 comments
Closed

too many open files flake #10246

bparees opened this issue Aug 5, 2016 · 5 comments
Assignees
Labels

Comments

@bparees
Copy link
Contributor

bparees commented Aug 5, 2016

FATAL: command execution failed
java.io.IOException: Cannot run program "/bin/sh" (in directory "/var/lib/jenkins/jobs/test_pull_requests_origin_conformance/workspace@6"): error=24, Too many open files

as seen in:
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/4751/consoleFull

@stevekuznetsov
Copy link
Contributor

I can't make too much sense of this... from the Jenkins master:

# cat /proc/sys/fs/file-max 
792046
# lsof | wc -l
851476

/cc @danmcp

@stevekuznetsov
Copy link
Contributor

stevekuznetsov commented Aug 5, 2016

Ok, a little more nuanced approach, but nothing interesting. It doesn't look like we are leaving file descriptors open, so perhaps this was just a transient issue and we need to bump the limits a little for Java.

#!/bin/bash
processess="$(ps -A | grep -E "java|bash|vagrant|ruby|sh|ssh" | awk '{print $1}')"
for process in ${processess}; do
    if [[ -f "/proc/${process}/limits" ]]; then
        max="$(grep -Po '(?<=Max open files            )[0-9]+' "/proc/${process}/limits")"
        current="$( lsof -p "${process}" | wc -l )"
        echo "current ${current} "$'\t'" max ${max} [$( ps --no-headers --pid "${process}" -o pid,cmd )]"
    fi
done

Output:

# /tmp/debug.sh
current 957  max 4096 [ 1994 /etc/alternatives/java -Dcom.sun.akuma.Daemon=daemonized -Djava.awt.headless=true -Djsse.enableSNIExtension=false -Dhudson.model.ParametersAction.safeParameters=ghprbActualCommit,ghprbActualCommitAuthor,ghprbActualCommitAuthorEmail,ghprbAuthorRepoGitUrl,ghprbCommentBody,ghprbCredentialsId,ghprbGhRepository,ghprbPullAuthorEmail,ghprbPullAuthorLogin,ghprbPullAuthorLoginMention,ghprbPullDescription,ghprbPullId,ghprbPullLink,ghprbPullLongDescription,ghprbPullTitle,ghprbSourceBranch,ghprbTargetBranch,ghprbTriggerAuthor,ghprbTriggerAuthorEmail,ghprbTriggerAuthorLogin,ghprbTriggerAuthorLoginMention,GIT_BRANCH,sha1 -DJENKINS_HOME=/var/lib/jenkins -jar /usr/lib/jenkins/jenkins.war --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war --daemon --httpPort=8081 --ajp13Port=-1 --debug=5 --handlerCountMax=100 --handlerCountMaxIdle=20 --prefix=/jenkins --httpListenAddress=127.0.0.1]
current 14   max 4096 [ 5003 bash /bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 7    max 4096 [ 5004 /opt/vagrant/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 64   max 4096 [ 5006 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 14   max 4096 [ 5044 bash /bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 7    max 4096 [ 5045 /opt/vagrant/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 64   max 4096 [ 5047 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 14   max 4096 [ 6114 bash /bin/vagrant test-origin --extended core([images]) -d --skip-check --skip-image-cleanup]
current 7    max 4096 [ 6115 /opt/vagrant/bin/vagrant test-origin --extended core([images]) -d --skip-check --skip-image-cleanup]
current 64   max 4096 [ 6117 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended core([images]) -d --skip-check --skip-image-cleanup]
current 14   max 4096 [ 6428 bash /bin/vagrant test-origin --extended core(builds),core(secrets) --env JUNIT_REPORT=true -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 7    max 4096 [ 6429 /opt/vagrant/bin/vagrant test-origin --extended core(builds),core(secrets) --env JUNIT_REPORT=true -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 64   max 4096 [ 6431 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended core(builds),core(secrets) --env JUNIT_REPORT=true -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 14   max 4096 [ 7524 bash /bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 7    max 4096 [ 7525 /opt/vagrant/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 64   max 4096 [ 7527 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 14   max 4096 [ 7943 bash /bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 7    max 4096 [ 7944 /opt/vagrant/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 64   max 4096 [ 7946 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 14   max 4096 [ 8381 bash /bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 7    max 4096 [ 8383 /opt/vagrant/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 64   max 4096 [ 8385 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 14   max 4096 [ 8523 bash /bin/vagrant test-origin --target release]
current 7    max 4096 [ 8524 /opt/vagrant/bin/vagrant test-origin --target release]
current 64   max 4096 [ 8526 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target release]
current 14   max 4096 [ 8778 bash /bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 7    max 4096 [ 8779 /opt/vagrant/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 64   max 4096 [ 8781 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 14   max 4096 [10718 /bin/sh -xe /tmp/hudson190828893731349234.sh]
current 14   max 4096 [12364 /bin/sh -xe /tmp/hudson4502976453902355702.sh]
current 14   max 4096 [12851 /bin/sh -xe /tmp/hudson4135632047824112674.sh]
current 14   max 4096 [12852 /bin/sh -xe /tmp/hudson2811013819816939087.sh]
current 14   max 4096 [12856 /bin/sh -xe /tmp/hudson8888692932170347600.sh]
current 14   max 4096 [12861 /bin/sh -xe /tmp/hudson8890081767042245139.sh]
current 14   max 4096 [14247 /bin/sh -xe /tmp/hudson1539359463505099134.sh]
current 14   max 4096 [14258 /bin/sh -xe /tmp/hudson6692397132444455527.sh]
current 14   max 4096 [14289 /bin/sh -xe /tmp/hudson1406323992844666310.sh]
current 14   max 4096 [15926 bash /bin/vagrant test-origin -c -d]
current 7    max 4096 [15927 /opt/vagrant/bin/vagrant test-origin -c -d]
current 64   max 4096 [15929 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin -c -d]
current 15   max 1024 [16318 /bin/bash /tmp/debug.sh]
current 80   max 1024 [17267 sshd: root@pts/2]
current 15   max 1024 [17310 -bash]
current 15   max 1024 [17373 bash]
current 14   max 4096 [22648 /bin/sh -xe /tmp/hudson2155735789687959282.sh]
current 14   max 4096 [23315 bash /bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 7    max 4096 [23316 /opt/vagrant/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 64   max 4096 [23318 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 14   max 4096 [23329 bash /bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 7    max 4096 [23330 /opt/vagrant/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 64   max 4096 [23332 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 14   max 4096 [25709 /bin/sh -xe /tmp/hudson4844321846281789884.sh]
current 14   max 4096 [25712 /bin/sh -xe /tmp/hudson9099502135414728487.sh]
current 14   max 4096 [25715 /bin/sh -xe /tmp/hudson3688482008931058188.sh]
current 14   max 4096 [25716 /bin/sh -xe /tmp/hudson1500895833431289197.sh]
current 14   max 4096 [26214 bash /bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 7    max 4096 [26215 /opt/vagrant/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 64   max 4096 [26217 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target test -o check -d --skip-image-cleanup]
current 80   max 1024 [26226 sshd: root@pts/1]
current 15   max 1024 [26235 -bash]
current 14   max 4096 [26657 bash /bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 7    max 4096 [26658 /opt/vagrant/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 64   max 4096 [26660 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended conformance -d --skip-check --skip-image-cleanup --image-registry ci.dev.openshift.redhat.com:5000]
current 14   max 4096 [29330 /bin/sh -xe /tmp/hudson4683754588862680523.sh]
current 14   max 4096 [29331 /bin/sh -xe /tmp/hudson4617263838446080347.sh]
current 14   max 4096 [29332 /bin/sh -xe /tmp/hudson6374687681030142228.sh]
current 14   max 4096 [29334 /bin/sh -xe /tmp/hudson5196875596081457979.sh]
current 14   max 4096 [30852 bash /bin/vagrant sync-origin -s]
current 14   max 4096 [30854 bash /bin/vagrant sync-origin -s]
current 7    max 4096 [30856 /opt/vagrant/bin/vagrant sync-origin -s]
current 64   max 4096 [30860 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant sync-origin -s]
current 7    max 4096 [30862 /opt/vagrant/bin/vagrant sync-origin -s]
current 64   max 4096 [30867 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant sync-origin -s]
current 14   max 4096 [32688 bash /bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 7    max 4096 [32689 /opt/vagrant/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 64   max 4096 [32691 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --extended networking -d --skip-check --skip-image-cleanup]
current 14   max 4096 [32694 bash /bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 7    max 4096 [32695 /opt/vagrant/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]
current 64   max 4096 [32697 ruby /opt/vagrant/embedded/gems/gems/vagrant-1.7.4/bin/vagrant test-origin --target check -j --env JUNIT_REPORT=true --artifacts]

@ncdc
Copy link
Contributor

ncdc commented Aug 6, 2016

lsof shows file handles for all threads for each process, and when you create a thread, it typically inherits all of the main thread's handles. This is why the numbers look off.

also see https://bugzilla.redhat.com/show_bug.cgi?id=1362519 which might be related and we haven't resolved yet

cc @timothysc

@stevekuznetsov-bot
Copy link

To be clear this is on the Jenkins master, nothing to do with OpenShift.

On Aug 6, 2016 3:49 PM, "Andy Goldstein" notifications@github.com wrote:

lsof shows file handles for all threads for each process, and when you
create a thread, it typically inherits all of the main thread's handles.
This is why the numbers look off.

also see https://bugzilla.redhat.com/show_bug.cgi?id=1362519 which might
be related and we haven't resolved yet

cc @timothysc https://github.com/timothysc


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#10246 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AQzVWNz0G8Jrybv7ic6WUyO5vTKYVWBTks5qdOUrgaJpZM4Jd5XH
.

@stevekuznetsov
Copy link
Contributor

@stevekuznetsov-bot I don't remember giving you the right to speak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants