This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

P1 - Jenkins hangs due to disk full #2280

Closed
fanyangCS opened this issue Mar 7, 2019 · 12 comments

Comments

@fanyangCS
Contributor

What would you like to be added:

auto clean-up disk space

Why is this needed:
improve efficiency.

Without this feature, how does the current module work
do it manually

Components that may involve changes:
Jenkins

@hao1939
Contributor

hao1939 commented Mar 7, 2019

We have set up automatic cleanup on Jenkins, for both the Jenkins nodes and the PAI nodes.
If the disk-full issue shows up again, we will need to check it manually and figure out the root cause.

There are two cases, and we have handled both:

  1. Jenkins node disk full. We have set a cleanup policy that automatically removes data older than 5 days.
  2. PAI node (the nodes used for installing PAI) disk full. We clean up each time a job finishes, so the disk shouldn't fill up.

@fanyangCS
Contributor Author

@hao1939, thanks for the explanation. If you consider this case solved, please close this issue.

@mzmssg
Member

mzmssg commented Mar 7, 2019

@hao1939
I think this case is caused by stale images.

@hao1939
Contributor

hao1939 commented Mar 7, 2019

@mzmssg, thank you!

Docker images were kept to speed things up.
We now have the cleaner; ideally it would clean up stale images when there is not enough space.

So if we fix the cleaner, everything should be fine.
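
A minimal sketch of the behavior described above (prune stale images only when space runs low), not the PAI cleaner's actual implementation. The 80% threshold, the `DOCKER_DIR` default, and all function names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prune unused Docker images only when the disk is nearly full.
# THRESHOLD_PERCENT and DOCKER_DIR are assumed values, not settings from this thread.
THRESHOLD_PERCENT="${THRESHOLD_PERCENT:-80}"
DOCKER_DIR="${DOCKER_DIR:-/var/lib/docker}"

# Print the used-space percentage (digits only) of the filesystem holding $1.
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed (exit 0) when usage $1 is at or above threshold $2.
over_threshold() {
  [ "$1" -ge "$2" ]
}

# Remove all images not used by any container, but only when space is low.
prune_if_needed() {
  local usage
  usage="$(disk_usage_percent "$DOCKER_DIR")"
  if over_threshold "$usage" "$THRESHOLD_PERCENT"; then
    echo "disk at ${usage}%, pruning unused images"
    docker image prune --all --force
  else
    echo "disk at ${usage}%, keeping image cache"
  fi
}

# Example (e.g. from a cron job): prune_if_needed
```

Keeping the check separate from the prune means the image cache survives as long as there is room, which preserves the build speed-up.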

@fanyangCS
Contributor Author

@hao1939, the cleaner does not have an image clean-up feature yet.

@hao1939
Contributor

hao1939 commented Mar 11, 2019

The speed-up is obvious, about 2x, so I suggest keeping the image cache.

Based on previous records, it may take 2-3 months before the disk fills up.
So how about putting this in the DRI handbook and doing a manual check of the CI/CD system when it happens again?

I like the idea of automatic maintenance.
But if it is not frequent, a manual check is acceptable, and we may uncover more problems through regular diagnostics.

@mzmssg
Member

mzmssg commented Mar 18, 2019

Could we move the Jenkins Docker storage to /mnt? The disk filled up again when I tried to update hadoop-run, because hadoop-binary needs 3 GB+ of disk and can't be cached by Docker.

@fanyangCS
Contributor Author

@mzmssg, sure.

@hao1939
Contributor

hao1939 commented Mar 19, 2019

Yeah, I have mounted the large disk at /var/lib/docker.
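
One way to confirm that Docker's data really lives on the large disk is sketched below. The function names are hypothetical, and the check assumes a running Docker daemon plus a `df` that supports POSIX output (`-P`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: verify which mount backs Docker's storage directory.

# Print the mount point of the filesystem that holds path $1.
mount_point_of() {
  df -P "$1" | awk 'NR==2 { print $6 }'
}

# Report where Docker stores its data and which mount backs it.
check_docker_storage() {
  local root
  root="$(docker info --format '{{ .DockerRootDir }}')"
  echo "Docker root dir: ${root}"
  echo "Backed by mount: $(mount_point_of "$root")"
}

# Example: check_docker_storage
```

If the reported mount is `/` rather than the large disk, image pushes and pulls will still fill the OS disk.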

@mzmssg
Member

mzmssg commented Mar 28, 2019

@hao1939
Did you perhaps forget to move the Jenkins Docker repo to the mounted disk? Jenkins failed again because it couldn't push an image.

@hao1939
Contributor

hao1939 commented Apr 1, 2019

@mzmssg You are right. I will do that.

@abuccts
Member

abuccts commented Dec 25, 2019

For Jenkins disk usage:

  1. The Jenkins master stores job data (logs, metadata, etc.) and workspaces (cloned repos). Job data is under ${JENKINS_HOME}/jobs/branches and workspace data is under /mnt.
    Job data needs to be cleaned manually; once every six months is enough:
    $ cd ${JENKINS_HOME}/jobs/branches
    $ find . -mindepth 1 -maxdepth 1 -type d -ctime +100 -print0 | xargs -0 sudo rm -rf
  2. PAI nodes store Docker images and PAI data; all of it is cleaned each time the Jenkins pipeline runs:
    echo "clean node ${host}:"
    ssh ${ACCOUNT_USR}@${host} -o StrictHostKeyChecking=no -i /home/${ACCOUNT_USR}/.ssh/id_rsa \
    'sudo rm -rf /datastorage || true; \
    sudo rm -rf /mnt/datastorage || true; \
    sudo service kubelet stop || true; \
    sudo docker stop $(sudo docker ps -q) || true; \
    sudo docker system prune -af || true'

    The build cache is only needed on one Jenkins worker, so its disk usage does not grow over time.
  3. Azure Linux auto-upgrade leaves behind many kernel/header packages, which use a lot of disk space.
    After cleaning them up with sudo apt autoremove, every node's OS disk usage is under 50%.
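
The manual job-data cleanup in item 1 could be wrapped with a dry-run step, as in the rough sketch below. The function names are hypothetical, and the sketch makes two deliberate substitutions: it filters on modification time (`-mtime`) rather than the change time (`-ctime`) used in the command above, and it omits `sudo`, so it assumes the caller already has permission to delete the directories.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: age-based cleanup of per-branch job directories.

# List immediate subdirectories of $1 last modified more than $2 days ago.
# Dry run: prints candidates and deletes nothing.
list_stale_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -print
}

# Delete the same set of directories; NUL-delimited so unusual names are safe.
# Relies on xargs -r (GNU and recent BSD) to skip rm when nothing matches.
remove_stale_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -print0 \
    | xargs -0 -r rm -rf --
}

# Example:
#   list_stale_dirs "${JENKINS_HOME}/jobs/branches" 100    # review first
#   remove_stale_dirs "${JENKINS_HOME}/jobs/branches" 100  # then delete
```

Reviewing the dry-run output before deleting guards against removing job data for a branch that is still active.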

@abuccts abuccts closed this as completed Dec 25, 2019
@scarlett2018 scarlett2018 changed the title Jenkins hangs due to disk full P1 - Jenkins hangs due to disk full Dec 30, 2019
@scarlett2018 scarlett2018 added this to the Pure K8S Beta Release milestone Dec 30, 2019

5 participants