Add workflow to clear persistent cache dirs on self-hosted runners #20755

Merged: 2 commits, Apr 4, 2024

huonw (Contributor) commented Apr 4, 2024

We have a few long-lived runners whose cache directories grow and grow until the machine runs out of space. This workflow is a stop-gap measure that makes it easier for anyone to free up space on the machine, by providing a workflow to do it, rather than requiring someone with credentials to SSH in and do it manually.

For instance, on March 18, the macOS 11 runner had:

| directory | size (GB) |
| --- | --- |
| ~/Library/Caches/nce | 113 |
| ~/.cache | 34 |
| ~/.rustup | 12 |
| ~/.pex | 8.8 |
| ~/.nce | 1.2 |

This PR sets up automatic deletion of these directories, with some logging.
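
As an illustration only (this is not the script from the merged workflow), a deletion step with logging might look roughly like the following. The function name `clear_cache_dir` and the log format are hypothetical; the directories in the example invocation are the ones listed above.

```shell
# Hypothetical sketch, not the workflow merged in this PR.
# Log the size of a directory (if it exists), then delete it.
clear_cache_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "skip: $dir (not present)"
    return 0
  fi
  # du -sh prints a human-readable total; the first tab-separated field is the size.
  size=$(du -sh "$dir" 2>/dev/null | cut -f1)
  echo "before: $size $dir"
  rm -rf "$dir"
  echo "cleared: $dir"
}

# Example invocation (commented out so that sourcing this file deletes nothing):
# for d in ~/Library/Caches/nce ~/.cache ~/.rustup ~/.pex ~/.nce; do
#   clear_cache_dir "$d"
# done
```

Logging the size before deleting gives a record in the workflow output of how much each cache had grown since the last run.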

@huonw huonw added the category:internal CI, fixes for not-yet-released features, etc. label Apr 4, 2024
benjyw (Sponsor, Contributor) left a comment

So we still trigger this job manually?

huonw (Contributor, Author) commented Apr 4, 2024

Yeah, but I could imagine it being made a bit smarter (e.g. only clear if there's not much space left, to avoid an unnecessary cache flush) and then run on a cron job. I'm not personally planning on doing that in the near future though.
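
A rough sketch of the "only clear if there's not much space left" idea, assuming a POSIX `df` and a hypothetical helper name `low_on_space`; nothing like this exists in the PR, and the 20 GiB threshold in the example is an arbitrary placeholder.

```shell
# Hypothetical helper: exit 0 when the filesystem holding path $1 has
# less than $2 GiB available, non-zero otherwise.
low_on_space() {
  path="$1"
  min_free_gib="$2"
  # df -Pk: POSIX output format with 1024-byte blocks; column 4 is "Available".
  # Assumption: the device name in column 1 contains no spaces.
  avail_kib=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ "$avail_kib" -lt $((min_free_gib * 1024 * 1024)) ]
}

# Example use in a scheduled job (threshold is a placeholder):
# if low_on_space "$HOME" 20; then
#   echo "low on disk space, clearing caches"
# fi
```

Gating the cleanup on free space would avoid throwing away warm caches on runs where the machine still has plenty of room.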

@huonw huonw marked this pull request as ready for review April 4, 2024 21:40
@huonw huonw merged commit 9897614 into main Apr 4, 2024
24 checks passed
@huonw huonw deleted the huonw/clear-caches-workflow branch April 4, 2024 21:41
huonw added a commit that referenced this pull request Apr 5, 2024
… resilient (#20756)

In #20755, I had a typo: formatting the jobs as a list, with `name`
keys... but that's the syntax for steps. Jobs need to be a dict with the
ID as the key.

This also squashes any errors from `rm -rf ....`: on Mac, it seems that
it attempts to delete directories that the runner doesn't have permission
for (particularly system ones in `~/Library/Caches`). But that's fine,
we'll still delete all of the caches created by "day-to-day" actions
runs, which are the big ones.

(Why wasn't this caught in #20755? I attempted to run the workflow
before merging, and got errors, but incorrectly assumed this was because
the workflow didn't exist on `main`, when it was actually because the
workflow was invalid.)

This has now successfully run and cleared hundreds of gigs over two
runs:

- https://github.com/pantsbuild/pants/actions/runs/8562624080 (required
  the "squash `rm` errors" fix)
- https://github.com/pantsbuild/pants/actions/runs/8562947394/job/23467155450
  (with that fix).