gc: clean up tmpdirs with dirEngine #17
My preferred approach to this is to follow Git and give content a grace period before cleanup. If a temp file or temp dir or unreferenced blob has not been touched in 24 hours (or whatever), it's probably a safe bet that nobody cares about it anymore. That means you don't reclaim space until 24 hours after the crash / abandoned push, but those events should be rare locally.
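As a rough illustration of that grace-period test, here is a minimal sketch in Go. The names `pruneExpire` and `isExpired` are hypothetical, not umoci's actual API:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// pruneExpire is the grace period: content untouched for longer than
// this is assumed abandoned. 24 hours here is only the example value
// from the comment above; Git defaults to two weeks for its own prune.
const pruneExpire = 24 * time.Hour

// isExpired reports whether path has not been modified within the
// grace period and is therefore a candidate for cleanup.
func isExpired(path string, now time.Time) (bool, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return now.Sub(fi.ModTime()) > pruneExpire, nil
}

func main() {
	expired, err := isExpired("/tmp/example-tmpdir", time.Now())
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("safe to remove:", expired)
}
```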
Thanks for your insight. However, that's not relevant here because …
So is …
Not really comparable IMO. An OCI image is not like a git repo (aside from both being Merkle DAGs), and …
Merkle DAG seems like enough similarity to share CAS GC approaches, to me. But I'm sure there are other approaches that will work too ;).
TODO: Use …
Fixes: cyphar/umoci#17
Closes: cyphar/umoci#63
LGTMs: @cyphar
Following Git and defaulting to two weeks [1], as discussed in [2]. This allows the use of other non-flocking consumers (e.g. other CAS implementations) in the same base directory. The added robustness comes with a price, though, since we now have to balance cleanliness vs. robustness for those other consumers.

* The hard-coded two-week cutoff may be insufficient for some use cases. Maybe somebody will need to download a huge layer through a very slow pipe, and they'll bump into this limit. Or maybe they have a high-cruft workflow or a small disk and would like to shorten this limit. If that happens or we become concerned that it might, we should make pruneExpire configurable.

* Increasing pruneExpire protects more flock-less consumers from Clean, but also means that cruft can survive for longer before Clean is confident enough to remove it. Setting an infinite pruneExpire would be very safe but would make Clean a no-op.

Two weeks seems like a safe choice, since well-behaved consumers will clean up after themselves when they are closed. Clean is just handling poorly-behaved consumers (or well-behaved consumers that had a hard shutdown). I don't expect cruft to build up quickly enough for the two-week default to be an issue for most users.

Consumers who do not wish to rely on pruneExpire (perhaps they have long-running operations or are locally setting pruneExpire very short) can continue to flock their temporary files and directories to protect them from Clean. Flocking a directory will also protect all content within that directory.

Removal errors (because of insufficient permissions, etc.) seemed like they should be non-fatal so we could continue to remove other cruft. However, I didn't want to silently ignore the failed removal. In this commit, I log those errors with a "warning" level.

The order of walk means that an old directory containing an old file will not be removed in a single pass. The first Clean will fail to remove the old directory because it is not empty, but will remove the old file (assuming it has sufficient permissions, etc.). A second Clean pass will remove the now-empty old directory. With cruft building slowly and Clean running fairly frequently, the delayed old-directory removal didn't seem like a big enough issue to warrant an immediate re-clean attempt.

[1]: https://git-scm.com/docs/git-gc
[2]: https://github.com/openSUSE/umoci/issues/17#issuecomment-258686340

Signed-off-by: W. Trevor King <wking@tremily.us>
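The following Go sketch illustrates the Clean behaviour the commit message describes: the two-week grace period, flock protection (probed with a non-blocking `LOCK_EX`), warning-level removal errors, and the two-pass removal of old non-empty directories. Function names and structure are assumptions for illustration, not umoci's actual implementation:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"

	"golang.org/x/sys/unix"
)

// Two weeks, following git-gc's prune default as in [1] above.
const pruneExpire = 14 * 24 * time.Hour

// isLocked reports whether another consumer holds a flock on path.
// A failed non-blocking lock attempt (EWOULDBLOCK) means it's in use.
func isLocked(path string) bool {
	fd, err := unix.Open(path, unix.O_RDONLY, 0)
	if err != nil {
		return false // cannot open; let removal report the real error
	}
	defer unix.Close(fd)
	if err := unix.Flock(fd, unix.LOCK_EX|unix.LOCK_NB); err != nil {
		return true // another consumer holds the lock
	}
	unix.Flock(fd, unix.LOCK_UN)
	return false
}

// clean removes expired, unlocked entries under root. Removal errors
// are logged at warning level and do not abort the walk. Because the
// walk is pre-order, removing an old directory fails while it still
// holds old files; those files go this pass, the directory the next.
func clean(root string) error {
	now := time.Now()
	return filepath.Walk(root, func(path string, fi os.FileInfo, err error) error {
		if err != nil {
			log.Printf("warning: walking %s: %v", path, err)
			return nil
		}
		if path == root {
			return nil
		}
		if isLocked(path) {
			if fi.IsDir() {
				return filepath.SkipDir // flocked dir protects its contents
			}
			return nil
		}
		if now.Sub(fi.ModTime()) <= pruneExpire {
			return nil // within the grace period: leave it alone
		}
		if err := os.Remove(path); err != nil {
			// Non-fatal: warn and keep removing other cruft.
			log.Printf("warning: failed to remove %s: %v", path, err)
		} else if fi.IsDir() {
			return filepath.SkipDir // removed: nothing left to descend into
		}
		return nil
	})
}

func main() {
	if err := clean("/var/tmp/oci-layout"); err != nil {
		log.Fatal(err)
	}
}
```

Note how `filepath.SkipDir` on a flocked directory is what lets a single flock protect everything beneath it, as the commit message promises.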
This is going to be a bit of a layer violation, so I'll have to think about it. But basically we need to have a way to remove all of the temporary directories that have been left behind inside an OCI image.
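For illustration only: assuming the dir-backed CAS engine creates its scratch directories with a recognizable prefix (the `.umoci-` prefix below is a hypothetical placeholder, not the engine's confirmed convention), spotting the leftovers might look like this:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// tmpPrefix is a hypothetical naming convention for scratch
// directories; the real engine's convention may differ.
const tmpPrefix = ".umoci-"

// leftoverTmpDirs lists top-level scratch directories left behind in
// an OCI layout rooted at root.
func leftoverTmpDirs(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var dirs []string
	for _, e := range entries {
		if e.IsDir() && strings.HasPrefix(e.Name(), tmpPrefix) {
			dirs = append(dirs, filepath.Join(root, e.Name()))
		}
	}
	return dirs, nil
}

func main() {
	dirs, err := leftoverTmpDirs("/var/tmp/oci-layout")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("leftover tmpdirs:", dirs)
}
```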