-
Notifications
You must be signed in to change notification settings - Fork 424
[19.03 backport] Added garbage collector for image layers #268
[19.03 backport] Added garbage collector for image layers #268
Conversation
I would like to get this backported as we see those issues on production but other hand those have been there with Windows containers from day one (logic which I needed to change was introduced by moby#17924 long before Windows containers was even supported) so if there is no other users reporting about this issue then it probably is not that important to get back ported. It is also worth to note that as it took for a while me to get fix implemented we have also created workaround to it by adding some randomness to image build scripts so layers will not been re-used on newer versions (ugly way but at least it works). |
@thaJeztah @cpuguy83 so how it is, can we carry this as part of 19.03.2 ? |
c868302
to
3767a57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have know this codebase much - pointed out a couple of concerns.
func (fms *fileMetadataStore) getOrphan() ([]roLayer, error) { | ||
var orphanLayers []roLayer | ||
for _, algorithm := range supportedAlgorithms { | ||
fileInfos, err := ioutil.ReadDir(filepath.Join(fms.root, string(algorithm))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any scenarios where the paths here may be volume GUID prefixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Those layer metadata folders are on format:
/var/lib/docker/image/<storage driver>/layerdb/<hash algorithm>/<digest>
example: /var/lib/docker/image/overlay2/layerdb/sha256/ffc4c11463ee21b7532b63abd6079393c619a5d0f4b00397a4b9d1cf9efc4d9b
and when deleteLayer() function is called it will rename it to:
/var/lib/docker/image/overlay2/layerdb/sha256/ffc4c11463ee21b7532b63abd6079393c619a5d0f4b00397a4b9d1cf9efc4d9b-removing
which can be then found by getOrphan()
@@ -305,6 +305,47 @@ func (fms *fileMetadataStore) GetMountParent(mount string) (ChainID, error) { | |||
return ChainID(dgst), nil | |||
} | |||
|
|||
func (fms *fileMetadataStore) getOrphan() ([]roLayer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a unit-test around the logic in getOrphan in filestore_test.go please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Implemented it on moby#39715
3767a57
to
8b1458a
Compare
Waiting for #268 to be merged, so that I can cherry-pick that one |
8b1458a
to
5cc822d
Compare
cherry-picked moby#39715; removing "WIP" |
5c86c87
to
2fa6110
Compare
LGTM |
@kolyshkin @andrewhsu PTAL |
ping @ddebroy @kolyshkin @andrewhsu |
I had a couple of concerns that were addressed above. So LGTM from me (not a maintainer and don't have a lot of context around this code). |
Just to voice a concern emphatic I expressed in Slack only before... I am worried about bringing this change into a mature release. |
So, should we close this one as being too risky? (if so, remind me to revisit #384 and see if I can modify that one to work without this PR) |
Fair enough. Let's close this one here then and make sure that it works correctly on next main release. And FYI for anyone who might read this is that I created custom package with this one included https://github.com/olljanat/moby/releases/tag/19.03.5-olljanat1 and we will start using it on our environments. As side comment. Based on testings today I noticed on one environment that at least Windows Defender causes time to time issues on layer handling even when I have excludes for C:\ProgramData\docker folder and docked.exe/docker.exe processes (which is one of the reasons that garbage collector is needed after all). |
@thaJeztah any idea on a target date or version for this? I have windows nodes chewing up disk space that require regular maintenance :) |
Refactored exiting logic on way that layers are first marked to be under removal so if actual removal fails they can be found from disk and cleaned up. Full garbage collector will be implemented as part of containerd migration. Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com> (cherry picked from commit 213681b) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
1. Reduce complexity due to nested if blocks by using early return/continue 2. Improve logging Changes suggested as a part of code review comments in 39748 Signed-off-by: Vikram bir Singh <vikrambir.singh@docker.com> (cherry picked from commit ebf12db) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com> (cherry picked from commit d6cbeee) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com> (cherry picked from commit 8660330) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2fa6110
to
43abde3
Compare
moved to moby#40412 |
based on top of #267 - only last commit is new, but marked it WIPbackport of
Minor conflict because moby@072400f (moby#38377) is not in the 19.03 branch;
- What I did
Added garbage collector for image layers
Closes moby#38488 and most probably moby#39247 too.
and with moby#38401 also docker/for-win#745
- How I did it
Added logic which marks layers to be under removal by renaming layer digest folder to contain -<int>-removing suffix. That way we are able keep track that which layers are not anymore complete (=> which need to be re-downloaded if needed) and we are also able run clean up routine for those.
- How to verify it
Reproduce issue like it is described on moby#38488
also notice bug with missing content on comment moby#38488 (comment)
Run daemon on debug mode and stop it.
You will see log like this and orphan layer will disappear from disk.
and if you do same test on Windows this is what you will see:
- A picture of a cute animal (not mandatory but encouraged)