Update registry cache to (ab)use `Data` field of `Descriptor` objects #34
Conversation
See cue-labs/oci#29 for the upstream reference of my latest commit -- it fixes the ability to …
Is it possible to unit test the new functions?
if desc.Size > manifestSizeLimit {
	rc.data[digest] = desc
	return r, nil
}
So if it is too big, we do not cache it?
Correct -- that behaviour is in the previous implementation too, but handling it earlier in the function turns out to be better because we can then hand back the original reader right away instead of buffering it.
Also, "too big" here is currently 4MiB, so it's a pretty generous amount of things that will be cached in memory anyhow (especially since we're looking at mostly just indexes, image manifests, and config blobs).
func (rc *registryCache) resolveBlob(ctx context.Context, repo string, digest ociregistry.Digest, f func(ctx context.Context, repo string, digest ociregistry.Digest) (ociregistry.Descriptor, error)) (ociregistry.Descriptor, error) {
	rc.mu.Lock()
Why do we have to lock on reads?
Unfortunately it's unsafe for a goroutine to read a Go map while another one might be writing to it (and if this does end up querying the remote registry, it needs to write as well). The "data is cached" path should be really fast, though, so there shouldn't be much lock contention here (the contention is going to come from actual "load-bearing" concurrent requests 😞).
Seeing your answer, I think I have asked that before. Sorry for the noise.
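To make the locking rationale concrete, here's a minimal sketch of the lookup-or-fetch pattern under a single mutex. The type and field names are illustrative stand-ins, not the real `registryCache`:

```go
package cache

import (
	"context"
	"sync"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// descriptorCache is an illustrative stand-in for the real registryCache.
type descriptorCache struct {
	mu   sync.Mutex
	data map[string]ocispec.Descriptor
}

// resolve returns a cached descriptor or, on a miss, asks the provided fetch
// function (the remote registry) and stores the result. The map read happens
// under the same mutex as the write because a read racing a concurrent fill
// is a data race Go maps do not tolerate.
func (c *descriptorCache) resolve(ctx context.Context, digest string, fetch func(context.Context, string) (ocispec.Descriptor, error)) (ocispec.Descriptor, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if desc, ok := c.data[digest]; ok {
		return desc, nil // fast path: no network, lock held only briefly
	}

	desc, err := fetch(ctx, digest)
	if err != nil {
		return ocispec.Descriptor{}, err
	}
	c.data[digest] = desc
	return desc, nil
}
```

A `sync.RWMutex` (or per-key deduplication) could shave some contention off the cached path, but as noted above that path is already cheap enough that the plain mutex is unlikely to be the bottleneck.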
It's not trivial to do so, but it is possible. What I've been looking at instead is trying to adjust …
Dropping to draft while I wrangle the testing situation (might need to get #32 first and rebase so the conflicts aren't too awful 😅).
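Re the unit-testing question above: one plausible shape for such a test is to inject a counting fetch function and assert that a second resolve never reaches it. This builds on the illustrative `descriptorCache` sketch from the locking discussion rather than the real `registryCache` API, which may differ:

```go
package cache

import (
	"context"
	"testing"

	godigest "github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func TestResolveCachesDescriptors(t *testing.T) {
	calls := 0
	fetch := func(ctx context.Context, dgst string) (ocispec.Descriptor, error) {
		calls++
		return ocispec.Descriptor{Digest: godigest.Digest(dgst), Size: 123}, nil
	}

	c := &descriptorCache{data: map[string]ocispec.Descriptor{}}

	// Resolve the same digest twice; only the first call should hit "upstream".
	for i := 0; i < 2; i++ {
		desc, err := c.resolve(context.Background(), "sha256:deadbeef", fetch)
		if err != nil {
			t.Fatal(err)
		}
		if desc.Size != 123 {
			t.Fatalf("unexpected descriptor: %+v", desc)
		}
	}

	if calls != 1 {
		t.Fatalf("expected exactly one upstream fetch, got %d", calls)
	}
}
```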
This allows us to store full descriptors (necessary to implement `Resolve*`), but with a net decrease in the number of fields we have to juggle / keep in sync. This does mean consumers need to be careful about how they use the `Descriptor` objects we return (esp. WRT `Data`), but it makes it easier for them to then have `Data` available if they want it (which is something I'd like to use in the future). This is a net win anyhow because the upstream objects might've contained `Data` fields so this forces us to deal with them in a sane way we're comfortable with instead of potentially just including them verbatim unintentionally. 🚀
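As a sketch of the consumer-side care mentioned here (the function and its fallback parameter are hypothetical, not part of this repo): a caller that gets one of these `Descriptor` objects should treat a populated `Data` field as an optional, verifiable shortcut rather than assuming it is always present.

```go
package cache

import (
	"context"
	"fmt"

	godigest "github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// blobBytes prefers the inlined Data field when present, verifying it against
// the descriptor's size and digest before trusting it; otherwise it falls back
// to fetching the blob from the registry.
func blobBytes(ctx context.Context, desc ocispec.Descriptor, fetch func(context.Context, ocispec.Descriptor) ([]byte, error)) ([]byte, error) {
	if desc.Data != nil {
		if int64(len(desc.Data)) != desc.Size || godigest.FromBytes(desc.Data) != desc.Digest {
			return nil, fmt.Errorf("descriptor Data does not match %s (size %d)", desc.Digest, desc.Size)
		}
		return desc.Data, nil
	}
	return fetch(ctx, desc)
}
```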
Implement `Resolve{Manifest,Tag,Blob}` in our registry cache: now that we have `Descriptor` objects (and a way to cache them), this implementation is trivial. 🎉
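To illustrate why the implementation becomes trivial once descriptors are cached, a `Resolve*` method reduces to the cache lookup with the upstream registry as the fallback (again shown against the illustrative `descriptorCache` from above, not the real code in registry/cache.go):

```go
package cache

import (
	"context"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// ResolveBlob returns the descriptor for repo/digest, hitting the upstream
// resolver only on a cache miss.
func (c *descriptorCache) ResolveBlob(ctx context.Context, repo, digest string, upstream func(context.Context, string, string) (ocispec.Descriptor, error)) (ocispec.Descriptor, error) {
	return c.resolve(ctx, digest, func(ctx context.Context, d string) (ocispec.Descriptor, error) {
		return upstream(ctx, repo, d)
	})
}
```

ResolveManifest would look the same; ResolveTag needs a repo+tag cache key instead (compare cacheKeyTag vs cacheKeyDigest in the coverage table below), since tags aren't content-addressed.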
This also updates our `replace` to my new upstream fix that deals with `HEAD` on a Docker Hub digest failing when it shouldn't.
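For context on what that `HEAD` interaction looks like at the protocol level, here is a distribution-spec sketch, not the upstream fix itself; real Docker Hub requests also need a bearer token, which is omitted here:

```go
package cache

import (
	"context"
	"fmt"
	"net/http"

	godigest "github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// headManifest resolves a reference to a descriptor without downloading the
// manifest body: HEAD /v2/<repo>/manifests/<reference> returns the media type,
// content length, and Docker-Content-Digest headers.
func headManifest(ctx context.Context, baseURL, repo, reference string) (ocispec.Descriptor, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, baseURL+"/v2/"+repo+"/manifests/"+reference, nil)
	if err != nil {
		return ocispec.Descriptor{}, err
	}
	req.Header.Set("Accept", "application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.list.v2+json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return ocispec.Descriptor{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return ocispec.Descriptor{}, fmt.Errorf("HEAD %s: unexpected status %s", req.URL, resp.Status)
	}
	return ocispec.Descriptor{
		MediaType: resp.Header.Get("Content-Type"),
		Digest:    godigest.Digest(resp.Header.Get("Docker-Content-Digest")),
		Size:      resp.ContentLength,
	}, nil
}
```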
This allows us to update `cmd/lookup` to use this new wrapper and thus let us easily/correctly test more edge cases / lookups. 🚀
@@ -3,7 +3,7 @@
.../cmd/builds/main.go loadCacheFromFile 76.0%
.../cmd/builds/main.go saveCacheToFile 78.6%
.../cmd/builds/main.go main 80.7%
-.../cmd/lookup/main.go main 76.9%
+.../cmd/lookup/main.go main 82.6%
.../om/om.go Keys 100.0%
.../om/om.go Get 100.0%
.../om/om.go Set 100.0%
@@ -12,20 +12,25 @@
.../registry/cache.go RegistryCache 100.0%
.../registry/cache.go cacheKeyDigest 100.0%
.../registry/cache.go cacheKeyTag 100.0%
-.../registry/cache.go getBlob 75.0%
+.../registry/cache.go getBlob 81.0%
.../registry/cache.go GetBlob 100.0%
.../registry/cache.go GetManifest 100.0%
-.../registry/cache.go GetTag 69.6%
+.../registry/cache.go GetTag 82.6%
+.../registry/cache.go resolveBlob 68.8%
+.../registry/cache.go ResolveManifest 100.0%
+.../registry/cache.go ResolveBlob 100.0%
+.../registry/cache.go ResolveTag 77.8%
-.../registry/client.go Client 55.1%
+.../registry/client.go Client 54.2%
-.../registry/client.go EntryForRegistry 81.8%
+.../registry/client.go EntryForRegistry 72.7%
+.../registry/lookup.go Lookup 87.1%
-.../registry/rate-limits.go Do 50.0%
+.../registry/rate-limits.go RoundTrip 42.9%
.../registry/read-helpers.go readJSONHelper 60.9%
.../registry/ref.go ParseRef 83.3%
.../registry/ref.go Normalize 100.0%
.../registry/ref.go String 100.0%
.../registry/ref.go MarshalText 100.0%
.../registry/ref.go UnmarshalText 100.0%
-.../registry/synthesize-index.go SynthesizeIndex 77.6%
+.../registry/synthesize-index.go SynthesizeIndex 81.1%
.../registry/synthesize-index.go setRefAnnotation 100.0%
-.../registry/synthesize-index.go normalizeManifestPlatform 84.6%
+.../registry/synthesize-index.go normalizeManifestPlatform 83.9%
-total: (statements) 75.6%
+total: (statements) 76.9% (https://github.com/docker-library/meta-scripts/actions/runs/8398001647/job/23002236946#step:5:426 vs https://github.com/docker-library/meta-scripts/actions/runs/8426913314/job/23076302317?pr=34#step:5:426)
(re the …