Update registry cache to (ab)use `Data` field of `Descriptor` objects #34

tianon · 2024-03-22T17:16:52Z

This allows us to store full descriptors (necessary to implement Resolve*), but with a net decrease in the number of fields we have to juggle / keep in sync.

This does mean consumers need to be careful about how they use the Descriptor objects we return (esp. WRT Data), but it makes it easier for them to then have Data available if they want it (which is something I'd like to use in the future). This is a net win anyhow because the upstream objects might've contained Data fields so this forces us to deal with them in a sane way we're comfortable with instead of potentially just including them verbatim unintentionally. 🚀
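To illustrate the "consumers need to be careful" point, here is a minimal sketch of dropping `Data` before a cached descriptor is written out somewhere else. The `descriptor` type is a simplified stand-in for `ociregistry.Descriptor`, and `withoutData` and `encode` are hypothetical helper names, not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor is a simplified stand-in for ociregistry.Descriptor; only the
// fields relevant to this sketch are modelled (assumption).
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
	Data      []byte `json:"data,omitempty"`
}

// withoutData returns a copy of d with the embedded content dropped, so the
// descriptor can be written out without carrying the cached bytes along.
// (Hypothetical helper name.)
func withoutData(d descriptor) descriptor {
	d.Data = nil
	return d
}

// encode marshals a descriptor to JSON, panicking on failure to keep the
// sketch short.
func encode(d descriptor) string {
	b, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	cached := descriptor{
		MediaType: "application/vnd.oci.image.config.v1+json",
		Digest:    "sha256:abc", // placeholder, not a real digest
		Size:      2,
		Data:      []byte("{}"),
	}
	fmt.Println(encode(cached))              // includes a base64 "data" field
	fmt.Println(encode(withoutData(cached))) // "data" is omitted
	fmt.Println(len(cached.Data))            // the cached copy is untouched
}
```

Because `withoutData` works on a copy, the cached descriptor keeps its `Data` for callers that do want it.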

Also: Implement Resolve{Manifest,Tag,Blob} in our registry cache

Now that we have Descriptor objects (and a way to cache them), this implementation is trivial. 🎉

tianon · 2024-03-22T17:35:52Z

See cue-labs/oci#29 for the upstream reference of my latest commit -- it fixes the ability to HEAD against Docker Hub blob digests (which don't return a header of the digest).

whalelines

Is it possible to unit test the new functions?

whalelines · 2024-03-22T17:57:54Z

registry/cache.go

+
+	if desc.Size > manifestSizeLimit {
+		rc.data[digest] = desc
+		return r, nil


So if it is too big, we do not cache it?

tianon

Correct -- that's in the previous implementation too, but handling it earlier turns out to be better because we can then return the original reader earlier instead of buffering it.

Also, "too big" here is currently 4MiB, so it's a pretty generous amount of things that will be cached in memory anyhow (especially since we're looking at mostly just indexes, image manifests, and config blobs).
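The early return discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the code from `registry/cache.go`: `descriptor` is a simplified stand-in for `ociregistry.Descriptor`, `sizeLimit`, `store`, and `storeString` are hypothetical names, and locking is left out to keep the focus on the size check.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// descriptor is a simplified stand-in for ociregistry.Descriptor (assumption).
type descriptor struct {
	Digest string
	Size   int64
	Data   []byte
}

// sizeLimit mirrors the 4MiB figure mentioned above; the name is hypothetical.
const sizeLimit = 4 * 1024 * 1024

// cache is a toy cache keyed by digest; synchronization is omitted here.
type cache struct {
	data map[string]descriptor
}

// store records desc in the cache. Oversized objects are recorded without
// Data and the original reader is returned untouched; everything else is
// buffered so the bytes can live in desc.Data.
func (c *cache) store(desc descriptor, r io.Reader) (io.Reader, error) {
	if desc.Size > sizeLimit {
		desc.Data = nil
		c.data[desc.Digest] = desc
		return r, nil // no buffering for large content
	}
	b, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	desc.Data = b
	c.data[desc.Digest] = desc
	return bytes.NewReader(b), nil
}

// storeString is a demo convenience: it stores content under digest with the
// given claimed size and returns what a caller would read back.
func (c *cache) storeString(digest, content string, size int64) string {
	r, err := c.store(descriptor{Digest: digest, Size: size}, strings.NewReader(content))
	if err != nil {
		panic(err)
	}
	b, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	c := &cache{data: map[string]descriptor{}}
	fmt.Println(c.storeString("sha256:small", "hello", 5))
	fmt.Println(len(c.data["sha256:small"].Data)) // content was cached
	// The claimed size is above the limit, so only the descriptor is kept.
	fmt.Println(c.storeString("sha256:big", "pretend this is huge", sizeLimit+1))
	fmt.Println(c.data["sha256:big"].Data == nil)
}
```

In this sketch the caller gets a usable reader either way; the only difference is whether the content also ends up in memory.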

whalelines · 2024-03-22T18:03:15Z

registry/cache.go

+}
+
+func (rc *registryCache) resolveBlob(ctx context.Context, repo string, digest ociregistry.Digest, f func(ctx context.Context, repo string, digest ociregistry.Digest) (ociregistry.Descriptor, error)) (ociregistry.Descriptor, error) {
+	rc.mu.Lock()


Why do we have to lock on reads?

tianon

Unfortunately it's unsafe for one goroutine to read a map while another writes to it, and if this does end up querying the remote registry, it needs to write too. The "data is cached" path should be really fast, though, so there shouldn't be much lock contention here (the contention is going to come from actual "load-bearing" concurrent requests 😞).

whalelines

Seeing your answer, I think I have asked that before. Sorry for the noise.
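For readers following the locking question, here is a minimal sketch of the lookup-or-fetch pattern under a single mutex. It assumes the lock is held for the whole call, including the upstream fetch; `descriptor`, `resolve`, and `countFetches` are hypothetical names rather than the PR's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// descriptor is a simplified stand-in for ociregistry.Descriptor (assumption).
type descriptor struct {
	Digest string
	Size   int64
}

type registryCache struct {
	mu   sync.Mutex
	data map[string]descriptor
}

// resolve returns the cached descriptor for key, or calls fetch and caches
// the result. The lock covers the read as well as the write because a cache
// miss turns this call into a map write.
func (rc *registryCache) resolve(ctx context.Context, key string, fetch func(context.Context, string) (descriptor, error)) (descriptor, error) {
	rc.mu.Lock()
	defer rc.mu.Unlock()

	if desc, ok := rc.data[key]; ok {
		return desc, nil // fast path: lock is held only briefly
	}
	desc, err := fetch(ctx, key)
	if err != nil {
		return descriptor{}, err
	}
	rc.data[key] = desc
	return desc, nil
}

// countFetches resolves the same key from n goroutines and reports how many
// times the upstream fetch function actually ran.
func countFetches(n int) int {
	rc := &registryCache{data: map[string]descriptor{}}
	fetches := 0
	fetch := func(ctx context.Context, key string) (descriptor, error) {
		fetches++ // only ever runs while rc.mu is held
		return descriptor{Digest: key, Size: 1}, nil
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := rc.resolve(context.Background(), "sha256:abc", fetch); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	return fetches
}

func main() {
	fmt.Println(countFetches(8)) // prints 1: later callers hit the cache
}
```

Holding the lock across the fetch serializes concurrent misses, which matches the contention trade-off described above; a `sync.RWMutex` or per-key locking would be alternatives if that ever became a bottleneck.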

tianon · 2024-03-22T19:43:20Z

> Is it possible to unit test the new functions?

It's not trivial to do so, but it is possible. What I've been looking at instead is trying to adjust cmd/lookup so that we can invoke these other functions also and thus get more integration coverage (which isn't exactly the same, but still good and high quality coverage and much easier to write / write correctly).

tianon · 2024-03-22T19:53:45Z

Dropping to draft while I wrangle the testing situation (might need to get #32 first and rebase so the conflicts aren't too awful 😅).

tianon · 2024-03-25T21:09:01Z

@@ -3,7 +3,7 @@
 .../cmd/builds/main.go            loadCacheFromFile          76.0%
 .../cmd/builds/main.go            saveCacheToFile            78.6%
 .../cmd/builds/main.go            main                       80.7%
-.../cmd/lookup/main.go            main                       76.9%
+.../cmd/lookup/main.go            main                       82.6%
 .../om/om.go                      Keys                       100.0%
 .../om/om.go                      Get                        100.0%
 .../om/om.go                      Set                        100.0%
@@ -12,20 +12,25 @@
 .../registry/cache.go             RegistryCache              100.0%
 .../registry/cache.go             cacheKeyDigest             100.0%
 .../registry/cache.go             cacheKeyTag                100.0%
-.../registry/cache.go             getBlob                    75.0%
+.../registry/cache.go             getBlob                    81.0%
 .../registry/cache.go             GetBlob                    100.0%
 .../registry/cache.go             GetManifest                100.0%
-.../registry/cache.go             GetTag                     69.6%
+.../registry/cache.go             GetTag                     82.6%
+.../registry/cache.go             resolveBlob                68.8%
+.../registry/cache.go             ResolveManifest            100.0%
+.../registry/cache.go             ResolveBlob                100.0%
+.../registry/cache.go             ResolveTag                 77.8%
-.../registry/client.go            Client                     55.1%
+.../registry/client.go            Client                     54.2%
-.../registry/client.go            EntryForRegistry           81.8%
+.../registry/client.go            EntryForRegistry           72.7%
+.../registry/lookup.go            Lookup                     87.1%
-.../registry/rate-limits.go       Do                         50.0%
+.../registry/rate-limits.go       RoundTrip                  42.9%
 .../registry/read-helpers.go      readJSONHelper             60.9%
 .../registry/ref.go               ParseRef                   83.3%
 .../registry/ref.go               Normalize                  100.0%
 .../registry/ref.go               String                     100.0%
 .../registry/ref.go               MarshalText                100.0%
 .../registry/ref.go               UnmarshalText              100.0%
-.../registry/synthesize-index.go  SynthesizeIndex            77.6%
+.../registry/synthesize-index.go  SynthesizeIndex            81.1%
 .../registry/synthesize-index.go  setRefAnnotation           100.0%
-.../registry/synthesize-index.go  normalizeManifestPlatform  84.6%
+.../registry/synthesize-index.go  normalizeManifestPlatform  83.9%
-total:                            (statements)               75.6%
+total:                            (statements)               76.9%

(https://github.com/docker-library/meta-scripts/actions/runs/8398001647/job/23002236946#step:5:426 vs https://github.com/docker-library/meta-scripts/actions/runs/8426913314/job/23076302317?pr=34#step:5:426)

tianon · 2024-03-25T22:01:44Z

(re the EntryForRegistry variance, see #36)

tianon requested a review from yosifkit as a code owner March 22, 2024 17:16

whalelines reviewed Mar 22, 2024

tianon marked this pull request as draft March 22, 2024 19:52

tianon added 3 commits March 25, 2024 10:13

Implement Resolve{Manifest,Tag,Blob} in our registry cache

05d9b2e

Now that we have `Descriptor` objects (and a way to cache them), this implementation is trivial. 🎉

Update ociregistry and deal with the minor breaking changes

073cf83

This also updates our `replace` to my new upstream fix that deals with `HEAD` on a Docker Hub digest failing when it shouldn't.

tianon force-pushed the cache-data branch from 75e3b6a to 073cf83 Compare March 25, 2024 18:03

Add new registry.Lookup wrapper to handle generic Reference lookups

c2771e3

This allows us to update `cmd/lookup` to use this new wrapper and thus let us easily/correctly test more edge cases / lookups. 🚀

tianon force-pushed the cache-data branch from 87afcd2 to c2771e3 Compare March 25, 2024 21:01

tianon marked this pull request as ready for review March 25, 2024 21:09

Add "lookup" arguments to canonical JSON so it's easier to debug

46e854a

yosifkit approved these changes Mar 27, 2024

yosifkit merged commit 530a108 into docker-library:main Mar 27, 2024
1 check passed

yosifkit deleted the cache-data branch March 27, 2024 00:04
