Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update registry cache to (ab)use Data field of Descriptor objects #34

Merged
merged 5 commits into from
Mar 27, 2024

Conversation

tianon
Copy link
Member

@tianon tianon commented Mar 22, 2024

This allows us to store full descriptors (necessary to implement Resolve*), but with a net decrease in the number of fields we have to juggle / keep in sync.

This does mean consumers need to be careful about how they use the Descriptor objects we return (esp. WRT Data), but it makes it easier for them to then have Data available if they want it (which is something I'd like to use in the future). This is a net win anyhow because the upstream objects might've contained Data fields so this forces us to deal with them in a sane way we're comfortable with instead of potentially just including them verbatim unintentionally. 🚀

Also: Implement Resolve{Manifest,Tag,Blob} in our registry cache

Now that we have Descriptor objects (and a way to cache them), this implementation is trivial. 🎉

@tianon tianon requested a review from yosifkit as a code owner March 22, 2024 17:16
@tianon
Copy link
Member Author

tianon commented Mar 22, 2024

See cue-labs/oci#29 for the upstream reference of my latest commit -- it fixes the ability to HEAD against Docker Hub blob digests (which don't return a header of the digest).

Copy link
Collaborator

@whalelines whalelines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to unit test the new functions?


if desc.Size > manifestSizeLimit {
rc.data[digest] = desc
return r, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if it is too big, we do not cache it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct -- that's in the previous implementation too, but handling it earlier turns out to be better because we can then return the original reader earlier instead of buffering it.

Also, "too big" here is currently 4MiB, so it's a pretty generous amount of things that will be cached in memory anyhow (especially since we're looking at mostly just indexes, image manifests, and config blobs).

}

func (rc *registryCache) resolveBlob(ctx context.Context, repo string, digest ociregistry.Digest, f func(ctx context.Context, repo string, digest ociregistry.Digest) (ociregistry.Descriptor, error)) (ociregistry.Descriptor, error) {
rc.mu.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to lock on reads?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's unsafe for two goroutines to read a map otherwise (and because if this does end up querying the remote registry, it needs to write also). The "data is cached" implementation should be really fast, though, so there shouldn't be much lock contention here (the contention is going to come from actual "load-bearing" concurrent requests 😞).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing your answer, I think I have asked that before. Sorry for the noise.

@tianon
Copy link
Member Author

tianon commented Mar 22, 2024

Is it possible to unit test the new functions?

It's not trivial to do so, but it is possible. What I've been looking at instead is trying to adjust cmd/lookup so that we can invoke these other functions also and thus get more integration coverage (which isn't exactly the same, but still good and high quality coverage and much easier to write / write correctly).

@tianon tianon marked this pull request as draft March 22, 2024 19:52
@tianon
Copy link
Member Author

tianon commented Mar 22, 2024

Dropping to draft while I wrangle the testing situation (might need to get #32 first and rebase so the conflicts aren't too awful 😅).

tianon added 3 commits March 25, 2024 10:13
This allows us to store full descriptors (necessary to implement `Resolve*`), but with a net decrease in the number of fields we have to juggle / keep in sync.

This does mean consumers need to be careful about how they use the `Descriptor` objects we return (esp. WRT `Data`), but it makes it easier for them to then have `Data` available if they want it (which is something I'd like to use in the future).  This is a net win anyhow because the upstream objects might've contained `Data` fields so this forces us to deal with them in a sane way we're comfortable with instead of potentially just including them verbatim unintentionally. 🚀
Now that we have `Descriptor` objects (and a way to cache them), this implementation is trivial. 🎉
This also updates our `replace` to my new upstream fix that deals with `HEAD` on a Docker Hub digest failing when it shouldn't.
This allows us to update `cmd/lookup` to use this new wrapper and thus let us easily/correctly test more edge cases / lookups. 🚀
@tianon
Copy link
Member Author

tianon commented Mar 25, 2024

@@ -3,7 +3,7 @@
 .../cmd/builds/main.go            loadCacheFromFile          76.0%
 .../cmd/builds/main.go            saveCacheToFile            78.6%
 .../cmd/builds/main.go            main                       80.7%
-.../cmd/lookup/main.go            main                       76.9%
+.../cmd/lookup/main.go            main                       82.6%
 .../om/om.go                      Keys                       100.0%
 .../om/om.go                      Get                        100.0%
 .../om/om.go                      Set                        100.0%
@@ -12,20 +12,25 @@
 .../registry/cache.go             RegistryCache              100.0%
 .../registry/cache.go             cacheKeyDigest             100.0%
 .../registry/cache.go             cacheKeyTag                100.0%
-.../registry/cache.go             getBlob                    75.0%
+.../registry/cache.go             getBlob                    81.0%
 .../registry/cache.go             GetBlob                    100.0%
 .../registry/cache.go             GetManifest                100.0%
-.../registry/cache.go             GetTag                     69.6%
+.../registry/cache.go             GetTag                     82.6%
+.../registry/cache.go             resolveBlob                68.8%
+.../registry/cache.go             ResolveManifest            100.0%
+.../registry/cache.go             ResolveBlob                100.0%
+.../registry/cache.go             ResolveTag                 77.8%
-.../registry/client.go            Client                     55.1%
+.../registry/client.go            Client                     54.2%
-.../registry/client.go            EntryForRegistry           81.8%
+.../registry/client.go            EntryForRegistry           72.7%
+.../registry/lookup.go            Lookup                     87.1%
-.../registry/rate-limits.go       Do                         50.0%
+.../registry/rate-limits.go       RoundTrip                  42.9%
 .../registry/read-helpers.go      readJSONHelper             60.9%
 .../registry/ref.go               ParseRef                   83.3%
 .../registry/ref.go               Normalize                  100.0%
 .../registry/ref.go               String                     100.0%
 .../registry/ref.go               MarshalText                100.0%
 .../registry/ref.go               UnmarshalText              100.0%
-.../registry/synthesize-index.go  SynthesizeIndex            77.6%
+.../registry/synthesize-index.go  SynthesizeIndex            81.1%
 .../registry/synthesize-index.go  setRefAnnotation           100.0%
-.../registry/synthesize-index.go  normalizeManifestPlatform  84.6%
+.../registry/synthesize-index.go  normalizeManifestPlatform  83.9%
-total:                            (statements)               75.6%
+total:                            (statements)               76.9%

(https://github.com/docker-library/meta-scripts/actions/runs/8398001647/job/23002236946#step:5:426 vs https://github.com/docker-library/meta-scripts/actions/runs/8426913314/job/23076302317?pr=34#step:5:426)

@tianon tianon marked this pull request as ready for review March 25, 2024 21:09
@tianon
Copy link
Member Author

tianon commented Mar 25, 2024

(re the EntryForRegistry variance, see #36)

@yosifkit yosifkit merged commit 530a108 into docker-library:main Mar 27, 2024
1 check passed
@yosifkit yosifkit deleted the cache-data branch March 27, 2024 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants