Binary Cache: Add write-back support #1406
@microsoft-github-policy-service agree company="SingleStore"
Added some fixes related to NuGet prefetches & write-only binary providers.
I agree this feature is something that we should have.
I think we might need a separate setting to control it, e.g. 'read', 'write', 'writeback'. For example, if someone has a stack of providers in 'furthest away' order, it would be bad to try to write back to a cache on another continent when a cache hit happens on a local directory. Users probably want some control over that, and silently doing it to them without an opt-in may be a breaking change.
To that end, I believe there is a longstanding structural problem we probably need to fix before we can go here. Specifically, the documentation suggests, and the original design intent was, that cache providers would be visited in order. Unfortunately, there is actually an assumed order that re-sorts whatever the user specifies into an internally defined provider order, which seems like a bug.
The existing design is excessively complex because backends are smashed together, which was done to try to save the work of making zip files. The change here kind of makes that worse. It looks like the first half of getting rid of this problem was already done in #998.
Instead of stacking `ProviderId` on top, now that there is no longer a reason to merge different users of the same provider type together, I would prefer to see a change that removes the multiple entries per provider entirely and just stores them in the user-provided order. This will reduce complexity, actually match what the documentation and design intent say, and make implementing the write-back behavior you're trying to achieve here much easier.
We probably should look at trying to land #908 first because autoantwort has been waiting forever ....
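The ordering and opt-in ideas in this review can be illustrated with a minimal sketch. Everything in it is hypothetical and not taken from the vcpkg sources: the `Access` flags, the `Provider` struct, and both functions are assumptions made for illustration only. Providers are kept in one flat list in the order the user wrote them, and a write-back only targets providers that explicitly opted in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace review_sketch
{
    // Hypothetical per-provider setting; 'write_back' is the explicit opt-in.
    struct Access
    {
        bool read;
        bool write;      // receives freshly built packages
        bool write_back; // also receives packages restored from another provider
    };

    // One entry per configured source, stored in the order the user wrote them.
    struct Provider
    {
        std::string name;
        Access access;
        bool has_package; // stand-in for a real cache lookup
    };

    // Visit providers strictly in user order. Returns the index of the first
    // readable provider that has the package, or providers.size() if none does.
    inline std::size_t find_first_hit(const std::vector<Provider>& providers)
    {
        for (std::size_t i = 0; i < providers.size(); ++i)
        {
            if (providers[i].access.read && providers[i].has_package) return i;
        }
        return providers.size();
    }

    // After a hit at index 'hit', only providers that opted in to write-back
    // and do not yet have the package are selected as targets.
    inline std::vector<std::string> write_back_targets(const std::vector<Provider>& providers, std::size_t hit)
    {
        std::vector<std::string> targets;
        for (std::size_t i = 0; i < providers.size(); ++i)
        {
            const Provider& p = providers[i];
            if (i != hit && p.access.write_back && !p.has_package) targets.push_back(p.name);
        }
        return targets;
    }
}
```

With this shape, a far-away provider that is merely `write`-capable is never written to on a cache hit unless its owner asked for it.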
@@ -44,6 +44,19 @@ namespace vcpkg::Util

return false;
}
template<class Vec, class Filter>
std::vector<ElementT<Vec>> filtered_copy(const Vec& container, const Filter&& filter)
- std::vector<ElementT<Vec>> filtered_copy(const Vec& container, const Filter&& filter)
+ std::vector<ElementT<Vec>> copy_if(const Vec& container, Filter filter)

- The standard algorithm this duplicates is `copy_if`, so I think that should be in the name.
- Filter functors are traditionally passed by value. (See for example `key_equal` above.)
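A minimal sketch of the suggested signature follows. The body is an assumption (the review only shows the declaration), and `ElementT` here is a local stand-in for the vcpkg helper of the same name. Besides matching the standard algorithm's conventions, taking the filter by value also accepts named functors: `const Filter&&` with a deduced `Filter` is a const rvalue reference, not a forwarding reference, so it would reject an lvalue lambda.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sketch
{
    // Stand-in for vcpkg's ElementT: the element type of a container.
    template<class Vec>
    using ElementT = typename Vec::value_type;

    // Returns a vector holding copies of the elements for which 'filter' is true.
    // The filter is taken by value, like the predicates of the standard algorithms.
    template<class Vec, class Filter>
    std::vector<ElementT<Vec>> copy_if(const Vec& container, Filter filter)
    {
        std::vector<ElementT<Vec>> ret;
        std::copy_if(std::begin(container), std::end(container), std::back_inserter(ret), filter);
        return ret;
    }
}
```

A call such as `sketch::copy_if(values, is_even)` then works whether `is_even` is a temporary lambda or a named one.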
@@ -114,44 +134,62 @@ namespace vcpkg
std::string instantiate_variables(const BinaryPackageReadInfo& info) const;
};

struct GithubActionsInfo
Don't think this should be checked in :)
Great feature, absolutely necessary to locally cache packages instead of downloading them on each configure.
Write-back support for Binary Caching
This PR adds support for a cache write-back phase during package installation when the relevant cache entry was not newly built. This allows ports to be synced to all binary-cache providers marked as `write` or `readwrite` upon a successful installation.
Currently the `vcpkg` binary-caching layer will only populate a `write`-capable provider if the port was built from scratch, i.e. not restored from a `read`-capable cache provider. This means that a consumer has no way of guaranteeing that other caches are populated unless they build the port themselves.
Use-cases
As a developer I have a local cache, e.g. file-system, and a remote cache, e.g. S3. The remote cache is meant to be a fallback in case the local cache isn't populated. Once a port is restored from the remote cache, I'd like the local cache to be populated with the restored entry, so that if the port needs to be restored at a later time, it can be done from the cheaper local cache instead of the remote cache.
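The scenario above can be sketched with two in-memory caches. This is an illustration under assumptions, not the PR's implementation: the `Cache` struct, the string package key, and `restore_with_write_back` are all hypothetical names. The package is restored from the first readable cache that has it, and every writable cache that was missing it is populated afterwards.

```cpp
#include <set>
#include <string>
#include <vector>

namespace writeback_sketch
{
    // Hypothetical in-memory cache standing in for a file-system or S3 provider.
    struct Cache
    {
        std::string name;
        bool readable;
        bool writable;
        std::set<std::string> entries; // package keys present in this cache
    };

    // Restores 'key' from the first readable cache that has it, then copies it
    // into every writable cache that was missing it.
    // Returns the names of the caches that were written to.
    inline std::vector<std::string> restore_with_write_back(std::vector<Cache>& caches, const std::string& key)
    {
        const Cache* source = nullptr;
        std::vector<Cache*> missing;
        for (Cache& c : caches)
        {
            const bool present = c.entries.count(key) != 0;
            if (!source && c.readable && present)
                source = &c;
            else if (c.writable && !present)
                missing.push_back(&c);
        }

        std::vector<std::string> written;
        if (!source) return written; // nothing was restored, so there is nothing to write back
        for (Cache* c : missing)
        {
            c->entries.insert(key);
            written.push_back(c->name);
        }
        return written;
    }
}
```

On the first restore the local cache is filled from the remote one; a second restore of the same key is served locally and writes nothing.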
Implementation details
Provider identity
The binary caching layer generally works by adding a list of `read` & `write` capable cache providers. The `read` providers are typically single-source, while the `write` providers are multi-source, e.g. multiple local file-system directories belong to 1 `write` provider. In order to identify which providers need to be written back to, we need to uniquely identify each provider that qualifies for a write-back. This is done using a unique `ProviderId`, assigned during the parsing of the binary-cache segment list.
Write-back
The `push_success()` function is now also called post-install to populate the `write`-capable providers that were (during prefetch) marked as missing the port. These individually mark the cache entry as written back using `mark_written_back()`.
Relevant issues/discussions