Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote cache: support arbitrary layers as results #2501

Merged
merged 1 commit into from
Jan 16, 2022

Conversation

sipsma
Copy link
Collaborator

@sipsma sipsma commented Dec 1, 2021

This change enables inline cache to work as expected with MergeOp by
supporting a new type of result, DiffResult, which enables results to be
specified as a specific ordered set of layers inside an image.
Previously, results could only be specified with a singe layer index,
which meant that they had to start at the image's base layer and end at
that index. That meant that merge inputs couldn't be specified as they
are often a subset of the image layers that don't begin at the base.

Signed-off-by: Erik Sipsma erik@sipsma.dev

Comment on lines 67 to 68
// TODO: what the behavior on previous builders when there's no Result. Is it
// assumed that this record is Scratch? Or is it a cache miss? Want the latter.
Copy link
Collaborator Author

@sipsma sipsma Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi This is my main remaining question, the answer wasn't immediately obvious when jumping through the code so would appreciate any insight you can provide.

This case is important because now when a record can't be specified using the old style CacheResult and instead has to use the new DiffResults, an old buildkitd will see that the record exists but has empty CacheResult. The behavior I want is for these buildkitds to then just get a cache miss in this case.

If it turns out instead that they assume "no result" means "result is scratch" then we'll have to modify this slightly. We could add a new type to replace CacheRecord that exclusively uses DiffResults (so old builders won't even see records that use diff results). Or another option would be to have CacheRecord set its LayerIndex to -1 in these cases, which would mean that old buildkitds get an error when trying to import this cache. That's not ideal obviously but is better than them getting a cache hit with the incorrect result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you never call AddResult then it will not load any cache. Loading scratch should be possible with AddResult(nil) (maybe). In cache interface https://github.com/moby/buildkit/blob/master/docs/solver.md#cache-providers query and records() are separate concepts. So a vertex can match cache but there is no loadable record behind that match. This is because we can use the cache match to check for a cache match for the next vertex and that might be loadable now.

@sipsma sipsma mentioned this pull request Jan 6, 2022
10 tasks
Digest digest.Digest `json:"digest,omitempty"`
Inputs [][]CacheInput `json:"inputs,omitempty"`
Results []CacheResult `json:"layers,omitempty"`
DiffResults []DiffResult `json:"diffs,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One naming solution might be that if the old ones are called "layers" and only have one index then new ones are "chains" as they have a chain of indexes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the name to ChainedResult

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in JSON I'd prefer just json:"chains,omitempty" . The results themselves are not chained. It's if a result points to layer index or a chain of layer indexes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to chains

@@ -76,11 +77,50 @@ func (ce *exporter) ExportForLayers(ctx context.Context, layers []digest.Digest)
cache := map[int]int{}

// reorder layers based on the order in the image
blobIndexes := map[digest.Digest]int{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: initialize with length

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}
if match {
n := getSortedLayerIndex(rr.LayerIndex, cfg.Layers, cache)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just len(resultBlobs) now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that minus one, good call, fixed.

}
var filteredResults []v1.CacheResult
for _, rr := range r.Results {
empty := v1.CacheResult{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var doesn't seem needed here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing if rr != v1.CacheResult{} { is a syntax error unfortunately, i.e.: https://go.dev/play/p/kCPmFl4rpU0

So I think it's needed unless I'm missing something

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rr != (v1.CacheResult{}) should work

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, fixed

r.Results[j] = v1.CacheResult{}
r.DiffResults = append(r.DiffResults, diffResult)
}
var filteredResults []v1.CacheResult
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that this removes the results that were converted to diffresults.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sipsma sipsma marked this pull request as ready for review January 11, 2022 21:13
This change enables inline cache to work as expected with MergeOp by
supporting a new type of result, DiffResult, which enables results to be
specified as a specific ordered set of layers inside an image.
Previously, results could only be specified with a singe layer index,
which meant that they had to start at the image's base layer and end at
that index. That meant that merge inputs couldn't be specified as they
are often a subset of the image layers that don't begin at the base.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@tonistiigi tonistiigi merged commit cfdaaa4 into moby:master Jan 16, 2022
@tonistiigi
Copy link
Member

I discovered a case that doesn't work very well with inline cache atm but it isn't directly related to this PR. We don't export cache for the roots atm. The thinking behind that was that if base image directly matches we might as well just pull it from upstream, not from cache. We also don't want to pull git sources from cache and instead should do a fresh clone. And of course caching the local context is just useless(there is another protection against that). Not caching images is actually maybe not that correct. There is issue where people are confused why the base image is not pulled from local directory with --cache-from type=local.

But related to inline cache and mergeop the interesting case is as follows.

llb.Merge([]llb.State{base, llb.Scratch().File(llb.Mkdir(),llb.Mkfile(),..), anotherstate})

Because llb.Scratch().File only makes a single layer with no inputs then if you look at the inline cache output it doesn't have a value for chains. So everything around it matches cache, including full mergeop, but if one of the inputs of mergeop gets invalidated then the llb.Scratch()... gets invalidates as well. That means that it will generate new blobs because timestamps change etc.

Even in the slightly different case:

llb.Merge([]llb.State{base, llb.Scratch().File(llb.Mkdir()).File(llb.Mkfile(),..), anotherstate})

The issue doesn't appear anymore (unless Mkfile() changes) as now there are chain values for the Mkfile part.

@crazy-max crazy-max added this to the v0.10.0 milestone Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants