Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gateway: ensure llb digests are deterministic when sent by frontends #5517

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

jsternberg
Copy link
Collaborator

This ensures different valid protobuf serializations that are sent by frontends will be rewritten into digests that are normalized for the buildkit solver.

The most recent example of this is that older frontends would generate protobuf with gogo and the newer buildkit is using the google protobuf library. These produce different serializations and cause the solver to think that identical operations are actually different.

This is done by rewriting the incoming definition sent by the llb bridge forwarder when a gateway calls solve with a protobuf definition.

@jsternberg
Copy link
Collaborator Author

We did some more digging and the byte difference between the two is related to field order. Google's protobuf library will marshal the fields based on this code snippet: https://github.com/protocolbuffers/protobuf-go/blob/b98563540c0a4edb38526bcd6e6c97f9fac1f453/internal/order/order.go#L21-L41

It automatically puts any oneof field after any non-oneof field. The gogo ordering seems to just be numerical. There's a oneof field in field 3 and there's a normal field in field 11 which is ordered differently and causes the bytes to be different.

@@ -760,6 +760,17 @@ func (lbf *llbBridgeForwarder) Solve(ctx context.Context, req *pb.SolveRequest)
}
}

if req.Definition != nil {
// Rewrite digests in the definition. This ensures the digests are validated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in gateway and not in solver loader where we already have ops remarshal logic for source policies?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the logic there. I think the logic also needs to be slightly different. The current one has a kind of conditional rewrite and we'll just need to always remarshal it.

@jsternberg jsternberg force-pushed the deterministic-llb-for-gateway branch from 8c41ea7 to ea52c76 Compare November 14, 2024 16:37
@jsternberg jsternberg added this to the v0.18.0 milestone Nov 14, 2024
@@ -67,7 +67,7 @@ func (m *DiffOp) Marshal(ctx context.Context, constraints *Constraints) (digest.

proto.Op = &pb.Op_Diff{Diff: op}

dt, err := deterministicMarshal(proto)
dt, err := proto.Marshal()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these changes in marshal? Does it still call the deterministic internally?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still deterministic but I was defining and calling it in multiple packages. I also figured that having any marshal function that calls MarshalVT when we know that's an improper way to marshal the data would not be a good idea.

To be fair, I made this change when I was adding a third location where we were doing this so I had a bigger interest in removing the duplication. It's now back to two locations.

I also debated renaming the function to MarshalDeterministic() but then I had to change a few call sites and I didn't think it was worth it.


dm.mapping[dgst] = newDgst
if dgst != newDgst {
// Ensure the indices also map to the new digest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks bit weird. Why do these containers mix old and new digests?

It should be that there is a container to check if digest has been converted already and then lookup by the old digest to op if it has not. I think this is also needed to avoid loops as otherwise loops should be possible by mixing old and new digests in the definition.

}

index := dm.indexByDigest[dgst]
dm.out.Def[index] = data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? We have already parsed the data once. If we write back the encoded data that means we need to parse it again. We should only need to marshal to get the new digest, don't actually need the new bytes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'll go back and see if this is an easier way of doing it. I think you're likely right. I'll give it a try.

@tonistiigi tonistiigi requested a review from cpuguy83 November 14, 2024 19:04
@jsternberg jsternberg force-pushed the deterministic-llb-for-gateway branch from ea52c76 to 6081ae3 Compare November 14, 2024 19:38
This ensures different valid protobuf serializations that are sent by
frontends will be rewritten into digests that are normalized for the
buildkit solver.

The most recent example of this is that older frontends would generate
protobuf with gogo and the newer buildkit is using the google protobuf
library. These produce different serializations and cause the solver to
think that identical operations are actually different.

Signed-off-by: Jonathan A. Sternberg <jonathan.sternberg@docker.com>
@jsternberg
Copy link
Collaborator Author

Modified this PR to be a bit more faithful to the original code. The code to detect if a mutation happened has now been removed in favor of unconditional mutation, but the rest of it remains faithful to the original so it won't repeat deserialization. I've also added a testdata file with the gogo protobuf serialization to test that things get recomputed correctly.

@tonistiigi
Copy link
Member

@cpuguy83 PTAL

@cpuguy83
Copy link
Member

I see we always need to recompute all ops now, that's unfortunate but understandable.

@tonistiigi tonistiigi merged commit 16d3799 into moby:master Nov 21, 2024
92 checks passed
@jsternberg jsternberg deleted the deterministic-llb-for-gateway branch November 21, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants