Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input digest tagger #5192

Merged
merged 10 commits into from
Apr 7, 2021

Conversation

piotr-szybicki
Copy link
Contributor

@piotr-szybicki piotr-szybicki commented Dec 28, 2020

Implementation of inputDigest tagger from the Improve taggers proposal This PR is much bigger than I originally intended. But whichever way I looked I was hitting the circular dependencies. So the first 2 commits are about extracting build.ArtifactGraph and build.Artifact into its own package.

It does mostly what internal skaffold hasher is doing a minus couple of things:

  • Doesn't hash the build arguments (I don't believe they are not relevant. )
  • Doesn't hash test dependencies (I don't believe they are not relevant. )

Some areas to think of (maybe in this or in future PR's):

  • following target in multi-stage docker builds
  • including image cotainerId in the tag. (But to get there the implementation of docker.GetDependencies would have to change including its contract of the method)
  • documentation what exactly is being used in the tag calculation (how only COPY and ADD commands can influence tag)

@codecov
Copy link

codecov bot commented Dec 28, 2020

Codecov Report

Merging #5192 (4a6b4b2) into master (665b502) will decrease coverage by 0.16%.
The diff coverage is 53.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5192      +/-   ##
==========================================
- Coverage   70.68%   70.51%   -0.17%     
==========================================
  Files         408      410       +2     
  Lines       15581    15624      +43     
==========================================
+ Hits        11014    11018       +4     
- Misses       3757     3792      +35     
- Partials      810      814       +4     
Impacted Files Coverage Δ
cmd/skaffold/app/cmd/deploy.go 56.52% <0.00%> (ø)
cmd/skaffold/app/cmd/filter.go 25.58% <ø> (ø)
cmd/skaffold/app/cmd/render.go 46.42% <0.00%> (ø)
cmd/skaffold/app/flags/build_output.go 85.71% <ø> (ø)
pkg/skaffold/build/build.go 0.00% <ø> (-85.72%) ⬇️
pkg/skaffold/build/builder_mux.go 34.37% <ø> (ø)
pkg/skaffold/build/cache/cache.go 52.00% <ø> (ø)
pkg/skaffold/build/cache/hash.go 71.28% <ø> (ø)
pkg/skaffold/build/cache/types.go 100.00% <ø> (ø)
pkg/skaffold/build/gcb/types.go 53.84% <ø> (ø)
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 665b502...4a6b4b2. Read the comment docs.

@briandealwis
Copy link
Member

Sorry for the delay in reviewing this @piotr-szybicki. I'll try to look today — this particular issue has been on my wishlist for a bit.

Copy link
Member

@briandealwis briandealwis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @piotr-szybicki — I must apologize profusely for taking so long to review this PR. It generally looks great and is exactly what I was hoping for.

The one big quibble is the name of pkg/skaffold/build/dep. The team talked about this and we'd like to move this package up in the hierarchy and call it graph (so pkg/skaffold/graph).

If you're sick of this PR, I'm happy to take this on.

cmd/skaffold/app/cmd/run.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest_test.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest_test.go Outdated Show resolved Hide resolved
return "", err
}
h.Write([]byte(fi.Mode().String()))
h.Write([]byte(fi.Name()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this should be a file-path relative to the artifact workspace, and it should use filepath.ToPath() to ensure it's encoded consistently between Windows and *nix.

Copy link
Contributor Author

@piotr-szybicki piotr-szybicki Mar 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is no filepath.ToPath() function I used filepath.Clean() that seems to do the job. Also, Do you think I should change it in the hash.fileHasher()? For consistency.

if err != nil {
return "", err
}
h.Write([]byte(fi.Mode().String()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're hashing inputs, I don't think the mode matters since that will be affected by my OS and local umask settings. And the builder itself may change these modes as it's writing files into the layers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely. Do you think I should also remove it from the hash.fileHasher()? Feels like the internal scaffold cache and the tagger should be consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh, never mind, they will never be consistent. I just took a look at the hash.go code and see that you are also hashing the build arguments. Totally make sense if you consider the development life cycle. Something that I don't think is appropriate to be a part of the image hash (Might be wrong on that one). I mean the way I see it we want to be close to what docker is doing with it's layer.DiffID.

pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@piotr-szybicki piotr-szybicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @piotr-szybicki — I must apologize profusely for taking so long to review this PR. It generally looks great and is exactly what I was hoping for.

The one big quibble is the name of pkg/skaffold/build/dep. The team talked about this and we'd like to move this package up in the hierarchy and call it graph (so pkg/skaffold/graph).

If you're sick of this PR, I'm happy to take this on.

Hi, no problem has been busy with other stuff myself and didn't chase. I looked over the comments and they make total sense I will implement all the changes you requested. I will also add some documentation as generally understanding how this works is not that trivial. As I found out. It's different for docker comparing to Bazel or jib.

Makefile Outdated
@@ -14,7 +14,7 @@
GOPATH ?= $(shell go env GOPATH)
GOOS ?= $(shell go env GOOS)
GOARCH ?= $(shell go env GOARCH)
BUILD_DIR ?= ./out
BUILD_DIR ?= /home/piotrszybicki/work/cube
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this, it's removed in the following commit.

@piotr-szybicki piotr-szybicki force-pushed the InputDigestTagger branch 4 times, most recently from 663ebbe to bb37e68 Compare March 7, 2021 15:15
Copy link
Member

@briandealwis briandealwis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @piotr-szybicki! I've put in a few comments. If we can get these addressed, then it'll be ready for commit.

#5492 (and #5504) brought to light that we need to incorporate more than just the inputs — and this input tagger should use the artifact-hashing mechanism that's used elsewhere in Skaffold. Unfortunately this artifact-hasher is not yet exposed in an easy-to-consume manner (mechanism, lister). This is a bigger change and can be done separately if you don't want to take it on (just let me know).

cmd/skaffold/app/cmd/run.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/input_digest.go Outdated Show resolved Hide resolved
inputs = append(inputs, h)
}

return encode(inputs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just inline the sha256 encoding of the file path and file contents directly into the for{} loop? inputs isn't used for any purpose and just adds memory pressure. Some artifacts — with their dependencies — may have hundreds or thousands of files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have very limited understanding of how this encoder works, but the json.Encoder that's being used takes the entire set of files and returns a single encoding based on that set. is it possible to encode each file individually and then combine the encodings to achieve the same result?

it's worth noting that the elements of inputs are the hashes of each file and its contents, not the entire file contents themselves. so while there could be thousands of files, this would be a set of thousands of hashes, each of which is only a few bytes.

Copy link
Contributor Author

@piotr-szybicki piotr-szybicki Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the input slice contains only a file paths, I iterate over them in the encode method hashing as you said content and relative path. Maybe the name is misleading.

pkg/skaffold/build/tag/tag.go Outdated Show resolved Hide resolved
pkg/skaffold/build/tag/tag.go Outdated Show resolved Hide resolved
@nkubala
Copy link
Contributor

nkubala commented Mar 16, 2021

@piotr-szybicki I took the liberty of rebasing your PR (I made some changes that made the merge conflicts quite annoying for you) - I hope this is ok!

I also addressed a few of @briandealwis's last comments - mostly formatting and cleaning suggestions, though I made one change to the arguments to GenerateTag. please let me know if all of these changes are ok, and if so (pending @briandealwis's final review) we'll merge this!

@piotr-szybicki
Copy link
Contributor Author

@nkubala this is absolutely fine. Thanks for doing it. I was about to do this myself but my full time job is killing me lately.

@piotr-szybicki
Copy link
Contributor Author

Thanks @piotr-szybicki! I've put in a few comments. If we can get these addressed, then it'll be ready for commit.

#5492 (and #5504) brought to light that we need to incorporate more than just the inputs — and this input tagger should use the artifact-hashing mechanism that's used elsewhere in Skaffold. Unfortunately this artifact-hasher is not yet exposed in an easy-to-consume manner (mechanism, lister). This is a bigger change and can be done separately if you don't want to take it on (just let me know).

@briandealwis Sorry for the late response. But a year of my work is coming to a culmination release over coming weekend. I actually knew about this issue If you take a look at the original comment I made when creating this PR
including image cotainerId in the tag. (But to get there the implementation of docker.GetDependencies would have to change including its contract of the method)
I have a fix on a branch so I will create a different PR to address this particular issue. Because this one got big enough as it is.

Copy link
Member

@briandealwis briandealwis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the InputDigest type is fixed up

docs/content/en/schemas/v2beta13.json Outdated Show resolved Hide resolved
pkg/skaffold/kubernetes/owner_test.go Outdated Show resolved Hide resolved
pkg/skaffold/schema/latest/config.go Outdated Show resolved Hide resolved
pkg/skaffold/tag/tag.go Outdated Show resolved Hide resolved
pkg/skaffold/tag/input_digest_test.go Outdated Show resolved Hide resolved
@piotr-szybicki piotr-szybicki force-pushed the InputDigestTagger branch 4 times, most recently from 19ff291 to 75e30c9 Compare March 17, 2021 22:39
@piotr-szybicki
Copy link
Contributor Author

I fix all the comments, and tests. I of the Travis CI Jobs has flaked out on me. Can anyone rerun?

@nkubala nkubala force-pushed the InputDigestTagger branch 2 times, most recently from 2951d84 to 2c44e0e Compare March 31, 2021 17:21
@tejal29
Copy link
Member

tejal29 commented Apr 6, 2021

@piotr-szybicki sorry for the delay. do you have some cycles to rebase your PR. I would love to merge this.

@nkubala
Copy link
Contributor

nkubala commented Apr 6, 2021

@tejal29 i'm on it :)

@nkubala
Copy link
Contributor

nkubala commented Apr 7, 2021

all merge conflicts resolved and CI finally showing all green, so I'm going to merge this! @piotr-szybicki thanks so much for the contribution and all of your patience!

@nkubala nkubala merged commit b496731 into GoogleContainerTools:master Apr 7, 2021
@mikekamornikov
Copy link
Contributor

@nkubala As I see InputDigest is not supported by CustomTemplateTagger (see CreateComponents in pkg/skaffold/tag/tagger_mux.go):

apiVersion: skaffold/v2beta14
kind: Config
profiles:
- name: test-custom-template
  build:
    tagPolicy:
      customTemplate:
        template: "release-aaa-{{.DIGEST}}"
        components:
        - name: DIGEST
          inputDigest: {}
    artifacts:
    - image: quay.io/my/project
      context: ../../..
      bazel:
        target: //project:docker.tar
$ skaffold build -p test-custom-template
creating runner: creating tagger: creating tagger: creating components: unknown component for custom template: DIGEST {GitTagger:<nil> ShaTagger:<nil> EnvTemplateTagger:<nil> DateTimeTagger:<nil> CustomTemplateTagger:<nil> InputDigest:0x7a479b8}
. If above error is unexpected, please open an issue https://github.com/GoogleContainerTools/skaffold/issues/new to report this error.

Is it by design? Should I file a bug?

@piotr-szybicki
Copy link
Contributor Author

I just reread improve digest proposal. And yes I clearly missed {{.DIGEST}} when implementing. So filing a bug would be a good choice. I will find some time during the weekend to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants