Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Use kubeyaml for YAML updates #976

Merged
merged 7 commits into from
May 30, 2018
Merged

Use kubeyaml for YAML updates #976

merged 7 commits into from
May 30, 2018

Conversation

squaremo
Copy link
Member

@squaremo squaremo commented Mar 5, 2018

squaremo/kubeyaml uses a round-tripping YAML parser, so can do a more reliable job at updating YAMLs. By using that, we can divest fluxd of lots of regexp use, and gain the ability to support List and multidoc YAMLs almost for free.

We still have to get the kubeyaml executable files available to fluxd. The kubeyaml image happens to be (* I made it) based on alpine, so the binary will run in fluxd's container. In principle, we could just base fluxd on the kubeyaml file, but this is the wrong way around for them to compose. Instead, I use an implicit multistage Dockerfile to copy the files from the kubeyaml image.

Because kubeyaml parses then serialises whole files, it will tend to reformat things. Once this is merged, we will see a few commits that have incidental changes, like reindented list items, or occasional changes in quoting.

There are a few corner cases* where this mangles a file because of bugs in the underlying YAML parser. Happily, we now verify that updates are correct (thanks to #1081), so releases that hit those bugs will fail rather than committing garbage. (*way fewer than in the regexp-based code! to be clear)

I have left the change of []byte to io.Reader at Sam's suggestion for another PR. Likewise, porting to a more general yamling library (i.e., py-yamler or something very close), so that we can do this incrementally -- kubeyaml is good enough for us to take this step.

return execKubeyaml(in, args)
}

func execKubeyaml(in []byte, args []string) ([]byte, error) {

This comment was marked as abuse.

This comment was marked as abuse.

@squaremo squaremo changed the title [WIP] Use kubeyaml for YAML updates [RFC] Use kubeyaml for YAML updates Mar 7, 2018
return execKubeyaml(in, args)
}

func execKubeyaml(in []byte, args []string) ([]byte, error) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@squaremo
Copy link
Member Author

squaremo commented Mar 8, 2018

IRL conversation: we could give kubeyaml a server mode, since it only needs to answer simple questions and doesn't need a sophisticated API or anything. That would let us run it as a sidecar, rather than an initContainer, removing some of the ickiness with $PATH.

@squaremo
Copy link
Member Author

squaremo commented Mar 9, 2018

we could give kubeyaml a server mode, since it only needs to answer simple questions and doesn't need a sophisticated API or anything

Though this would make testing a bit harder.

@samb1729
Copy link
Contributor

samb1729 commented Mar 9, 2018

Should unit tests depend on kubeyaml at all, whether it's via os.Exec or http.Client?

We could:

  1. Create a kubeyaml mock and test that flux tells it the right changes to make,
  2. Test that kubeyaml itself makes the right changes,
  3. (optional) Integration test it all for additional confidence

@squaremo
Copy link
Member Author

Just noticed a problem. When we're updating then container-specific tag filters, we parse the configs. But while kubeyaml happily deals with Lists and multidoc files, that specific bit of parsing doesn't. I can think of a couple of fixen:

  • be able to parse Lists and multidocs in that place; or, more likely, adapt the code in cluster/kubernetes/resources so we can extract the list of containers using it.
  • add a feature to kubeyaml so it will do the container enumeration when updating;
  • change how these annotations work so that the value can always be replaced without knowledge of the specific containers

@@ -18,6 +18,10 @@ WORKDIR /home/flux
ENTRYPOINT [ "/sbin/tini", "--", "fluxd" ]
RUN apk add --no-cache openssh ca-certificates tini 'git>=2.3.0'

# Get the kubeyaml binary (files) and put them on the path
COPY --from=quay.io/squaremo/kubeyaml:0.1.0 /usr/lib/kubeyaml /usr/lib/kubeyaml/

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

squaremo and others added 6 commits May 25, 2018 11:33
This wipes out a bunch of regexp-(ab)using code, in favour of using a
self-contained binary from an image, which is expected to be on the
`$PATH`.

For the tests, we have to have a `kubectl` in the `$PATH`. A small
shell script shim suffices.

To make the binary available in a pod, the kubeyaml image can be
called as an initContainer, copying the files to a shared volume.
While kubeyaml does a good job of preserving comments and mostly a
good job or preserving whitespace, it does end up reindenting blocks,
and requoting string literals. Since these are still
meaning-preserving changes, remove them from consideration by cooking
the test cases a little.
It's not enough to be able to update images in a List resource, we
must be able to recognise them in the first place.

The Go YAML parser makes this a little tricky, because it doesn't give
you access to the bytes (like `UnmarshalJSON` does), so if we want
typed resources, we have to parse them generically and marshal them
back out to bytes.
kubeyaml (now) supports updating annotations as well as images, with
the same round-tripping properties.

One wrinkle in the test code is that it relies on maps being sorted
lexicographically when interpolated into templates. However, kubeyaml
keeps the order in which things were added. Rather than add a lot of
complication in the comparison of expected v. actual result, I have
made the order explicit by using slices instead of maps.

We can get rid of some duplication by using the resources package to
parse manifests and thereby get policies.
To be able to use kubeyaml from within a pod, fluxd must have access
to the binaries. We can simply build the files into the image using a
multistage Dockerfile, and make sure they are on the `$PATH`.

I took the opportunity to factor out the code that executes
`kubeyaml`.
@squaremo squaremo changed the title [RFC] Use kubeyaml for YAML updates Use kubeyaml for YAML updates May 25, 2018
@squaremo squaremo requested a review from aaron7 May 25, 2018 11:06
 - We have many fewer requirements of YAMLs now
 - While we're documenting limitations: kubeyaml 0.2.1 preserves
   quotes, and thereby produces fewer spurious diffs, so update to
   that version rather than having to mention it
Copy link
Contributor

@awh awh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Nicely organised PR too ✨

@@ -149,7 +149,6 @@ spec:
metadata:
labels:
name: fluxy
version: master-a000001

This comment was marked as abuse.

This comment was marked as abuse.

@@ -194,7 +193,6 @@ spec:
metadata:
labels:
name: fluxy
version: "1234567"

This comment was marked as abuse.

return execKubeyaml(in, args)
}

func execKubeyaml(in []byte, args []string) ([]byte, error) {

This comment was marked as abuse.

@squaremo squaremo merged commit fd67156 into master May 30, 2018
@squaremo squaremo deleted the use-kubeyaml branch May 30, 2018 14:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants