Remove git caching in favor of file:// (#1398)
This removes git-caching behavior from Zarf in favor of folks
controlling their repos manually with `file://` now that that is
supported.

Fixes #1362

- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

- [X] Test, docs, adr added or updated as needed
- [X] [Contributor Guide Steps](https://github.com/defenseunicorns/zarf/blob/main/CONTRIBUTING.md#developer-workflow) followed
Racer159 authored and Noxsios committed Mar 8, 2023
1 parent a7dc8de commit 858e847
Showing 11 changed files with 126 additions and 186 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -34,3 +34,4 @@ zarf
zarf-pki
zarf-sbom/
*.part*
test-*.txt
53 changes: 49 additions & 4 deletions docs/9-faq.md
@@ -6,19 +6,19 @@ No, the Zarf binary and init package can be downloaded from the [Releases Page](

## What dependencies does Zarf have?

Zarf is statically compiled and written in [Go](https://golang.org/) and [Rust](https://www.rust-lang.org/), so it has no external dependencies. For Linux, Zarf can bring a Kubernetes cluster using [K3s](https://k3s.io/). For Mac and Windows, Zarf can leverage any available local or remote cluster the user has access to. Currently, the K3s installation Zarf performs does require a [Systemd](https://en.wikipedia.org/wiki/Systemd) based system and root access.
Zarf is statically compiled and written in [Go](https://golang.org/) and [Rust](https://www.rust-lang.org/), so it has no external dependencies. For Linux, Zarf can bring a Kubernetes cluster using [K3s](https://k3s.io/). For Mac and Windows, Zarf can leverage any available local or remote cluster the user has access to. Currently, the K3s installation Zarf performs does require a [Systemd](https://en.wikipedia.org/wiki/Systemd) based system and `root` (not just `sudo`) access.

## What license is Zarf under?

Zarf is under the [Apache License 2.0](https://github.com/defenseunicorns/zarf/blob/main/LICENSE). This is one of the most commonly used licenses for open source software.

## What is the Zarf Agent?

The Zarf Agent is a [Kubernetes Mutating Webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that is installed into the cluster during the `zarf init` operation. The Agent is responsible for modifying [Kubernetes PodSpec](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec) objects [Image](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#Container.Image) fields to point to the Zarf Registry. This allows the cluster to pull images from the Zarf Registry instead of the internet without having to modify the original image references. The Agent also modifies [Flux GitRepository](https://fluxcd.io/docs/components/source/gitrepositories/) objects to point to the local Git Server.
The Zarf Agent is a [Kubernetes Mutating Webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) that is installed into the cluster during `zarf init`. The Agent is responsible for modifying [Kubernetes PodSpec](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec) objects [Image](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#Container.Image) fields to point to the Zarf Registry. This allows the cluster to pull images from the Zarf Registry instead of the internet without having to modify the original image references. The Agent also modifies [Flux GitRepository](https://fluxcd.io/docs/components/source/gitrepositories/) objects to point to the local Git Server.

## Why doesn't the Zarf Agent create secrets it needs in the cluster?

During early discussions and [subsequent decision](../adr/0005-mutating-webhook.md) to use a Mutating Webhook, we decided to not have the Agent create any secrets in the cluster. This is to avoid the Agent having to have more privileges than it needs as well as avoid collisions with Helm. The Agent today simply responds to requests to patch PodSpec and GitRepository objects.
During early discussions and [subsequent decision](../adr/0005-mutating-webhook.md) to use a Mutating Webhook, we decided to not have the Agent create any secrets in the cluster. This is to avoid the Agent having to have more privileges than it needs as well as to avoid collisions with Helm. The Agent today simply responds to requests to patch PodSpec and GitRepository objects.

The Agent does not need to create any secrets in the cluster. Instead, during `zarf init` and `zarf package deploy`, secrets are automatically created as [Helm Postrender Hook](https://helm.sh/docs/topics/advanced/#post-rendering) for any namespaces Zarf sees. If you have resources managed by [Flux](https://fluxcd.io/) that are not in a namespace managed by Zarf, you can either create the secrets manually or include a manifest to create the namespace in your package and let Zarf create the secrets for you.

@@ -30,7 +30,7 @@ Resources can be excluded at the namespace or resources level by adding the `zar

During the `zarf init` operation, the Zarf Agent will patch any existing namespaces with the `zarf.dev/agent: ignore` label to prevent the Agent from modifying any resources in that namespace. This is done because there is no way to guarantee the images used by pods in existing namespaces are available in the Zarf Registry.

## How can I improve the speed of loading larges images from Docker on `zarf package create`?
## How can I improve the speed of loading large images from Docker on `zarf package create`?

Due to some limitations with how Docker provides access to local image layers, `zarf package create` has to rely on `docker save` under the hood which is [very slow overall](https://github.com/defenseunicorns/zarf/issues/1214) and also takes a long time to report progress. We experimented with many ways to improve this, but for now recommend leveraging a local docker registry to speed up the process. This can be done by running a local registry and pushing the images to it before running `zarf package create`. This will allow `zarf package create` to pull the images from the local registry instead of Docker. This can also be combined with [component actions](4-user-guide/5-component-actions.md) to make the process automatic. Given an example image of `my-giant-image:###ZARF_PKG_VAR_IMG###` you could do something like this:

@@ -61,6 +61,51 @@ components:
- 'localhost:5000/###ZARF_PKG_VAR_IMG###'
```
## Can I pull in more than http(s) git repos on `zarf package create`?

Under the hood, Zarf uses [`go-git`](https://github.com/go-git/go-git) to perform `git` operations, but it can fall back to the `git` binary on the host and thus supports any of the available [git protocols](https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols). To use a different protocol, specify the full URL for that particular repo:

:::note

In order for the fallback to work correctly you must have `git` version `2.14` or later in your path.

:::

```yaml
kind: ZarfPackageConfig
metadata:
  name: repo-schemes-example
components:
  repos:
    - https://github.com/defenseunicorns/zarf.git
    - ssh://git@github.com/defenseunicorns/zarf.git
    - file:///home/zarf/workspace/zarf
    - git://somegithost.com/zarf.git
```

In the airgap, Zarf will rewrite these URLs to match the scheme and host of the provided airgap `git` server.

:::note

When specifying other schemes in Zarf you must change the consuming side as well since Zarf will add a CRC hash of the URL to the repo name on the airgap side. This is to reduce the chance for collisions between repos with similar names. This means an example Flux `GitRepository` specification would look like this for the `file://` based pull:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 30s
  ref:
    tag: 6.1.6
  url: file:///home/zarf/workspace/podinfo
```

:::
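
The naming scheme described in the note above can be sketched in Go. This is only an illustration under stated assumptions: `mirrorRepoName` is a hypothetical helper, and the use of the IEEE CRC-32 polynomial over the raw URL string is a guess, so the exact suffix Zarf produces may differ.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// mirrorRepoName is a hypothetical helper, not Zarf's actual function.
// It appends a CRC-32 checksum of the source URL to the repo name so that
// repos with the same name but different origins do not collide.
// Assumption: IEEE polynomial computed over the raw URL bytes.
func mirrorRepoName(repo, sourceURL string) string {
	sum := crc32.ChecksumIEEE([]byte(sourceURL))
	return fmt.Sprintf("%s-%d", repo, sum)
}

func main() {
	// The same repo name pulled over two schemes gets two distinct mirror names.
	for _, u := range []string{
		"https://github.com/defenseunicorns/zarf.git",
		"file:///home/zarf/workspace/zarf",
	} {
		fmt.Println(mirrorRepoName("zarf", u))
	}
}
```

Because the suffix depends on the URL, a consumer that hard-codes the mirrored repo name has to be updated whenever the source scheme or path changes.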

## What is YOLO Mode and why would I use it?

YOLO Mode is a special package metadata designation that can be added to a package prior to `zarf package create` to allow the package to be installed without the need for a `zarf init` operation. In most cases this will not be used, but it can be useful for testing or for environments that manage their own registries and Git servers completely outside of Zarf. This can also be used as a way to transition slowly to using Zarf without having to do a full migration.
4 changes: 2 additions & 2 deletions examples/git-data/README.md
@@ -16,13 +16,13 @@ A tag-provided clone only mirrors the tag defined in the Zarf definition. The ta

:::note

If you would like to use a scheme other than http/https, you can do so with something like the following: `ssh://git@github.com/defenseunicorns/zarf.git@v0.15.0`
If you would like to use a protocol scheme other than http/https, you can do so with something like the following: `ssh://git@github.com/defenseunicorns/zarf.git@v0.15.0`. Using this you can also clone from a local repo to help you manage larger git repositories: `file:///home/zarf/workspace/zarf@v0.15.0`.

:::

## SHA-Provided Git Repository Clone

SHA-provided `git` repository cloning is another supported way of cloning repos in Zarf but is not recommended as it is less readable/understandable than tag cloning. Commit SHAs are defined using the same `scheme://host/repo@sha` format as seen in the example of the `defenseunicorns/zarf` repository (`https://github.com/defenseunicorns/zarf.git@c74e2e9626da0400e0a41e78319b3054c53a5d4e`).
SHA-provided `git` repository cloning is another supported way of cloning repos in Zarf but is not recommended as it is less readable/understandable than tag cloning. Commit SHAs are defined using the same `scheme://host/repo@shasum` format as seen in the example of the `defenseunicorns/zarf` repository (`https://github.com/defenseunicorns/zarf.git@c74e2e9626da0400e0a41e78319b3054c53a5d4e`).

A SHA-provided clone only mirrors the SHA hash defined in the Zarf definition. The SHA will be applied on the `git` mirror to the default trunk branch of the repo (i.e. `master`, `main`, or the default when the repo is cloned) as Zarf does with tagging.
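
As a rough sketch of how a `scheme://host/repo@ref` URL can be split into a clone URL and a ref, the Go snippet below uses a regular expression with named groups. The group names mirror the ones referenced in `pull.go` (`proto`, `hostPath`, `repo`, `git`, `atRef`, `ref`), but the pattern itself and the `splitRef` helper are assumptions made for illustration, not necessarily what Zarf ships.

```go
package main

import (
	"fmt"
	"regexp"
)

// refURLPattern is an illustrative pattern (an assumption, not Zarf's own).
// The optional trailing "@ref" may be a tag or a commit SHA.
var refURLPattern = regexp.MustCompile(
	`^(?P<proto>[a-z]+://)(?P<hostPath>.+)/(?P<repo>[\w\-\.]+?)(?P<git>\.git)?(?P<atRef>@(?P<ref>[\w\-\.]+))?$`)

// splitRef is a hypothetical helper that returns the clone URL without the
// ref, plus the ref itself (empty when the URL carries no "@ref" suffix).
func splitRef(gitURL string) (cloneURL, ref string, err error) {
	m := refURLPattern.FindStringSubmatch(gitURL)
	if m == nil {
		return "", "", fmt.Errorf("unrecognized git URL: %s", gitURL)
	}
	idx := refURLPattern.SubexpIndex
	cloneURL = m[idx("proto")] + m[idx("hostPath")] + "/" + m[idx("repo")] + m[idx("git")]
	return cloneURL, m[idx("ref")], nil
}

func main() {
	for _, u := range []string{
		"ssh://git@github.com/defenseunicorns/zarf.git@v0.15.0",
		"file:///home/zarf/workspace/zarf@v0.15.0",
		"https://github.com/defenseunicorns/zarf.git",
	} {
		cloneURL, ref, err := splitRef(u)
		fmt.Println(cloneURL, ref, err)
	}
}
```

Note that the `@` inside `git@github.com` is not mistaken for a ref separator, because the ref group is only allowed after the final path segment.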

4 changes: 0 additions & 4 deletions examples/git-data/zarf.yaml
@@ -15,16 +15,12 @@ components:
# Clone an azure repo that breaks in go-git and has to fall back to the host git
- https://me0515@dev.azure.com/me0515/zarf-public-test/_git/zarf-public-test



- name: specific-tag
required: true
repos:
# Do a tag-provided Git Repo mirror
- https://github.com/defenseunicorns/zarf.git@v0.15.0



- name: specific-hash
required: true
repos:
1 change: 0 additions & 1 deletion src/config/config.go
@@ -48,7 +48,6 @@ const (
ZarfCleanupScriptsPath = "/opt/zarf"

ZarfImageCacheDir = "images"
ZarfGitCacheDir = "repos"

ZarfYAML = "zarf.yaml"
ZarfSBOMDir = "zarf-sbom"
62 changes: 14 additions & 48 deletions src/internal/packager/git/pull.go
@@ -5,11 +5,8 @@
package git

import (
"errors"
"fmt"

"path/filepath"

"github.com/defenseunicorns/zarf/src/config"
"github.com/defenseunicorns/zarf/src/pkg/message"
"github.com/defenseunicorns/zarf/src/pkg/utils"
@@ -51,11 +48,6 @@ func (g *Git) Pull(gitURL, targetFolder string) (path string, err error) {
func (g *Git) pull(gitURL, targetFolder string, repoName string) error {
g.Spinner.Updatef("Processing git repo %s", gitURL)

gitCachePath := targetFolder
if repoName != "" {
gitCachePath = filepath.Join(config.GetAbsCachePath(), filepath.Join(config.ZarfGitCacheDir, repoName))
}

matches := gitURLRegex.FindStringSubmatch(gitURL)
idx := gitURLRegex.SubexpIndex

@@ -64,48 +56,19 @@ func (g *Git) pull(gitURL, targetFolder string, repoName string) error {
return fmt.Errorf("unable to get extract the repoName from the url %s", gitURL)
}

alreadyProcessed := false
onlyFetchRef := matches[idx("atRef")] != ""
gitURLNoRef := fmt.Sprintf("%s%s/%s%s", matches[idx("proto")], matches[idx("hostPath")], matches[idx("repo")], matches[idx("git")])

repo, err := g.clone(gitCachePath, gitURLNoRef, onlyFetchRef)

repo, err := g.clone(targetFolder, gitURLNoRef, onlyFetchRef)
if err == git.ErrRepositoryAlreadyExists {

// Pull the latest changes from the online repo
message.Debug("Repo already cloned, pulling any upstream changes...")
gitCred := utils.FindAuthForHost(gitURL)
pullOptions := &git.PullOptions{
RemoteName: onlineRemoteName,
Auth: &gitCred.Auth,
}
worktree, err := repo.Worktree()
if err != nil {
return fmt.Errorf("unable to get the worktree for the repo (%s): %w", gitURL, err)
}
err = worktree.Pull(pullOptions)
if errors.Is(err, git.NoErrAlreadyUpToDate) {
message.Debug("Repo already up to date")
} else if err != nil {
return fmt.Errorf("not a valid git repo or unable to pull (%s): %w", gitURL, err)
}

// NOTE: Since pull doesn't pull any new tags, we need to fetch them
fetchOptions := git.FetchOptions{RemoteName: onlineRemoteName, Tags: git.AllTags}
if err := g.fetch(gitCachePath, &fetchOptions); err != nil {
return err
}

// If we enter this block, the user has specified the same repo twice in one component and we should respect the prior changes
// (see the specific-tag-update component in the git-repo-behavior test-package)
message.Debug("Repo already cloned, pulling any specified changes...")
alreadyProcessed = true
} else if err != nil {
return fmt.Errorf("not a valid git repo or unable to clone (%s): %w", gitURL, err)
}

if gitCachePath != targetFolder {
err = utils.CreatePathAndCopy(gitCachePath, targetFolder)
if err != nil {
return fmt.Errorf("unable to copy %s into %s: %#v", gitCachePath, targetFolder, err.Error())
}
}

if onlyFetchRef {
ref := matches[idx("ref")]

@@ -124,12 +87,15 @@ func (g *Git) pull(gitURL, targetFolder string, repoName string) error {
g.Spinner.Errorf(nil, "No branch found for this repo head. Ref will be pushed to 'master'.")
}

_, err = g.removeLocalTagRefs()
if err != nil {
return fmt.Errorf("unable to remove unneeded local tag refs: %w", err)
// If this repo has already been processed by Zarf don't remove tags, refs and branches
if !alreadyProcessed {
_, err = g.removeLocalTagRefs()
if err != nil {
return fmt.Errorf("unable to remove unneeded local tag refs: %w", err)
}
_, _ = g.removeLocalBranchRefs()
_, _ = g.removeOnlineRemoteRefs()
}
_, _ = g.removeLocalBranchRefs()
_, _ = g.removeOnlineRemoteRefs()

err = g.fetchRef(ref)
if err != nil {
17 changes: 1 addition & 16 deletions src/test/e2e/00_use_cli_test.go
@@ -27,7 +27,6 @@ func TestUseCLI(t *testing.T) {
// run `zarf package create` with a specified image cache location
cachePath := filepath.Join(os.TempDir(), ".cache-location")
imageCachePath := filepath.Join(cachePath, "images")
gitCachePath := filepath.Join(cachePath, "repos")

// run `zarf package create` with a specified tmp location
otherTmpPath := filepath.Join(os.TempDir(), "othertmp")
@@ -91,23 +90,9 @@ func TestUseCLI(t *testing.T) {
e2e.cleanFiles(pkgName)

files, err := os.ReadDir(imageCachePath)
require.NoError(t, err, "Error when reading image cache path")
require.NoError(t, err, "Encountered an unexpected error when reading image cache path")
assert.Greater(t, len(files), 1)

pkgName = fmt.Sprintf("zarf-package-git-data-%s-v1.0.0.tar.zst", e2e.arch)

// Pull once to test git cloning
stdOut, stdErr, err = e2e.execZarfCommand("package", "create", "examples/git-data", "--confirm", "--zarf-cache", cachePath, "--tmpdir", otherTmpPath)
require.NoError(t, err, stdOut, stdErr)

files, err = os.ReadDir(gitCachePath)
require.NoError(t, err, "Error when reading git cache path")
assert.Greater(t, len(files), 1)

// Pull twice to test git fetching (from cache)
stdOut, stdErr, err = e2e.execZarfCommand("package", "create", "examples/git-data", "--confirm", "--zarf-cache", cachePath, "--tmpdir", otherTmpPath)
require.NoError(t, err, stdOut, stdErr)

// Test removal of cache
stdOut, stdErr, err = e2e.execZarfCommand("tools", "clear-cache", "--zarf-cache", cachePath)
require.NoError(t, err, stdOut, stdErr)
45 changes: 0 additions & 45 deletions src/test/e2e/05_create_cache_test.go

This file was deleted.

52 changes: 52 additions & 0 deletions src/test/e2e/07_create_git_test.go
@@ -0,0 +1,52 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: 2021-Present The Zarf Authors

// Package test provides e2e tests for Zarf.
package test

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestCreateGit(t *testing.T) {
	extractDir := filepath.Join(os.TempDir(), ".extracted-git-pkg")

	pkgDir := "src/test/test-packages/git-repo-behavior"
	pkgPath := fmt.Sprintf("%s/zarf-package-git-behavior-%s.tar.zst", pkgDir, e2e.arch)
	outputFlag := fmt.Sprintf("-o=%s", pkgDir)
	e2e.cleanFiles(extractDir, pkgPath)

	_, _, err := e2e.execZarfCommand("package", "create", pkgDir, outputFlag, "--confirm")
	require.NoError(t, err, "error when building the test package")
	// defer e2e.cleanFiles(pkgPath)

	stdOut, stdErr, err := e2e.execZarfCommand("tools", "archiver", "decompress", pkgPath, extractDir)
	require.NoError(t, err, stdOut, stdErr)
	// defer e2e.cleanFiles(extractDir)

	// Verify the main zarf repo only has one tag
	gitDirFlag := fmt.Sprintf("--git-dir=%s/components/specific-tag/repos/zarf-1211668992/.git", extractDir)
	gitTagOut, err := exec.Command("git", gitDirFlag, "tag", "-l").Output()
	require.NoError(t, err)
	require.Equal(t, "v0.15.0\n", string(gitTagOut))

	gitHeadOut, err := exec.Command("git", gitDirFlag, "rev-parse", "HEAD").Output()
	require.NoError(t, err)
	require.Equal(t, "9eb207e552fe3a73a9ced064d35a9d9872dfbe6d\n", string(gitHeadOut))

	// Verify the second zarf repo only has two tags
	gitDirFlag = fmt.Sprintf("--git-dir=%s/components/specific-tag-update/repos/zarf-1211668992/.git", extractDir)
	gitTagOut, err = exec.Command("git", gitDirFlag, "tag", "-l").Output()
	require.NoError(t, err)
	require.Equal(t, "v0.16.0\nv0.17.0\n", string(gitTagOut))

	gitHeadOut, err = exec.Command("git", gitDirFlag, "rev-parse", "HEAD").Output()
	require.NoError(t, err)
	require.Equal(t, "bea100213565de1348375828e14be6e1482a67f8\n", string(gitHeadOut))
}