Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Include individual resource sync errors in Sync events #970

Merged
merged 3 commits into from
Mar 12, 2018

Conversation

squaremo
Copy link
Member

@squaremo squaremo commented Mar 1, 2018

A sync can be a partial success, in that we skip lightly over individual resources failing to be applied. This means that bogus manifests don't get noticed, and the symptoms can be difficult to pin down if you are used to thinking cluster == git.

This PR surfaces those errors by including them in the Sync event, as described in #756. (It fixes #756, in fact.)

There are other measures we could take, e.g., annotating offending commits (maybe? we may not be able to pin it down); keeping a gauge of sync failures which can be alerted on. But this is a fair start.

cluster/sync.go Outdated
Error error
}

type SyncErrors []SyncError

This comment was marked as abuse.

This comment was marked as abuse.

event/event.go Outdated
@@ -204,6 +204,8 @@ type SyncEventMetadata struct {
// policy changes, and "other" (meaning things we didn't commit
// ourselves)
Includes map[string]bool `json:"includes,omitempty"`
// Per-resource errors
Errors map[string]string `json:"errors,omitempty"`

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@@ -21,7 +21,7 @@ type Manifests interface {
// spec given.
UpdateDefinition(def []byte, container string, newImageID image.Ref) ([]byte, error)
// Load all the resource manifests under the path given
LoadManifests(paths ...string) (map[string]resource.Resource, error)
LoadManifests(base, first string, rest ...string) (map[string]resource.Resource, error)

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@squaremo squaremo force-pushed the feature/756-include-sync-errors branch 2 times, most recently from 93bfd7c to a141155 Compare March 6, 2018 18:38
@squaremo
Copy link
Member Author

squaremo commented Mar 7, 2018

PTAL @Sambooo

squaremo and others added 3 commits March 9, 2018 12:46
Before we had an interface representing resources, they were generally
passed around as a pair of (id string, definition []byte). We can cut
down on the number of representations of resources by just using
`resource.Resource`.

This includes returning synchronisation errors with the whole
resource, rather than simply by name. Doing so means we will be able
to better surface sync problems with individual resources.
Include an error report in the sync notification, with resources
v. errors (with the latter coming from `kubectl` most likely).

The most useful bit of information when a resource fails to sync --
more useful than the error from kubectl, even -- is the file that had
a problem. Include that in the notification.

Secondarily: to avoid having a long, tmpfile path in messages, make
the source of resources relative to the repo.
Instead of using a map, which forces us to give a glommed-together
string as the key (so that it can be JSONed), use a slice of structs
with the unglommed data.
@squaremo squaremo force-pushed the feature/756-include-sync-errors branch from a141155 to cdca842 Compare March 9, 2018 14:30
@squaremo squaremo merged commit 1a860b1 into master Mar 12, 2018
@squaremo squaremo deleted the feature/756-include-sync-errors branch March 12, 2018 12:15
squaremo added a commit to weaveworks/service that referenced this pull request Mar 12, 2018
Newer flux daemons (since fluxcd/flux#970) report any sync
failures for individual resources. Take advantage of this by including
the errors in sync notifications.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include errors in sync events
2 participants