Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardise logging #172

Closed
dholbach opened this issue Oct 27, 2021 · 10 comments
Closed

Standardise logging #172

dholbach opened this issue Oct 27, 2021 · 10 comments
Assignees

Comments

@dholbach
Copy link
Member

From Ada Logics:

Proper logging is important to ensure audit trails in case of breaches. However, throughout the code we found inconsistency in the way logging is handled, and often when errors occur there would be no error logging.

The ImageAutomationReconciler declares the following failWithError function:

https://github.com/fluxcd/image-automation-controller/blob/674f833d982628831c101a919c0154c64c455db3/controllers/imageupdateautomation_controller.go#L149-L157

This function is consistently used to return from the Reconcile function in an appropriate manner with proper logging, e.g:

https://github.com/fluxcd/image-automation-controller/blob/674f833d982628831c101a919c0154c64c455db3/controllers/imageupdateautomation_controller.go#L224-L227

This is a good way of abstracting of common logic and also ensures logging However, this theme is never used in any of the other controllers and these controllers implement the logic quite differently, e.g.

https://github.com/fluxcd/source-controller/blob/d7afc3596bdfc3818ed8987db029bea8a461c62c/controllers/gitrepository_controller.go#L175-L200

https://github.com/fluxcd/image-reflector-controller/blob/327a6ea9fd78783e4daaa2da3cc1f40dbcca0cab/controllers/imagepolicy_controller.go#L118-L131

https://github.com/fluxcd/notification-controller/blob/8ff5f75a255d7dd0677590510ab1abf1d33b4f85/controllers/alert_controller.go#L72-L79

Recommendation
● Standardise how logging should occur.
● Use helper methods for common error handling.
● Log whenever errors occur.
● Log differently depending on how each controller exits the Reconcile functions.

@Nalum
Copy link
Member

Nalum commented Jan 5, 2022

I'd be happy to take a look at this.

Is the helper function in the ImageAutomationReconciler (with tweaking where required) the template to use for each of the others?

@Nalum
Copy link
Member

Nalum commented Jan 10, 2022

Base on the failWithError function in the Image Automation Controller I was going to build the function in a similar way for this controller e.g.

        failWithError := func(err error) (ctrl.Result, error) {
                r.event(ctx, repository, events.EventSeverityError, err.Error())
                repository = sourcev1.GitRepositoryNotReady(repository, meta.DependencyNotReadyReason, err.Error())
                if err := r.updateStatus(ctx, req, repository.Status); err != nil {
                        log.Error(err, "failed to update status")
                }
                return ctrl.Result{Requeue: true}, err
        }

Does this look right for this controller? What I'm mostly wondering about is the r.updateStatus call as that is mostly where failWithError is being placed (based on the Image Automation Controller) e.g.

        // set initial status
        if resetRepository, ok := r.resetStatus(repository); ok {
                repository = resetRepository
                if err := r.updateStatus(ctx, req, repository.Status); err != nil {
                        log.Error(err, "unable to update status")
                        return failWithError(err) // ctrl.Result{Requeue: true}, err
                }
                r.recordReadiness(ctx, repository)
        }
        // update status with the reconciliation result
        if err := r.updateStatus(ctx, req, reconciledRepository.Status); err != nil {
                log.Error(err, "unable to update status")
                return failWithError(err) // ctrl.Result{Requeue: true}, err
        }

Should it be updated to also accept ctrl.Result as a param? It could then be used for most returns not just where Requeue: true.

@Nalum
Copy link
Member

Nalum commented Jan 11, 2022

Should sourcev1.GitRepositoryNotReady be called in this function as well?

Currently in the Reconcile function it's called in one place only if r.checkDependencies returns an error.

@pjbgf
Copy link
Member

pjbgf commented Jan 11, 2022

@Nalum thank you for the initial look at this. I am not sure whether some decisions have been made around logging as part of the refactoring exercise, if so that could help us move this forward.

@darkowlzz @hiddeco is there any new standard around logging in the refactoring that we could be beneficial here? Otherwise, any recommendation on this front?

@hiddeco hiddeco self-assigned this Jan 11, 2022
@dholbach
Copy link
Member Author

dholbach commented Feb 1, 2022

Shall we discuss this in one of the next Dev meetings maybe?

@stefanprodan
Copy link
Member

Using failWithError does not mean standardised logging, we do use Zap for structured logging and all controllers follow the same logging pattern. In most cases failWithError can't be used as such because failure conditions differ.

@Nalum
Copy link
Member

Nalum commented Feb 1, 2022

@dholbach I had a quick chat with @hiddeco about this shortly after making the comments. @hiddeco and @darkowlzz have been discussing and wanted to do some further discussion before jumping into it.

I'm happy to help out when they are ready to share.

@darkowlzz
Copy link
Contributor

darkowlzz commented Feb 2, 2022

Hi, after spending some time thinking about this and experimenting with some options, we do have a recommendation for logging in a consistent way as part of the rewrite/refactor. We've a document that describes the new changes in the controller structure which has a section about contextual errors with examples and how to handle logging and event emitting for them, refer https://gist.github.com/darkowlzz/02cbdeaa3e5d05bdc620515a66311684#event-error . A reference implementation of it is in the source-controller at the moment.

In short, the recommendation is to define contextual errors and handle then at the very end of the reconciliation in a deferred function, in RecordContextualError() function as per the above document. Some errors that aren't runtime errors have to be explicitly logged and event emitted for them, others that are runtime errors are automatically logged by the runtime, so such errors only need to have explicit event emitting. Consolidating these at the very end of reconciliation helps ensure that all the errors are logged and events are emitted for them in a consistent way.

For non-error logging, the same document has a section about Logging and emitting events that recommends ways of recording informational messages in a unified way with some examples.

I believe these would cover most of the needs for standardization. But it may be difficult to implement these without overall refactor of the controllers as described in the above document, like in the case of source-controller.

Happy to discuss more about these recommendations and concerns around it, or anything that we may be missing out.

@aryan9600
Copy link
Member

Status update: the issue with image-reflector-controller has been addressed via fluxcd/image-reflector-controller#232.

@pjbgf
Copy link
Member

pjbgf commented Feb 16, 2022

The new logging standard has been defined and documented as per comment above. It takes into consideration that some controllers have slightly different requirements, but it also ensures that all errors are consistently logged once.

All the occurrences pointed out by the audit have been refactored since then, and are inline with the new standard.

The new logging standard is part of a larger controller rewrite exercise in which all Flux's controllers are being refactored in order to improve their testability, maintainability and the overall consistency across the code bases. This is an ongoing exercise which will be tracked separately.

Due to the points above, this can now be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants