Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Bubble up the errors from sub-commands and handle errors centrally instead of using log.Fatal at multiple sites #599

Merged
merged 5 commits into from
Nov 5, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Oct 29, 2024

This is a small part of a larger effort to allow the agent to use structured logging.
This will make it easier for platform administrators to parse the logs and to distinguish errors from info messages.
It also is a small part of an effort to use contextual logging instead of the legacy log module.
This will make it easier to test the log output of our code in unit tests.

The existing code uses log.Fatal at the site of various errors to both print the error and to exit the process with code 1.
Instead the errors should be returned and bubbled back to the root command, which makes it easier to handle the errors in a consistent fashion and makes it easier to test the code that had previously caused the process to exit.

$ git grep -i '\.fatal' | grep -v 't\.Fatal'

Example output:

$ POD_NAMESPACE=venafi ./preflight agent --one-shot --output-path /dev/null --api-token should-not-be-required  --agent-config-file examples/cert-manager-agent.yaml -v 1
I1105 13:28:59.806101   86250 logs.go:158] "Preflight agent version: development ()" source="agent"
I1105 13:28:59.806320   86250 logs.go:158] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
I1105 13:28:59.806344   86250 logs.go:158] "Using deprecated Endpoint configuration. User Server instead." source="agent"
I1105 13:28:59.806362   86250 run.go:117] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
I1105 13:28:59.806389   86250 run.go:121] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
I1105 13:28:59.806464   86250 run.go:447] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
E1105 13:28:59.807120   86250 logs.go:156] "error messages will not show in the pod's events because the POD_NAME environment variable is empty" source="agent"
I1105 13:28:59.807717   86250 logs.go:158] "starting \"k8s/secrets.v1\" datagatherer" source="agent"
I1105 13:28:59.807854   86250 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
I1105 13:28:59.807888   86250 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
I1105 13:29:00.008575   86250 logs.go:158] "successfully gathered 18 items from \"k8s/secrets.v1\" datagatherer" source="agent"
I1105 13:29:00.009067   86250 logs.go:158] "Data saved to local file: /dev/null" source="agent"
I1105 13:29:00.009147   86250 run.go:461] "Shutting down" logger="APIServer.ListenAndServe" addr=":8081"
I1105 13:29:00.009205   86250 run.go:476] "Shutdown complete" logger="APIServer.ListenAndServe" addr=":8081"

$ echo $?
0
$ ./preflight agent --foo
E1105 13:30:18.198379   86557 root.go:51] "Exiting due to error" err="unknown flag: --foo" exit-code=1

$ echo $?
1
$ ./preflight agent
I1105 13:30:52.936075   86581 logs.go:153] "Preflight agent version: development ()" source="agent"
E1105 13:30:52.936271   86581 root.go:51] "Exiting due to error" err=<
        While evaluating configuration: no auth mode specified. You can use one of four auth modes:
         - Use (--venafi-cloud with --credentials-file) or (--client-id with --private-key-path) to use the Venafi Cloud Key Pair Service Account mode.
         - Use --venafi-connection for the Venafi Cloud VenafiConnection mode.
         - Use --credentials-file alone if you want to use the Jetstack Secure OAuth mode.
         - Use --api-token if you want to use the Jetstack Secure API Token mode.
 > exit-code=1

$ echo $?
1
$ ./preflight agent --one-shot --output-path /foo/bar --api-token should-not-be-required  --agent-config-file examples/cert-manager-agent.yaml
I1105 13:31:38.686969   86604 logs.go:153] "Preflight agent version: development ()" source="agent"
I1105 13:31:38.687764   86604 logs.go:153] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
I1105 13:31:38.687789   86604 logs.go:153] "Using deprecated Endpoint configuration. User Server instead." source="agent"
E1105 13:31:38.687815   86604 root.go:51] "Exiting due to error" err=<
        While evaluating configuration: 1 error occurred:
                * could not guess which namespace the agent is running in: POD_NAMESPACE env var not set, meaning that you are probably not running in cluster. Please use --install-namespace or POD_NAMESPACE to specify the namespace in which the agent is running.

 > exit-code=1

$ echo $?
1

// Usage information is still available on stdout with the `-h` and `--help`
// flags.
SilenceErrors: true,
SilenceUsage: true,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested by @maelvls in #593 (comment)

@wallrj wallrj changed the title WIP: [VC-35738] Use cobra RunE instead of log.Fatal, to exit the process and print an error WIP: [VC-35738] Bubble up the errors from sub-commands and handle errors centrally instead of using log.Fatal at multiple sites Oct 30, 2024
@wallrj wallrj changed the base branch from master to VC-35738/use-errgroup October 31, 2024 09:15
@wallrj wallrj force-pushed the VC-35738/remove-log-fatal branch from 4438c98 to 817f07c Compare October 31, 2024 09:23
@wallrj wallrj mentioned this pull request Oct 31, 2024
12 tasks
@wallrj wallrj force-pushed the VC-35738/use-errgroup branch from 54dc42e to 864edd3 Compare October 31, 2024 09:40
@wallrj wallrj force-pushed the VC-35738/remove-log-fatal branch 3 times, most recently from 35d6638 to 23579e7 Compare October 31, 2024 11:36
Base automatically changed from VC-35738/use-errgroup to VC-35738/feature November 1, 2024 12:06
@wallrj wallrj force-pushed the VC-35738/remove-log-fatal branch from 23579e7 to 708b1d7 Compare November 1, 2024 12:10
@wallrj wallrj changed the title WIP: [VC-35738] Bubble up the errors from sub-commands and handle errors centrally instead of using log.Fatal at multiple sites [VC-35738] Bubble up the errors from sub-commands and handle errors centrally instead of using log.Fatal at multiple sites Nov 1, 2024
@wallrj wallrj force-pushed the VC-35738/remove-log-fatal branch from b4ee1ac to 704847d Compare November 1, 2024 14:47
Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
…up the stderr output

Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
@wallrj wallrj force-pushed the VC-35738/remove-log-fatal branch from 704847d to 045142c Compare November 5, 2024 13:24
@wallrj wallrj requested a review from maelvls November 5, 2024 13:32
Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems all good to me. I haven't tested this manually, but I trust the output that you gave.

@wallrj wallrj merged commit 61d64e4 into VC-35738/feature Nov 5, 2024
2 checks passed
@wallrj wallrj deleted the VC-35738/remove-log-fatal branch November 5, 2024 16:07
err = multierror.Append(
err,
fmt.Errorf("failed to wait for controller-runtime component to stop: %v", groupErr),
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused a bug.
The group error is discarded, because I was assigning to the err variable assuming that it was the returned error, but it's not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants