Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Append errgroup errors to the error returned by Agent.Run #606

Merged
merged 2 commits into from
Nov 7, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Nov 6, 2024

In #599 (comment) I caused a bug where the error from the errgroup Go routines is discarded rather than being logged and causing the process to exit.

How did I notice this bug? I noticed the problem while testing a revision of #603 where I wrapped nil DataGatherer.Run early return into an error...but was not seeing the resulting error. It turns out that is the wrong behaviour and I found an additional bug #605 so I have since reverted that change.

I didn't write automated test for this because I feel like we will be refactoring and simplifying the agent Run code in future and I just want to get the logging sorted out.
But I did test this manually as follows:

  • Start an HTTP server on port 8081
python3 -m http.server 8081
  • Run the (updated in this PR) test and observe it succeed (unexpectedly)

You can see a log message about the API server skipping the shutdown sequence because the server has already stopped due to port 8081 is already being occupied,
but the error from the errgroup Go routine is being lost.

$ go test ./pkg/agent/... --run TestRunOneShot -v  -count 1
=== RUN   TestRunOneShot
    run_test.go:54: Running child process
    run_test.go:72: STDOUT
        I1106 16:32:24.699021  193501 logs.go:153] "Preflight agent version: development ()" source="agent"
        I1106 16:32:24.699143  193501 logs.go:153] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
        I1106 16:32:24.699158  193501 run.go:119] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
        I1106 16:32:24.699181  193501 run.go:123] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
        I1106 16:32:24.699253  193501 run.go:457] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
        I1106 16:32:24.699817  193501 logs.go:153] "Reading data from local file: testdata/one-shot/success/input.json" source="agent"
        I1106 16:32:24.699849  193501 run.go:467] "Shutdown skipped" logger="APIServer.ListenAndServe" addr=":8081" reason="Server already stopped"
        I1106 16:32:24.699875  193501 logs.go:153] "Data saved to local file: /dev/null" source="agent"
        PASS

    run_test.go:73: STDERR

--- PASS: TestRunOneShot (0.02s)
PASS
ok      github.com/jetstack/preflight/pkg/agent 0.042s

This time the test fails because the errgroup error has been properly bubbled up to the main function.

$ go test ./pkg/agent/... --run TestRunOneShot -v  -count 1
=== RUN   TestRunOneShot
    run_test.go:54: Running child process
    run_test.go:72: STDOUT
        I1106 16:32:39.852295  193936 logs.go:153] "Preflight agent version: development ()" source="agent"
        I1106 16:32:39.852439  193936 logs.go:153] "Using the Jetstack Secure API Token auth mode since --api-token was specified." source="agent"
        I1106 16:32:39.852468  193936 run.go:119] "Healthz endpoints enabled" logger="APIServer" addr=":8081" path="/healthz"
        I1106 16:32:39.852489  193936 run.go:123] "Readyz endpoints enabled" logger="APIServer" addr=":8081" path="/readyz"
        I1106 16:32:39.852561  193936 run.go:457] "Starting" logger="APIServer.ListenAndServe" addr=":8081"
        I1106 16:32:39.852863  193936 run.go:467] "Shutdown skipped" logger="APIServer.ListenAndServe" addr=":8081" reason="Server already stopped"
        I1106 16:32:39.853174  193936 logs.go:153] "Reading data from local file: testdata/one-shot/success/input.json" source="agent"
        I1106 16:32:39.853233  193936 logs.go:153] "Data saved to local file: /dev/null" source="agent"
        --- FAIL: TestRunOneShot (0.00s)
            run_test.go:50:
                        Error Trace:    /home/richard/projects/venafi/venafi-kubernetes-agent/pkg/agent/run_test.go:50
                        Error:          Received unexpected error:
                                        1 error occurred:
                                                * failed to wait for controller-runtime component to stop: APIServer: ListenAndServe: listen tcp :8081: bind: address already in use

                        Test:           TestRunOneShot
                        Messages:       Run returned an unexpected error
        FAIL

    run_test.go:73: STDERR

    run_test.go:74:
                Error Trace:    /home/richard/projects/venafi/venafi-kubernetes-agent/pkg/agent/run_test.go:74
                Error:          Received unexpected error:
                                exit status 1
                Test:           TestRunOneShot
                Messages:       <nil>
--- FAIL: TestRunOneShot (0.02s)
FAIL
FAIL    github.com/jetstack/preflight/pkg/agent 0.042s
FAIL

Signed-off-by: Richard Wall <richard.wall@venafi.com>
Signed-off-by: Richard Wall <richard.wall@venafi.com>
@wallrj wallrj requested a review from maelvls November 6, 2024 16:42
@wallrj wallrj changed the title [VC-35738] Append errgroup errors to Run return error [VC-35738] Append errgroup errors to the error returned by Agent.Run Nov 6, 2024
Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted! Thank you (al lot) for the well-written PR description. Always a delight.

I didn't write automated test for this because I feel like we will be refactoring and simplifying the agent Run code in future and I just want to get the logging sorted out.

Got it. I'm OK with the idea of deferring the additional automated tests to later.

@wallrj wallrj merged commit 4ce96c0 into VC-35738/feature Nov 7, 2024
2 checks passed
@wallrj wallrj deleted the VC-35738/errgroup-return-error-2 branch November 7, 2024 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants