-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util/log: make test report their stderr in the test output #51410
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, that was fast.
For my indulgement, could you add the "pseudo test" as a separate commit so we can see it fail on CI? I'm mildly worried that TC will bury it even deeper now that it's not in a file. I would love to be proven wrong in which case we could simply proceed.
Added the dummy test. |
Previously, a `log.Scope()` would capture logs from tests in the same way as a running server; that is, by also capturing direct writes to stderr (such as those performed by the Go runtime for uncaught panics, or internal errors) to a separate `cockroach-stderr.log`. This behavior, while desirable for running servers, is problematic for unit testing: in that case, we prefer that uncaught panics and possible other Go runtime errors are interleaved in the test output, so as to automate the triage of test failures. This patch improves the situation by carving an exception to the default logging behavior for the purpose of `log.Scope()`. With this change, direct writes to stderr by code ran under a test will go to the test output, regardless of the logging configuration. The few tests that are truly interested in checking that stderr writes go to a log file can use the new constructor `log.ScopeWithoutShowLogsAndForcedStderrCapture()`. For example, consider this pseudo-test: ```go func TestFoo(t *testing.T) { defer log.Scope(t).Close(t) t.Log("i'm t.Logging") fmt.Println("i'm on stdout") fmt.Fprintln(os.Stderr, "i'm on stderr") go func() { panic("i panicked somewhere else") }() time.Sleep(time.Second) } ``` Before this patch, the output is like: ``` === RUN TestFoo TestFoo: test_log_scope.go:77: test logs captured to: /var/folders/yy/4q8rrssd27vdgbr59w9qbffr0000gn/T/logTestFoo927366034 TestFoo: test_log_scope.go:58: use -show-logs to present logs inline TestFoo: stopper_test.go:743: i'm t.Logging i'm on stdout FAIL github.com/cockroachdb/cockroach/pkg/util/stop 1.624s FAIL ``` And the stderr writes + panic are going to a separate file. With this patch, we get the following output instead: ``` === RUN TestFoo TestFoo: test_log_scope.go:88: test logs captured to: /tmp/tmp.v9jl56br/logTestFoo207407502 TestFoo: test_log_scope.go:58: use -show-logs to present logs inline TestFoo: clog_test.go:721: i'm t.Logging i'm on stdout i'm on stderr panic: i panicked somewhere else goroutine 69 [running]: github.com/cockroachdb/cockroach/pkg/util/log.TestFoo.func1() /data/home/kena/src/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog_test.go:725 +0x39 created by github.com/cockroachdb/cockroach/pkg/util/log.TestFoo /data/home/kena/src/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog_test.go:724 +0x1db FAIL github.com/cockroachdb/cockroach/pkg/util/log 0.047s FAIL ``` Alternatives that were considered: - have the deferred `Close()` method of the TestLogScope pull out the contents of `cockroach-stderr.log` and manually duplicate them on the test output. This solution was rejected because when a go runtime error occurs on a separate goroutine than the one where the TestLogScope is defined, the deferred don't get a chance to run. - use a "tee writer" for the stderr stream. There are two problems with this: - the Go runtime writes to file descriptor 2 directly, so it's not sufficient to override `os.Stderr` with a tee writer. - teeing writes to fd 2 would require spawning a goroutine that captures the writes and duplicates them, asynchronously. The problem here is that the go runtime promptly terminates the process after it completes writing to fd 2. An async tee goroutine would not get a chance to run and the fd 2 output would get lost. Release note: None
@tbg this PR is revealing 3 separate problems which you may want to address separately:
My recommendation at this point is twofold:
|
@tbg in light of my last comment, I'm also adding another non-panicy dummy test to demonstrate how the stderr output is reported in the build log. |
Release note (<category, see below>): <what> <show> <why>
Look at this beauty: Given this works, we now have the following:
I'm mildly in favor of keeping (and merging the PR) if there's a way to make the test runner pick up the pieces when there's a package-level failure and tell us just a bit about where to look. |
There's another thing we can do btw:
What do you think? |
(I'm particularly anxious about the stderr not being sent to artifacts in the case where the go test runner is obnoxious and also hides the test output from us, which unfortunately happens) |
(It happens because |
My take-away from test-infra rotation is that we should try to be less clever in general. We have a hodgepodge of cleverness that turns the CI pipeline into hard-to-maintain magic. I basically want things to be simple and smooth for "normal" test failures using only the TC built-in Go tooling, and for "abnormal" once (like off-goroutine panics) I want to make sure it's easy enough to find the failure cause via an artifact or the build log. Unfortunately, TeamCity isn't exactly... sane here when we use JSON output (as we do and I think continue to want to do) as it tries really hard to not show us that JSON (which is why we have the Let's talk about maybe_stress separately. I agree it's not good. But one at a time. You can "just" comment it out in your intentionally failing commit.
See above - my preference is adherence to the principle of least magic. It's tempting to want to fill in the missing pieces but so far that has only been done through one-off initiatives and ownership then sticks to the author. |
So you and I agree we should not focus too much on the case of "off-goroutine panics". In that case, why do you so much insist on having the stderr go to the test log directly? If the log.Scope's Close method is going to run in the common case, why not have that pull out the stderr output and report it? |
I looked into |
Right, I don't want to build too much magic to make them extra-pleasant to work with, but they do happen and need to be actionable. Which I think would not be the case now, since the output will be in "some" artifact but it's hard to document which one. I think it is worthwhile to have one file that has everything as this is how one debugs "locally", and since the TC build log is so unpleasant to deal with this file needs to be an artifact. At the same time I'm also averse to magic for the common case, hence I'm not keen on letting the log scope pull conditional tricks on the artifacts. |
This PR is worse when a test fails in the In that case, the details of the test failure are hidden by Go, and they are also lost from the artifacts directory. That's strictly worse than the other PR #51499. To make it better, we need to remove the |
Ah, the stress problem. Remind me again - |
Sounds possible. (I don't fully understand the problem - I just see that the stress prereq fails, we have zero evidence left, and the actual test target after that never runs)
I'll run that now. |
Next time you see that, go to the build log, switch to |
closing in favor of #52200 |
52200: util/log: improvements, simplifications and bug fixes r=irfansharif a=knz Fixes #52026 Fixes #52176 Fixes #52128 Fixes #51332 Supersedes #51499, #51410 For context, I was working on #51990, #51499 and #51410 when I stumbled upon bug #52128 in the logging package. I decided to clean up the code and fix that bug, and I ended up removing a lot of complexity and add many missing comments. Example CI output: ![image](https://user-images.githubusercontent.com/642886/89288431-efaedc80-d655-11ea-97d9-b096b580d951.png) By removing the complexity, I removed the problem otherwise solved by #51499 & #51410 and I removed the source of the flake causing #52026 and #52176. Salient bits in this PR: - **util/log: reinforce the behavior of log file rotation** - "This patch "does the right thing" by opening the new file before switching over whatever redirection may have been set up from the old file to the new one." - **util/log: simplify TestLogScope+don't capture internal stderr writes** - "This complexity was thus removed and the doc clarified that TestLogScope is only valid under the default configuration, when there was no file output configured yet and no secondary loggers active yet." - "However we also have other reasons to dislike the stderr redirect in tests, so I decided to remove it here instead of conjuring new code." - **pgwire: un-skip TestauthenticationAndHBARules** Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Fixes #51409.
Previously, a
log.Scope()
would capture logs from tests in the sameway as a running server; that is, by also capturing direct writes to
stderr (such as those performed by the Go runtime for uncaught panics,
or internal errors) to a separate
cockroach-stderr.log
.This behavior, while desirable for running servers, is problematic for
unit testing: in that case, we prefer that uncaught panics and
possible other Go runtime errors are interleaved in the test output,
so as to automate the triage of test failures.
This patch improves the situation by carving an exception to the
default logging behavior for the purpose of
log.Scope()
. With thischange, direct writes to stderr by code ran under a test will go to
the test output, regardless of the logging configuration.
The few tests that are truly interested in checking that stderr writes
go to a log file can use the new constructor
log.ScopeWithoutShowLogsAndForcedStderrCapture()
.For example, consider this pseudo-test:
Before this patch, the output is like:
And the stderr writes + panic are going to a separate file.
With this patch, we get the following output instead:
Alternatives that were considered:
have the deferred
Close()
method of the TestLogScope pull out thecontents of
cockroach-stderr.log
and manually duplicate them on thetest output.
This solution was rejected because when a go runtime error occurs on
a separate goroutine than the one where the TestLogScope is defined,
the deferred don't get a chance to run.
use a "tee writer" for the stderr stream. There are two problems
with this:
the Go runtime writes to file descriptor 2 directly, so it's not
sufficient to override
os.Stderr
with a tee writer.teeing writes to fd 2 would require spawning a goroutine that
captures the writes and duplicates them, asynchronously.
The problem here is that the go runtime promptly terminates
the process after it completes writing to fd 2. An async
tee goroutine would not get a chance to run and the fd 2
output would get lost.
Release note: None