-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: use exit-on-error:false for crdb log cfg #63472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
bors r=ajwerner |
Build failed: |
NB: this change will cause nodes to continue running even when they cannot write to the log files on disk, and we'll become blind as to what is happening. (Not pushing against the change, but this should be clear to everyone involved) |
That is fine. We only log to files in roachtest, and when we fail to log to files, likely we can't write to the file system, so there is really nowhere for the messages to go and we're not materially worsening the situation. Unfortunately to get this PR in I need to somehow sniff out the old version in the mixed version upgrade roachtest. Sigh, I'll cook something up. |
See cockroachdb#62763. We seem to frequently miss the runtime errors resulting from out-of-memory conditions in the stderr logs. We don't understand exactly why yet, but it is very likely that with `exit-on-error` (which is true by default) we are hitting errors outputting to the sink which then kill the process before the runtime errors bubble up. While we develop a proper fix, avoid the problematic configuration on roachprod clusters, which notably includes roachtests. v20.2 did not have the `--log` flag yet, so we only do this when starting v21.1 (i.e. `master` at time of writing). Release note: None
This ended up being straightforward. bors r=ajwerner |
Canceled. |
bors r=ajwerner |
Build succeeded: |
I broke this in cockroachdb#63472, twofold. 1. if the version switch returned false, we weren't passing log flags at all, so there weren't any log files. 2. the version switch returned false on current master, since 21.1 is not tagged yet (we're still in the prerelease versions, currently at 21.1.0-alpha.3). I tested that this fix works. Release note: None
63342: ui: add button to reset SQL stats r=Azhng a=Azhng Release note (ui change): User can now reset SQL stats from DB Console. Follow up to: cockroachdb/ui#271 Closes #33315 63401: ui: e2e tests, improved cypress test for tsx, overview and db page r=nanduu04 a=nanduu04 Previously, only visual regression tests were written Wrote cypress tests using the cypress-testing-library [CRDB-2388](https://cockroachlabs.atlassian.net/browse/CRDB-2388?atlOrigin=eyJpIjoiOWIzNzY0ZjkzY2RhNGU3ZTgyZTUyOTY5MjE4ZmM5YmIiLCJwIjoiaiJ9) Release note: None 63713: config: deflake TestMarshalableZoneConfigRoundTrip r=aayushshah15 a=aayushshah15 A previous change (#63079) made this test flakey by (needlessly) making one of the marshalled fields a value derived from two other fields (as opposed to just one). This commit fixes the flake. Release note: None 63726: install: fix log flags in roachprod start r=knz a=tbg I broke this in #63472, twofold. 1. if the version switch returned false, we weren't passing log flags at all, so there weren't any log files. 2. the version switch returned false on current master, since 21.1 is not tagged yet (we're still in the prerelease versions, currently at 21.1.0-alpha.3). I tested that this fix works. Release note: None Co-authored-by: Azhng <archer.xn@gmail.com> Co-authored-by: Nandu Pokhrel <nandup@cockroachlabs.com> Co-authored-by: Aayush Shah <aayush.shah15@gmail.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
See #62763.
We seem to frequently miss the runtime errors resulting from
out-of-memory conditions in the stderr logs. We don't understand
exactly why yet, but it is very likely that with
exit-on-error
(which is true by default) we are hitting errors outputting to
the sink which then kill the process before the runtime errors
bubble up.
While we develop a proper fix, avoid the problematic configuration
on roachprod clusters, which notably includes roachtests.
Release note: None