Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: use exit-on-error:false for crdb log cfg #63472

Merged
merged 1 commit into from
Apr 14, 2021

Conversation

tbg
Copy link
Member

@tbg tbg commented Apr 12, 2021

See #62763.

We seem to frequently miss the runtime errors resulting from
out-of-memory conditions in the stderr logs. We don't understand
exactly why yet, but it is very likely that with exit-on-error
(which is true by default) we are hitting errors outputting to
the sink which then kill the process before the runtime errors
bubble up.

While we develop a proper fix, avoid the problematic configuration
on roachprod clusters, which notably includes roachtests.

Release note: None

@tbg tbg requested review from ajwerner and a team April 12, 2021 13:45
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

@tbg
Copy link
Member Author

tbg commented Apr 12, 2021

bors r=ajwerner
TFTR!

@craig
Copy link
Contributor

craig bot commented Apr 12, 2021

Build failed:

@knz
Copy link
Contributor

knz commented Apr 12, 2021

NB: this change will cause nodes to continue running even when they cannot write to the log files on disk, and we'll become blind as to what is happening.

(Not pushing against the change, but this should be clear to everyone involved)

@tbg
Copy link
Member Author

tbg commented Apr 14, 2021

NB: this change will cause nodes to continue running even when they cannot write to the log files on disk, and we'll become blind as to what is happening.

That is fine. We only log to files in roachtest, and when we fail to log to files, likely we can't write to the file system, so there is really nowhere for the messages to go and we're not materially worsening the situation.

Unfortunately to get this PR in I need to somehow sniff out the old version in the mixed version upgrade roachtest. Sigh, I'll cook something up.

See cockroachdb#62763.

We seem to frequently miss the runtime errors resulting from
out-of-memory conditions in the stderr logs. We don't understand
exactly why yet, but it is very likely that with `exit-on-error`
(which is true by default) we are hitting errors outputting to
the sink which then kill the process before the runtime errors
bubble up.

While we develop a proper fix, avoid the problematic configuration
on roachprod clusters, which notably includes roachtests.

v20.2 did not have the `--log` flag yet, so we only do this when
starting v21.1 (i.e. `master` at time of writing).

Release note: None
@tbg
Copy link
Member Author

tbg commented Apr 14, 2021

Unfortunately to get this PR in I need to somehow sniff out the old version in the mixed version upgrade roachtest. Sigh, I'll cook something up.

This ended up being straightforward.

bors r=ajwerner

@craig
Copy link
Contributor

craig bot commented Apr 14, 2021

Canceled.

@tbg
Copy link
Member Author

tbg commented Apr 14, 2021

bors r=ajwerner

@craig
Copy link
Contributor

craig bot commented Apr 14, 2021

Build succeeded:

@craig craig bot merged commit b2a6c7d into cockroachdb:master Apr 14, 2021
tbg added a commit to tbg/cockroach that referenced this pull request Apr 15, 2021
I broke this in cockroachdb#63472, twofold.

1. if the version switch returned false, we weren't passing log flags
   at all, so there weren't any log files.
2. the version switch returned false on current master, since 21.1 is
   not tagged yet (we're still in the prerelease versions, currently
   at 21.1.0-alpha.3).

I tested that this fix works.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 15, 2021
63342: ui: add button to reset SQL stats r=Azhng a=Azhng

Release note (ui change): User can now reset SQL stats from DB Console.

Follow up to: cockroachdb/ui#271
Closes #33315 

63401: ui: e2e tests, improved cypress test for tsx, overview and db page r=nanduu04 a=nanduu04

Previously, only visual regression tests were written

Wrote cypress tests using the cypress-testing-library

[CRDB-2388](https://cockroachlabs.atlassian.net/browse/CRDB-2388?atlOrigin=eyJpIjoiOWIzNzY0ZjkzY2RhNGU3ZTgyZTUyOTY5MjE4ZmM5YmIiLCJwIjoiaiJ9)

Release note: None

63713: config: deflake TestMarshalableZoneConfigRoundTrip r=aayushshah15 a=aayushshah15

A previous change (#63079) made this test flakey by (needlessly) making
one of the marshalled fields a value derived from two other fields
(as opposed to just one). This commit fixes the flake.

Release note: None

63726: install: fix log flags in roachprod start r=knz a=tbg

I broke this in #63472, twofold.

1. if the version switch returned false, we weren't passing log flags
   at all, so there weren't any log files.
2. the version switch returned false on current master, since 21.1 is
   not tagged yet (we're still in the prerelease versions, currently
   at 21.1.0-alpha.3).

I tested that this fix works.

Release note: None


Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Nandu Pokhrel <nandup@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants