Remove broken panic handler #17354

dbussink · 2024-12-09T15:19:23Z

This handler recovers any arbitrary panic and lets the test pass. That is a serious anti-pattern: tests can be failing without anyone knowing about it, leaving things silently broken.

Running against CI to test the fallout of this.

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

vitess-bot · 2024-12-09T15:19:28Z

codecov · 2024-12-09T15:42:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.46%. Comparing base (0fe256e) to head (3e175b7).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #17354   +/-   ##
=======================================
  Coverage   67.46%   67.46%           
=======================================
  Files        1581     1581           
  Lines      253934   253934           
=======================================
+ Hits       171308   171313    +5     
+ Misses      82626    82621    -5


dbussink · 2024-12-09T15:57:47Z

go/test/endtoend/recovery/pitr/shardedpitr_test.go

@@ -525,7 +524,6 @@ func launchRecoveryTablet(t *testing.T, tablet *cluster.Vttablet, binlogServer *
 	tablet.MysqlctlProcess = *mysqlctlProcess
 	extraArgs := []string{"--db-credentials-file", dbCredentialFile}
 	tablet.MysqlctlProcess.InitDBFile = initDBFileWithPassword
-	tablet.VttabletProcess.DbPassword = mysqlPassword


@shlomi-noach @mattlord @rohit-nayak-ps This line was crashing the test, so the test never actually ran to completion, yet it always passed because of the now-removed defer cluster.PanicHandler(nil) call. The test now looks like it is failing, which means it has likely been failing for a really long time.

Any ideas on how to fix this?

It's failing here:

func testTabletRecovery(t *testing.T, binlogServer *binLogServer, lookupTimeout, restoreKeyspaceName, shardName, expectedRows string) {
	recoveryTablet := clusterInstance.NewVttabletInstance("replica", 0, cell)
	launchRecoveryTablet(t, recoveryTablet, binlogServer, lookupTimeout, restoreKeyspaceName, shardName)
	sqlRes, err := recoveryTablet.VttabletProcess.QueryTablet(getCountID, keyspaceName, true)
	require.NoError(t, err)
	assert.Equal(t, expectedRows, sqlRes.Rows[0][0].String())

So my first guess would be that we need to wait for replication to catch up and for the expected result. Currently all it's doing is waiting for the tablet to become SERVING and then it queries it. Perhaps the tablet needs a second to replicate things. I would first just try adding a sleep of 30 seconds or something.

This tests the old-and-unsupported-and-actually-legacy Google Ripple binlog server based PITR.

RFC: Backup/restore: remove legacy binlog-server based PITR code #16673

https://vitess.io/docs/22.0/reference/features/recovery/#point-in-time-recovery-legacy-functionality-based-on-binlog-server

We should stop running this test altogether. Now is as good a time as any.

@dbussink I've pushed a change to remove the test from running in shard 10. For now this should fix this PR. Later on (on a different PR) I will purge the entire codebase and tests.

frouioui

Aren't the tests supposed to fail anyway if there is a panic? Seems like cluster.PanicHandler should make the test fail if we recover a panic since the err wouldn't be nil:

vitess/go/test/endtoend/cluster/cluster_util.go

Lines 130 to 136 in 9b71606

    
func PanicHandler(t testing.TB) {
	err := recover()
	if t == nil {
		return
	}
	require.Nilf(t, err, "panic occured in testcase %v", t.Name())
}

dbussink · 2024-12-09T20:08:20Z

Aren't the tests supposed to fail anyway if there is a panic? Seems like cluster.PanicHandler should make the test fail if we recover a panic since the err wouldn't be nil:

It doesn't. In all the cases removed here, nil is passed in for t, so the handler calls recover() and then returns without failing the test.

As a result, a test that panics still passes whenever the handler is used this way.
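To make the behavior concrete, here is a small standalone sketch of the same pattern outside the test framework. The reporter interface, fakeT, and runBody are hypothetical stand-ins for testing.TB and a test body, and the handler uses Errorf where the real helper uses require.Nilf, so this illustrates the control flow rather than reproducing the Vitess code.

```go
package main

import "fmt"

// reporter is a hypothetical stand-in for the subset of testing.TB used here.
type reporter interface {
	Name() string
	Errorf(format string, args ...interface{})
}

// fakeT records whether a failure was reported.
type fakeT struct{ failed bool }

func (f *fakeT) Name() string { return "fake" }

func (f *fakeT) Errorf(format string, args ...interface{}) { f.failed = true }

// panicHandler mirrors the shape of the removed helper: it always recovers,
// and only reports the panic when t is non-nil.
func panicHandler(t reporter) {
	err := recover()
	if t == nil {
		return // the panic is swallowed and nothing is reported
	}
	if err != nil {
		t.Errorf("panic occurred in testcase %v: %v", t.Name(), err)
	}
}

// runBody simulates a test body that defers the handler and then panics on a
// nil pointer dereference before it can finish.
func runBody(t reporter) (completed bool) {
	defer panicHandler(t)
	var p *struct{ Password string }
	p.Password = "secret" // panics: p is nil
	return true
}

func main() {
	// With nil, execution continues here as if nothing went wrong, even
	// though the body never ran to completion.
	fmt.Println("nil handler, body completed:", runBody(nil))

	ft := &fakeT{}
	runBody(ft)
	fmt.Println("non-nil handler, failure recorded:", ft.failed)
}
```

With nil passed in, the caller has no way to tell that the body was cut short, which is how a crashing test could keep passing. Dropping the handler entirely, as this PR does, lets the panic propagate so the test fails with the original backtrace.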

dbussink · 2024-12-10T09:47:53Z

Backporting this too, since the handler can hide legitimate test failures and we don't want those cropping up in release branches either.

dbussink added the Type: Testing and Component: General labels Dec 9, 2024

dbussink requested review from harshit-gangal, systay, frouioui, GuptaManan100, shlomi-noach, mattlord, rohit-nayak-ps, derekperkins and deepthi as code owners December 9, 2024 15:19

github-actions bot added this to the v22.0.0 milestone Dec 9, 2024

dbussink commented Dec 9, 2024

frouioui reviewed Dec 9, 2024

frouioui approved these changes Dec 9, 2024

dbussink requested a review from timvaillancourt as a code owner December 10, 2024 09:17

dbussink and others added 3 commits December 10, 2024 10:26

Fix nil pointer error in test

ab36413

This test was never running since any panic would make a test pass before removing the recovery handler. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

stop testing 'pitr', which tests a legacy binlog-server based PITR. See

ef77118

vitessio#16673 Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>

Remove custom panic handler entirely

3e175b7

When a test panics, it's way more useful to see the actual backtrace for the panic and not try to recover anything that hides that information. Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com>

dbussink force-pushed the dbussink/remove-broken-panic-handler branch from 826c780 to 3e175b7 Compare December 10, 2024 09:27

dbussink added the Backport to: release-19.0, Backport to: release-20.0, and Backport to: release-21.0 labels Dec 10, 2024

shlomi-noach approved these changes Dec 10, 2024

dbussink merged commit bad431d into vitessio:main Dec 10, 2024
104 checks passed

dbussink deleted the dbussink/remove-broken-panic-handler branch December 10, 2024 09:52

This was referenced Dec 10, 2024

[release-19.0] Remove broken panic handler (#17354) #17358

Merged

[release-20.0] Remove broken panic handler (#17354) #17359

Merged

[release-21.0] Remove broken panic handler (#17354) #17360

Merged

shlomi-noach mentioned this pull request Dec 10, 2024

Remove binlog-server point in time recoveries code & tests #17361

Open

5 tasks

shlomi-noach mentioned this pull request Dec 11, 2024

Fix mysql_server_vault CI test #17368

Closed

5 tasks
