Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnionScan is extremely inefficient on reading #32433

Closed
tiancaiamao opened this issue Feb 17, 2022 · 6 comments · Fixed by #32441
Closed

UnionScan is extremely inefficient on reading #32433

tiancaiamao opened this issue Feb 17, 2022 · 6 comments · Fixed by #32441
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@tiancaiamao
Copy link
Contributor

tiancaiamao commented Feb 17, 2022

Enhancement

Scan 8000+ rows take 20ms!

I first found this issue when testing cached table #25293.

Without cache, less than 10ms:

mysql> SELECT   `process_code`,   `sweep_contract`,   `can_sweep`,   `operator_no`,   `area_code`,   DATE_FORMAT(`busi_date`, '%Y-%m-%d') FROM   pm_acc_trans_sweep WHERE   `process_code` = 'GDEP0071';
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
| process_code | sweep_contract | can_sweep | operator_no | area_code | DATE_FORMAT(`busi_date`, '%Y-%m-%d') |
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
| GDEP0071     | 0009           | 0         | 000000      | 7518      | 2014-05-08                           |
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
1 row in set (0.00 sec)

With cache, it takes 20ms:

mysql> SELECT   `process_code`,   `sweep_contract`,   `can_sweep`,   `operator_no`,   `area_code`,   DATE_FORMAT(`busi_date`, '%Y-%m-%d') FROM   pm_acc_trans_sweep WHERE   `process_code` = 'GDEP0071';
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
| process_code | sweep_contract | can_sweep | operator_no | area_code | DATE_FORMAT(`busi_date`, '%Y-%m-%d') |
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
| GDEP0071     | 0009           | 0         | 000000      | 7518      | 2014-05-08                           |
+--------------+----------------+-----------+-------------+-----------+--------------------------------------+
1 row in set (0.02 sec)

The data rows is 8000:

mysql> SELECT  count(*) from pm_acc_trans_sweep;
+----------+
| count(*) |
+----------+
|     8912 |
+----------+
1 row in set (0.00 sec)

It's quite counterintuitive that reading data from cache in TiDB slower than get the data from TiKV.
So I dig into it, and find the reason.

UnionScan is extremely inefficient when reading data from membuffer!

func BenchmarkUnionScanRead(b *testing.B) {
	store, clean := testkit.CreateMockStore(b)
	defer clean()

	tk := testkit.NewTestKit(b, store)
	tk.MustExec("use test")
	tk.MustExec(`create table t_us (
c1 varchar(10),
c2 varchar(30),
c3 varchar(1),
c4 varchar(12),
c5 varchar(10),
c6 datetime);`)
	tk.MustExec(`begin;`)
	for i:=0; i<8000; i++ {
		tk.MustExec("insert into t_us values ('54321', '1234', '1', '000000', '7518', '2014-05-08')")
	}

	b.ReportAllocs()
	b.ResetTimer()
	for i:=0; i<b.N; i++ {
		tk.MustQuery("select * from t_us where c1 = '12345'").Check(testkit.Rows())
	}
	b.StopTimer()
}
@tiancaiamao tiancaiamao added the type/enhancement The issue or PR belongs to an enhancement. label Feb 17, 2022
@tiancaiamao tiancaiamao changed the title UnionScan is extremely inefficient UnionScan is extremely inefficient on reading Feb 17, 2022
@tiancaiamao
Copy link
Contributor Author

tiancaiamao commented Feb 17, 2022

go test -run TestXXX -bench BenchmarkUnionScanRead -cpuprofile cpu.out -memprofile mem.out -benchtime 30s

    2301          15272748 ns/op         6959969 B/op     129143 allocs/op

On my PC (AMD 4750G), you can see it takes 15.2ms for each operation...

@tiancaiamao
Copy link
Contributor Author

image

Too much allocations slow down the operation.

@ekexium
Copy link
Contributor

ekexium commented Mar 10, 2022

The time consumed in open should be logged somewhere. Currently there is barely no way to find out union scan consumed so much time.

@tiancaiamao
Copy link
Contributor Author

The time consumed in open should be logged somewhere. Currently there is barely no way to find out union scan consumed so much time.

@ekexium Are u using the latest master? #32631 added it to trace already.

@ekexium
Copy link
Contributor

ekexium commented Mar 11, 2022

Never mind, I used 5.3.0.

@tiancaiamao
Copy link
Contributor Author

tiancaiamao commented Apr 8, 2022

It's slightly better after #32668
Close it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants