Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigating TPCC perf #6982

Closed
max-hoffman opened this issue Nov 9, 2023 · 1 comment
Closed

Investigating TPCC perf #6982

max-hoffman opened this issue Nov 9, 2023 · 1 comment
Labels

Comments

@max-hoffman
Copy link
Contributor

Some low hanging fruit

Slow creating new NOW function:
https://github.com/dolthub/go-mysql-server/blob/6744a0d54c8f4611177f5de0a64d29603139ef7f/sql/expression/function/time.go#L845

Slow executing NOW function (specifically time.ConvertTimeZone)
image

Comparison Eval could be faster:
image

The sorts in these queries are unnecessary:

sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer1.c_id]                                                                                                                                                                                                 |
|  └─ Sort(customer1.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer1.c_last = 'ESEEINGABLE')                                                                                                                                                                                |
|          └─ IndexedTableAccess(customer1)                                                                                                                                                                                     |
|              ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

sbt> explain SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer2.c_balance, customer2.c_first, customer2.c_middle, customer2.c_id]                                                                                                                                     |
|  └─ Sort(customer2.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer2.c_last = 'PRIESEPRES')                                                                                                                                                                                 |
|          └─ IndexedTableAccess(customer2)                                                                                                                                                                                     |
|              ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

sbt> explain SELECT o_id, o_carrier_id, o_entry_d FROM orders2 WHERE o_w_id = 1 AND o_d_id = 1 AND o_c_id = 355 ORDER BY o_id DESC;
+--------------------------------------------------------------------------------------------------+
| plan                                                                                             |
+--------------------------------------------------------------------------------------------------+
| Project                                                                                          |
|  ├─ columns: [orders2.o_id, orders2.o_carrier_id, orders2.o_entry_d]                             |
|  └─ Sort(orders2.o_id DESC)                                                                      |
|      └─ Filter                                                                                   |
|          ├─ (orders2.o_c_id = 355)                                                               |
|          └─ IndexedTableAccess(orders2)                                                          |
|              ├─ index: [orders2.o_w_id,orders2.o_d_id,orders2.o_id]                              |
|              ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}]                                           |
|              └─ columns: [o_id o_d_id o_w_id o_c_id o_entry_d o_carrier_id o_ol_cnt o_all_local] |
+--------------------------------------------------------------------------------------------------+

The blob type should be pruned, deserializing all of the columns is expensive:

sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer1.c_id]                                                                                                                                                                                                 |
|  └─ Sort(customer1.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer1.c_last = 'ESEEINGABLE')                                                                                                                                                                                |
|          └─ IndexedTableAccess(customer1)                                                                                                                                                                                     |
|              ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Same here:

		Query: `SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first`,
		ExpectedPlan: "Project\n" +
			" ├─ columns: [customer2.c_balance:16, customer2.c_first:3, customer2.c_middle:4, customer2.c_id:0!null]\n" +
			" └─ Sort(customer2.c_first:3 ASC nullsFirst)\n" +
			"     └─ Filter\n" +
			"         ├─ Eq\n" +
			"         │   ├─ customer2.c_last:5\n" +
			"         │   └─ PRIESEPRES (longtext)\n" +
			"         └─ IndexedTableAccess(customer2)\n" +
			"             ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]\n" +
			"             ├─ static: [{[1, 1], [1, 1], [NULL, ∞)}]\n" +
			"             └─ Table\n" +
			"                 ├─ name: customer2\n" +
			"                 └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
			"",

Is it possible to avoid round tripping the blob type for an update?

		Query: `UPDATE customer1 SET c_balance=-1777.000000, c_ytd_payment=1777.000000 WHERE c_w_id = 1 AND c_d_id=5 AND c_id=1838`,
		ExpectedPlan: "RowUpdateAccumulator\n" +
			" └─ Update\n" +
			"     └─ UpdateSource(SET customer1.c_balance:16 = -1777.000000,SET customer1.c_ytd_payment:17 = 1777 (decimal(10,6)))\n" +
			"         └─ IndexedTableAccess(customer1)\n" +
			"             ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]\n" +
			"             ├─ static: [{[1, 1], [5, 5], [1838, 1838]}]\n" +
			"             └─ Table\n" +
			"                 ├─ name: customer1\n" +
			"                 └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
			"",

Deeper bugs

TPCC doesn't benefit from the same caching as individual sysbench queries:

sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24                       |
+--------------------------+
1 row in set (1.20 sec)

sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24                       |
+--------------------------+
1 row in set (0.08 sec)

This query has a particular optimization where the DISTINCT can be transformed into an ORDERED DISTINCT on the order_line3 table (this one is is an edge case, maybe taking a 10ms query to 5ms at best):

sbt> explain SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+------------------------------------------------------------------------------------------------------------+
| plan                                                                                                       |
+------------------------------------------------------------------------------------------------------------+
| Project                                                                                                    |
|  ├─ columns: [countdistinct([stock3.s_i_id])]                                                              |
|  └─ GroupBy                                                                                                |
|      ├─ SelectedExprs(COUNTDISTINCT([stock3.s_i_id]))                                                      |
|      ├─ Grouping()                                                                                         |
|      └─ LookupJoin                                                                                         |
|          ├─ IndexedTableAccess(order_line3)                                                                |
|          │   ├─ index: [order_line3.ol_w_id,order_line3.ol_d_id,order_line3.ol_o_id,order_line3.ol_number] |
|          │   ├─ filters: [{[1, 1], [5, 5], [2983, 3003), [NULL, ∞)}]                                       |
|          │   └─ columns: [ol_o_id ol_d_id ol_w_id ol_i_id]                                                 |
|          └─ Filter                                                                                         |
|              ├─ ((stock3.s_w_id = 1) AND (stock3.s_quantity < 18))                                         |
|              └─ IndexedTableAccess(stock3)                                                                 |
|                  ├─ index: [stock3.s_w_id,stock3.s_i_id]                                                   |
|                  ├─ columns: [s_i_id s_w_id s_quantity]                                                    |
|                  └─ keys: 1, order_line3.ol_i_id                                                           |
+------------------------------------------------------------------------------------------------------------+

The sequence of events is:

  1. Infer DISTINCT(stock3.s_i_id) ~ DISTINCT(order_line3.ol_i_id)
  2. Push DISTINCT(order_line3.ol_i_id) into LHS of join.
  3. Notice that the ITA(order_line3) will return rows sorted by order_line3.ol_i_id, so the DISTINCT can be executed in O(1) memory.

doCommit is slow, optimistic lock failures scale super linearly to concurrency:
image

We scale poorly with multithread, --time=60 --threads={threads} --tables=4 --scale=1 --db-driver=mysql run on macbook pro:
TPCC scaling

We are still picking a slow HASH_JOIN when a LOOKUP_JOIN is nearly instant (costing will fix this):
image

@timsehn
Copy link
Contributor

timsehn commented Feb 28, 2024

We now publish results of the TPC-C benchmark in our docs:

https://docs.dolthub.com/sql-reference/benchmarks/latency

And include it in the overall multiple of MySQL we communicate to customers. We'll continue to iterate to get it under 3X. It is currently at 5.5X.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants