value change caused by DISTINCT #38756
Comments
Minimal case:
tidb> desc SELECT DISTINCT cast(1 as double) FROM t;
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8 | 0.00 | root | | group by:Column#5, funcs:firstrow(Column#5)->Column#3 |
| └─TableReader_9 | 0.00 | root | | data:HashAgg_4 |
| └─HashAgg_4 | 0.00 | cop[tikv] | | group by:1, |
| └─TableFullScan_7 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+---------------------------+---------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.00 sec)

It seems this plan is missing a root Projection, as the following SQL statements show:
tidb> desc select cast(1 as double) from t group by cast(1 as double);
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| Projection_4 | 1.00 | root | | 1->Column#3 |
| └─HashAgg_9 | 0.00 | root | | group by:Column#7, funcs:firstrow(Column#7)->Column#6 |
| └─TableReader_10 | 0.00 | root | | data:HashAgg_5 |
| └─HashAgg_5 | 0.00 | cop[tikv] | | group by:1, |
| └─TableFullScan_8 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
5 rows in set (0.00 sec)
tidb> desc (SELECT 1 FROM t group by 1); -- sql2
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
| Projection_4 | 1.00 | root | | 1->Column#3 |
| └─HashAgg_9 | 0.00 | root | | group by:Column#7, funcs:firstrow(Column#7)->Column#6 |
| └─TableReader_10 | 0.00 | root | | data:HashAgg_5 |
| └─HashAgg_5 | 0.00 | cop[tikv] | | group by:1, |
| └─TableFullScan_8 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-----------------------------+---------+-----------+---------------+-------------------------------------------------------+
5 rows in set (0.00 sec)
tidb> desc (SELECT 1 FROM t group by c1); -- sql2
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
| Projection_4 | 2.40 | root | | 1->Column#3 |
| └─HashAgg_9 | 0.00 | root | | group by:test.t.c1, funcs:firstrow(Column#7)->Column#6 |
| └─TableReader_10 | 0.00 | root | | data:HashAgg_5 |
| └─HashAgg_5 | 0.00 | cop[tikv] | | group by:test.t.c1, funcs:firstrow(1)->Column#7 |
| └─TableFullScan_8 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+-----------------------------+---------+-----------+---------------+--------------------------------------------------------+
5 rows in set (0.00 sec)

I think this may be a bug in the optimizer. I'll change the label to sig/planner.

We also have the same issue when TiFlash is enabled:

Hope these can be helpful for your debugging:
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 5.7.25-TiDB-v5.1.0 |
+--------------------+
1 row in set (0.00 sec)
mysql> (SELECT SQRT(1) FROM t); -- sql1
+---------+
| SQRT(1) |
+---------+
| 1 |
| 1 |
| 1 |
+---------+
3 rows in set (0.00 sec)
mysql> (SELECT DISTINCT SQRT(1) FROM t); -- sql2
+---------+
| SQRT(1) |
+---------+
| 5e-324 |
+---------+
1 row in set (0.00 sec)
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 5.7.25-TiDB-v5.0.5 |
+--------------------+
1 row in set (0.00 sec)
mysql> (SELECT SQRT(1) FROM t); -- sql1
+---------+
| SQRT(1) |
+---------+
| 1 |
| 1 |
| 1 |
+---------+
3 rows in set (0.00 sec)
mysql> (SELECT DISTINCT SQRT(1) FROM t); -- sql2
+---------+
| SQRT(1) |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
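
The two outputs above can be checked against the property stated in the bug report, namely that every row of the DISTINCT result should also appear in the non-DISTINCT result. The following is only a minimal sketch of that check: the helper name `distinct_violations` is hypothetical, and the row values are copied by hand from the console output above rather than fetched from a live TiDB instance.

```python
def distinct_violations(plain_rows, distinct_rows):
    """Return rows of the DISTINCT result that never occur in the plain result."""
    seen = set(plain_rows)
    return [row for row in distinct_rows if row not in seen]


# Values copied from the console output above (not queried from a database).
sql1_rows = [1.0, 1.0, 1.0]    # SELECT SQRT(1) FROM t
sql2_rows_v510 = [5e-324]      # SELECT DISTINCT SQRT(1) FROM t on v5.1.0
sql2_rows_v505 = [1.0]         # SELECT DISTINCT SQRT(1) FROM t on v5.0.5

print(distinct_violations(sql1_rows, sql2_rows_v510))  # [5e-324]
print(distinct_violations(sql1_rows, sql2_rows_v505))  # []
```

A non-empty list means the containment property is violated, which is the case only for the v5.1.0 output.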

/assign @hi-rustin

I found that if we analyze the table immediately after inserting some data into it, we get the correct result.

Before analyze:
mysql> explain (SELECT DISTINCT SQRT(1) FROM t);
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| id | estRows | task | access object | operator info |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
| HashAgg_8 | 1.00 | root | | group by:Column#5, funcs:firstrow(Column#5)->Column#3 |
| └─TableReader_9 | 1.00 | root | | data:HashAgg_4 |
| └─HashAgg_4 | 1.00 | cop[tikv] | | group by:1, |
| └─TableFullScan_7 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
+---------------------------+----------+-----------+---------------+-------------------------------------------------------+
4 rows in set (0.01 sec)
logical p: DataScan(tdda)->Projection->Aggr(firstrow(Column#3))
logic: DataScan(tdda)->Aggr(firstrow(1))
physical: TableReader(Table(tdda)->HashAgg)->HashAgg
finalPlan: TableReader(Table(tdda)->HashAgg)->HashAgg

After analyze:
mysql> explain (SELECT DISTINCT SQRT(1) FROM t);
+--------------------------+---------+-----------+---------------+-----------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------+---------+-----------+---------------+-----------------------------------------+
| HashAgg_6 | 1.00 | root | | group by:1, funcs:firstrow(1)->Column#3 |
| └─TableReader_11 | 3.00 | root | | data:TableFullScan_10 |
| └─TableFullScan_10 | 3.00 | cop[tikv] | table:t | keep order:false |
+--------------------------+---------+-----------+---------------+-----------------------------------------+
3 rows in set (0.00 sec)
logical p: DataScan(tdda)->Projection->Aggr(firstrow(Column#3))
logic: DataScan(tdda)->Aggr(firstrow(1))
physical: TableReader(Table(tdda))->HashAgg
finalPlan: TableReader(Table(tdda))->HashAgg

So the issue occurs when the HashAgg operation is performed twice: once pushed down to TiKV and once at the root, as in the "before analyze" plan.
When the value is 2, TiKV returns 2 columns and the result is correct.
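
One observation that may help when reading the wrong value, offered as an assumption rather than a diagnosis confirmed in this thread: 5e-324 is the smallest positive subnormal IEEE 754 double, and its bit pattern is exactly the 64-bit integer 1. That would be consistent with an integer constant 1 being read back as a double without a numeric conversion somewhere between the two HashAgg operators. The sketch below only demonstrates the bit-level fact; the helper name `int64_bits_as_double` is hypothetical and nothing here is taken from TiDB or TiKV code.

```python
import struct


def int64_bits_as_double(n: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


# Reinterpreting the raw bits of integer 1 gives the smallest subnormal double.
print(int64_bits_as_double(1))   # 5e-324

# A proper numeric conversion of integer 1 gives the expected value instead.
print(float(1))                  # 1.0
```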
Bug Report
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
In theory, the result of sql2 (DISTINCT) ⊆ the result of sql1.
3. What did you see instead (Required)
However, the value 1 changed to 5e-324 after adding DISTINCT, which seems like a logical bug.
4. What is your TiDB version? (Required)