Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more types of Join to test #5351

Closed
10 tasks done
Willendless opened this issue Jul 11, 2022 · 1 comment · Fixed by #5543
Closed
10 tasks done

Support more types of Join to test #5351

Willendless opened this issue Jul 11, 2022 · 1 comment · Fixed by #5543
Assignees
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@Willendless
Copy link
Contributor

Willendless commented Jul 11, 2022

Enhancement

Current executor test framework only supports in testing three kinds of join: ASTTableJoin::Kind::left, ASTTableJoin::Kind::right and ASTTableJoin::Kind::inner.

We need to support more types of join.

Design Doc

This design doc aims to solve the following drawbacks within current executor test framework

  • Incomplete join kind supports, therefore incomplete join test cases
  • No on expression supports

Detailed work can be separated into several steps.

Step I: Simplify join executor's translation workflow

rationale

Current join executor translation is quite cumbersome. It includes

First, test framework phase: test API -> ASTTableJoin -> DB::mock::Join -> DAGRequest

Then, tiflash execution engine phase: DAGRequest -> part of ASTTableJoin -> TiflashJoin -> DB::Join

As can be seen from the flow, DAGRequest is the bridge between test framework and execution engine being tested. However, ASTTableJoin appears twice and is actually redundant in the first phase because

  1. DAGRequest and ASTTableJoin has different join type classification approach, unfortunately a many-to-many mapping
  2. From the ut writer's perspective, the only thing he/she should know is that using framework's API the product is a DAGRequest in line with expectations. Specifically, it means two things
    • he/she should be aware of the mapping from DAGRequest to ASTTableJoin, because it is how the execution engine will do towards the input
    • he/she should not be aware of the mapping from ASTTableJoin to DAGRequest, it has nothing to do with the code being tested and requires additional mental load

In a word, we should avoid using ASTTableJoin in the first phase and rather make the first phase be like testAPI -> DB::mock::Join -> DAGRequest using tiup::JoinType.

compatibility issue

Some legacy tests rely on clickhouse's native parser to translate sql query string to AST trees and then use toTiPBExecutor to get DAGRequest, since these old tests follow the code path that only supports left/right/inner join:

switch (join_params.kind) // todo support more type...
{
case ASTTableJoin::Kind::Inner:
join->set_join_type(tipb::JoinType::TypeInnerJoin);
break;
case ASTTableJoin::Kind::Left:
join->set_join_type(tipb::JoinType::TypeLeftOuterJoin);
break;
case ASTTableJoin::Kind::Right:
join->set_join_type(tipb::JoinType::TypeRightOuterJoin);
break;
default:
throw Exception("Unsupported join type");
}

one workaround might be move the translation for ASTTableJoin::Kind->TiPB::JoinType from toTiPBExecutor forward to compileJoin. This is another view that can show the restriction of old API. We might be able to deprecate these tests in the future.

deliverables

After this step, deliverables include

  • updated DAGRequestBuilder::join test api with tipb::JoinType and avoid using ASTTableJoin in the framework phase
  • passed all the join regression tests

Step II: Enable all join kinds and enhance test cases

rationale

Based on the first step, now it is possible to support all DAGRequest::JoinType given a valid using expression. All join kinds passed from TiDB to TiFlash in DAGRequest are as follows

  • InnerJoin
  • LeftOuterJoin
  • RightOuterJoin
  • SemiJoin
  • AntiSemiJoin
  • LeftOuterSemiJoin
  • AntiLeftOuterSemiJoin

deliverables

  • All DAGRequest::JoinType support and corresponding test cases
  • Increased join executor test coverage

Step III: Support more complicated DAGRequest construction [WIP]

rationale

Though it is possible to build DAGRequest with all kinds of DAGRequest::JoinType, it does not mean that the tests are exhausted because, again, JoinType -> ASTTableJoin::Kind is not a 1-to-1 mapping. Those ASTTableJoin::Kind secretly used within tiflash with prefix cross_ requires DAGRequest to not have join keys.

For instance, if you write a query like select * from t1 left join t2 on t1.a = 1, tidb will construct a DAGRequest with type tipb::JoinType::TypeLeftOuterJoin but no join keys since expression t1.a = 1 only references left table. Then within tiflash, it will interpret the join type as ASTTableJoin::Kind::Cross_Left.

Now it is clear that we can only test those Cross_* types until we first support on expression in the framework. However, to directly support on expression in the framework might not be a good idea because

  1. complexity: the expression used with on can be any conditional expression of the form that can be used in a WHERE clause
  2. compatibility: in order to parse the expression correctly, the parsing logic should have the exactly same behavior as how tidb handles on
  3. table column reference: our current framework does not support qualified column name

Since the main goal of our framework is to construct the DAGRequest, we can instead choosing a DAGRequest-oriented approach. Namely, ask test writers to be specific about how to set each field of join executor in DAGRequest rather than providing an on expression parser to do this job for them.

draft

tipb::Join has following fields related to the on expression parsing logic

deliverables

  • two join builder APIs:
DAGRequestBuilder & join(const DAGRequestBuilder & right, tipb::JoinType tp,
MockAstVec join_col_exprs,
MockAstVec left_conds,
MockAstVec right_conds,
MockAstVec other_conds,
MockAstVec other_eq_conds_from_in);

DAGRequestBuilder & join(const DAGRequestBuilder & right, tipb::JoinType tp, MockAstVec join_col_exprs);

Note: since the new join builder API is kind of harder to use (user must know how TiDB translates sql's join-on clause to DAGRequest's conditional expression fields), instead of using default parameters for conditional expressions, we explicitly require them to supplement these fields and recommend them to use the old one unless they have to test conditional expressions for join in the comment.

  • test coverage improvement of join executor

Progress Tracking

  • step 1: Simplify join executor's translation workflow
  • step 2: Enable all join kinds and enhance test cases
  • step 3: Support more complicated DAGRequest construction

08:43:35 UTC+8 Sunday, August 7, 2022

@Willendless Willendless added the type/enhancement The issue or PR belongs to an enhancement. label Jul 11, 2022
@Willendless
Copy link
Contributor Author

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant