Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

Closed
zhaoyuxi opened this issue Aug 16, 2016 · 12 comments
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Milestone

Comments

@zhaoyuxi
Copy link

zhaoyuxi commented Aug 16, 2016

1 Issue appearance
I created two tables and inserted 800,000 rows into each table. A relational query hanging more than 20 minutes is not over yet. 1 CPU was 100% fully filled by a switch thread such as following two stacks:

Thread 11 (Thread 0x2af3f2c00700 (LWP 11476)):
#0 0x000000000058ca11 in runtime.getitab ()
#1 0x000000000058dbe5 in runtime.assertI2I ()
#2 0x0000000000934cd7 in github.com/cockroachdb/cockroach/sql/parser.(*ComparisonExpr).Eval ()
#3 0x000000000092f7c0 in github.com/cockroachdb/cockroach/sql/parser.(*AndExpr).Eval ()
#4 0x0000000000c21b83 in github.com/cockroachdb/cockroach/sql/sqlbase.RunFilter ()
#5 0x0000000000f1c0e1 in github.com/cockroachdb/cockroach/sql.(*selectNode).Next ()
#6 0x0000000000f23fa6 in github.com/cockroachdb/cockroach/sql.(*selectTopNode).Next ()
#7 0x0000000000ece0e6 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmt ()
#8 0x0000000000ecb9e9 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtInOpenTxn ()
#9 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#10 0x0000000000ec9aad in github.com/cockroachdb/cockroach/sql.runTxnAttempt ()
#11 0x0000000000f5cb60 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest.func2 ()
#12 0x0000000000ae8c4b in github.com/cockroachdb/cockroach/internal/client.(*Txn).Exec ()
#13 0x0000000000ec8450 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest ()
#14 0x0000000000ec7656 in github.com/cockroachdb/cockroach/sql.(*Executor).ExecuteStatements ()
#15 0x0000000000fec6a8 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).executeStatements ()
#16 0x0000000000fe8548 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).handleSimpleQuery ()
#17 0x0000000000fe813a in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).serve ()
#18 0x0000000000fde369 in github.com/cockroachdb/cockroach/sql/pgwire.(*Server).ServeConn ()
#19 0x0000000000ad298d in github.com/cockroachdb/cockroach/server.(*Server).Start.func8.1 ()
#20 0x0000000001000c7d in github.com/cockroachdb/cockroach/util/netutil.(*Server).ServeWith.func1 ()
#21 0x00000000005dfbd1 in runtime.goexit ()
#22 0x000000c82037ce10 in ?? ()
#23 0x000000c8201581f0 in ?? ()
#24 0x00002af38040a180 in ?? ()
#25 0x000000c82d16ed10 in ?? ()
#26 0x000000c82035e000 in ?? ()
#27 0x7568746967203866 in ?? ()
#28 0x616d2f6d6f632e62 in ?? ()
#29 0x692d6f672f6e7474 in ?? ()
#30 0x36353a7974746173 in ?? ()
#31 0x000000c82543a958 in ?? ()
#32 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#33 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x2af3f3203700 (LWP 11479)):
#0 0x0000000000589cb3 in runtime.mapiternext ()
#1 0x0000000000f2265a in github.com/cockroachdb/cockroach/sql.qvalMap.populateQVals ()
#2 0x0000000000f1c099 in github.com/cockroachdb/cockroach/sql.(*selectNode).Next ()
#3 0x0000000000f23fa6 in github.com/cockroachdb/cockroach/sql.(*selectTopNode).Next ()
#4 0x0000000000ece0e6 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmt ()
#5 0x0000000000ecb9e9 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtInOpenTxn ()
#6 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#7 0x0000000000ec9aad in github.com/cockroachdb/cockroach/sql.runTxnAttempt ()
#8 0x0000000000f5cb60 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest.func2 ()
#9 0x0000000000ae8c4b in github.com/cockroachdb/cockroach/internal/client.(*Txn).Exec ()
#10 0x0000000000ec8450 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest ()
#11 0x0000000000ec7656 in github.com/cockroachdb/cockroach/sql.(*Executor).ExecuteStatements ()
#12 0x0000000000fec6a8 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).executeStatements ()
#13 0x0000000000fe8548 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).handleSimpleQuery ()
#14 0x0000000000fe813a in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).serve ()
#15 0x0000000000fde369 in github.com/cockroachdb/cockroach/sql/pgwire.(*Server).ServeConn ()
#16 0x0000000000ad298d in github.com/cockroachdb/cockroach/server.(*Server).Start.func8.1 ()
#17 0x0000000001000c7d in github.com/cockroachdb/cockroach/util/netutil.(*Server).ServeWith.func1 ()
#18 0x00000000005dfbd1 in runtime.goexit ()
#19 0x000000c82037ce10 in ?? ()
#20 0x000000c8201581f0 in ?? ()
#21 0x00002af38040a180 in ?? ()
#22 0x000000c82d16ed10 in ?? ()
#23 0x000000c82035e000 in ?? ()
#24 0x7568746967203866 in ?? ()
#25 0x616d2f6d6f632e62 in ?? ()
#26 0x692d6f672f6e7474 in ?? ()
#27 0x36353a7974746173 in ?? ()
#28 0x000000c82543a958 in ?? ()
#29 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#30 0x0000000000000000 in ?? ()

2 Two tables:
CREATE TABLE a(
id STRING(128) PRIMARY KEY,
name STRING(32),
type STRING(32),
int_id INT UNIQUE,
int_type INT,
INDEX type_idx (type),
INDEX int_id_idx (int_id),
INDEX int_type_idx (int_type)
);
CREATE TABLE b(
id STRING(128) PRIMARY KEY,
name STRING(32),
type STRING(32),
int_id INT UNIQUE,
int_type INT,
foreignA STRING NOT NULL REFERENCES a,
INDEX(foreignA)
);

All was fast as following:
SELECT * from a where a.id='70000000';
SELECT * from a where id='70000000';
SELECT * from b where b.id='70000000';
SELECT * from b where id='70000000';

But I didn’t have the patience to wait for the result after more than 20 minutes on this case:
SELECT * from a,b where a.id='70000000' and b.id='70000000';

  1. I expected a fast result. But it’s not just a full scan. A full scan had ended early within 1 minute according to my test.
@petermattis
Copy link
Collaborator

@zhaoyuxi Joins are functional for small tables, but not at all optimized. They likely have pathological performance problems. Cc @knz, we can speak to exactly what is going on in this case.

@knz
Copy link
Contributor

knz commented Aug 16, 2016

Try this:

select * from (select * from a where a.id=70000000) as a, (select * from b where b.id=70000000) as b;

We don't yet propagate the filters in the outer select into the inner scans, so you have to do it manually to get the expected performance. There is a bunch of techniques we plan to use to optimize query performance and they are on the roadmap, however we're not yet there.

For an overview of the current state of our JOIN implementation you can peruse the following blog post:
https://www.cockroachlabs.com/blog/cockroachdbs-first-join/

@knz knz changed the title Very bad performance issue:A relational query based on PK results in a full table scan sql/performance: independent filter expressions on JOINs should be pushed down to the operands Aug 16, 2016
@knz knz added this to the Later milestone Aug 16, 2016
@knz knz added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Aug 16, 2016
@zhaoyuxi
Copy link
Author

@petermattis, Thank you very much. It works now. Our service expects so much on relational queries optimization.

@zhaoyuxi
Copy link
Author

@knz,@petermattis. Thank you very much.

@knz
Copy link
Contributor

knz commented Aug 17, 2016

@zhaoyuxi if you have time we'd be interested to know which types of JOINs your service uses the most often. This way we could put them higher in our list of objectives.

@zhaoyuxi
Copy link
Author

@knz It is my pleasure. Our service has 50 kinds of structured objects. The ojbect and its index are separately stored.No strong consistency. Read much more and write much less. First do the select based on the index table to get the object ID. Then get the object. The great challenges for us are:

  1. Some select query performance is very poor. And latency is also high. The select statement is often as follows:
    SELECT main.key /,sub1.mKey,sub2.mKey,sub4.mKey/
    FROM main
    INNER JOIN sub1 ON main.key = sub1.mKey and (sub1.contentType in (0,4,6,10) and sub1.netID=2)
    INNER JOIN sub2 ON main.key = sub2.mKey and (sub2.startTime>? and sub2.endTime<?)
    INNER JOIN sub3 ON main.key = sub3.mKey and (sub3.recommend==1)
    INNER JOIN sub4 ON main.key = sub4.mKey and (sub4.status==1 or ...)
    ORDER BY main.key

Query result has about 10000 records. The page only takes topN usually 10 every time. 1500 select TPS is asked on 4C36G VM.

2.Several objects may span multiple nodes, and the efficiency of distributed SQL is even worse. Not only to be flexible, but also to require low CPU consumption.

@zhaoyuxi
Copy link
Author

zhaoyuxi commented Aug 18, 2016

It’s our pleasure to get your ping. Thanks very much.

Cockroach is so amazing for us. Scalable, Survivable, SQL and Consistent (Not strongly) are our needs But we worry about SQL performance. Especially distributed SQL.
Our service has 50 kinds of structured objects. The object and its index are separately stored. No strong consistency. Read much more and write much less. First do the select based on the index table to get the object ID. Then get the object. The great challenges for us are:

  1. Some select query performance is very poor. And latency is also high. The select statement is often as follows:
    SELECT main.key /,sub1.mKey,sub2.mKey,sub4.mKey/
    FROM main
    INNER JOIN sub1 ON main.key = sub1.mKey and (sub1.contentType in (0,4,6,10) and sub1.netID=2)
    INNER JOIN sub2 ON main.key = sub2.mKey and (sub2.startTime>? and sub2.endTime<?)
    INNER JOIN sub3 ON main.key = sub3.mKey and (sub3.recommend==1)
    INNER JOIN sub4 ON main.key = sub4.mKey and (sub4.status==1 or ...)
    ORDER BY main.key
    Query result has about 10000 records. The page only takes topN usually 10 every time. 1500 select TPS is asked on 4C36G VM.
    2.Several objects may span multiple nodes, and the efficiency of distributed SQL is even worse. Not only to be flexible, but also to require low CPU consumption.
  2. Now we are just in the initial design and prototype development. The performance has a gap. Probably we will do the SQL optimization according to our service. The deep optimization will go deep into the distribution of physical storage. We need to know how the data are ranged and to know where it’s. So the exploring of the storage distribution and even to control it is also our needs. We need this deep control for future optimization.

@knz
Copy link
Contributor

knz commented Aug 18, 2016

cc @RaduBerinde @andreimatei

@knz
Copy link
Contributor

knz commented Aug 18, 2016

@zhaoyuxi thanks for sharing.
Now I wonder how much concurrency we can extract from this type of query. Can you say more about how many different rows of the "main" table are involved in the final 10000 records? Are the tables sub1-sub4 filters over main, or is main a filter over the other 4? (I wonder if we can run the filters on the sub1-sub4 tables concurrently.)

@RaduBerinde
Copy link
Member

RaduBerinde commented Aug 18, 2016

@knz From the sample query, each sub1-sub4 table has a filter that can be evaluated on that table.
A decent strategy would be to use merge join (especially if main is already indexed by key). With distsql the four scans with filters would run concurrently. For filters that are very permissive (sub4.status == 1 might be), we could do point lookups if we only want the top 10 results (so e.g. merge join with sub1, sub2, sub3 and then point lookups in sub4 until we get 10 hits).

@zhaoyuxi Thanks for sharing. Please note that our current JOIN implementation is meant to be just functional (only on very small data sets); it is not anywhere close to what we plan to have in 1.0. We also plan to distribute SQL computation among the nodes that store relevant data.

I would like to hear more detail on 3. It sounds like you want to know exactly how data is distributed and stored? That is not something we have planned.. Assuming you had a way of knowing how table data is sliced and where each piece is stored, what would you do with that information? What is the "deep optimization" regarding?
We provide some control using zones; one can control on what stores certain tables go to. So e.g. if some nodes are faster or some stores have faster storage, one can make some "important" tables only use those nodes/stores.

@zhaoyuxi
Copy link
Author

zhaoyuxi commented Aug 18, 2016

Now the main has 6,000,000 records. The sub about 10100 times than the main.
Tables sub1-sub4 tables are the filters with 1
5 columns. The main table references the domain object. It often means all sub filters are satisfied. Filter one sub by one sub(All are ‘and’ operation). And the sub is composed of ‘or’ or ‘and’ operations. The filter columns are the type of Boolean, string enumeration(less than 10 values, in or equal operation), integer enumeration(less than 10 values, in or equal operation) for the most case. And some are period time(between operation) or counters(less than or more than or ordered by operation).

@zhaoyuxi
Copy link
Author

zhaoyuxi commented Aug 19, 2016

@RaduBerinde Thanks for your input.
If all the filters(rows of main and sub) from the same domain object is not in a single node, the distriubuted SQL performance probably will be bad. We plan that just the host nodes(hold the domain object filters) do a local select concurrently without communication with other nodes. Then merge the filtered rows. Consistency read is not asked for our service.
The zone probably can't support all the related rows from relational tables are stored in the same node. Different cockroach cluster can also support association together. But the flexibility is lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

4 participants