Expanded index matches, added many index queries #641

Hydrocharged · 2021-11-22T18:23:19Z

Besides the obvious addition of a ton of tests (ALL of which make use of indexes, which are enforced by the EXPLAIN statements), this also allows for another scenario to match indexes, although we're still missing a fairly important one. I'll list the scenarios below (assume an index over columns (v1, v2, v3).

The new one we now match: previously expressions had to match the index prefix, but now it has been expanded to require only the first index expression be matched. Therefore, the filter WHERE v1 = 5 AND v3 = 2 is now used by indexes as we can still filter on v1 = 5 using the same principles that partial indexes use. v3 = 2 may not allow for rows to be skipped on the integrator's side when reading from their internal store, but may allow for them to be skipped when returning to the engine (removing the conversion to a sql.Row from the integrator's implementation which should be a fairly substantial speed increase).

This has the added consequence that we can properly combine all ranges that make use of a specific index. That is,

(v1 > 1 AND v3 > 1) OR (v1 < 4 AND v2 < 4)

will now combine the two ranges (separated by OR). Reducing the prefix limitation has greatly increased the index match possibility. Just to note, the returned indexes are sorted by whether they're an exact match, then by how many expressions are a prefix, then by the index size, and lastly by the index name, so you should always get the "best" index for any given set of expressions.

The above works in the case of a single index existing over the three columns. If (v1, v3, v2) is another index then the above example will match to two different indexes (both being a perfect prefix to an index). There is a way to get around this by taking note of which index has been used previously if multiple indexes share the same columns (or some indexes are subsets/supersets of some other indexes), however this was attempted and abandoned due to the added complexity and time it would take to implement.

Also built off of this, we cannot match

(v1 > 1 AND v3 > 1) OR (v2 < 4)

As the (v2 < 4) is missing the first index expression (v1), therefore no index is used here (one is used by MySQL, which was tested to confirm that one should be in this case). Again this would be fixed by taking note of which indexes have already been used, as we only need the prefix to be matched one, and then all other sets of expressions just need to be subsets.

Either way, we should now match many more queries than previously.

zachmu

LGTM, good comments on interfaces

I have not reviewed the massive new tests in detail, and it's unlikely anyone will.

I would suggest that you extract some representative query patterns from them and move them to query_plans.go and queries.go (for plans and correctness), so that they will actually be subject to human review going forward. These tests are so massive and numerous that they're really only good for catching regressions.

sql/analyzer/index_analyzer_test.go

Hydrocharged requested a review from zachmu November 22, 2021 18:23

zachmu approved these changes Nov 22, 2021

View reviewed changes

sql/analyzer/index_analyzer_test.go Show resolved Hide resolved

Expanded index matches, added many index queries

c164504

Hydrocharged force-pushed the daylon/index-tests branch from 89562e4 to c164504 Compare November 30, 2021 11:15

Hydrocharged merged commit 1e491e6 into main Nov 30, 2021

Hydrocharged deleted the daylon/index-tests branch November 30, 2021 11:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expanded index matches, added many index queries #641

Expanded index matches, added many index queries #641

Hydrocharged commented Nov 22, 2021

zachmu left a comment

Expanded index matches, added many index queries #641

Expanded index matches, added many index queries #641

Conversation

Hydrocharged commented Nov 22, 2021

zachmu left a comment

Choose a reason for hiding this comment