Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanded index matches, added many index queries #641

Merged
merged 1 commit into from
Nov 30, 2021

Conversation

Hydrocharged
Copy link
Contributor

Besides the obvious addition of a ton of tests (ALL of which make use of indexes, which are enforced by the EXPLAIN statements), this also allows for another scenario to match indexes, although we're still missing a fairly important one. I'll list the scenarios below (assume an index over columns (v1, v2, v3).

The new one we now match: previously expressions had to match the index prefix, but now it has been expanded to require only the first index expression be matched. Therefore, the filter WHERE v1 = 5 AND v3 = 2 is now used by indexes as we can still filter on v1 = 5 using the same principles that partial indexes use. v3 = 2 may not allow for rows to be skipped on the integrator's side when reading from their internal store, but may allow for them to be skipped when returning to the engine (removing the conversion to a sql.Row from the integrator's implementation which should be a fairly substantial speed increase).

This has the added consequence that we can properly combine all ranges that make use of a specific index. That is,

(v1 > 1 AND v3 > 1) OR (v1 < 4 AND v2 < 4)

will now combine the two ranges (separated by OR). Reducing the prefix limitation has greatly increased the index match possibility. Just to note, the returned indexes are sorted by whether they're an exact match, then by how many expressions are a prefix, then by the index size, and lastly by the index name, so you should always get the "best" index for any given set of expressions.

The above works in the case of a single index existing over the three columns. If (v1, v3, v2) is another index then the above example will match to two different indexes (both being a perfect prefix to an index). There is a way to get around this by taking note of which index has been used previously if multiple indexes share the same columns (or some indexes are subsets/supersets of some other indexes), however this was attempted and abandoned due to the added complexity and time it would take to implement.

Also built off of this, we cannot match

(v1 > 1 AND v3 > 1) OR (v2 < 4)

As the (v2 < 4) is missing the first index expression (v1), therefore no index is used here (one is used by MySQL, which was tested to confirm that one should be in this case). Again this would be fixed by taking note of which indexes have already been used, as we only need the prefix to be matched one, and then all other sets of expressions just need to be subsets.

Either way, we should now match many more queries than previously.

Copy link
Member

@zachmu zachmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good comments on interfaces

I have not reviewed the massive new tests in detail, and it's unlikely anyone will.

I would suggest that you extract some representative query patterns from them and move them to query_plans.go and queries.go (for plans and correctness), so that they will actually be subject to human review going forward. These tests are so massive and numerous that they're really only good for catching regressions.

sql/analyzer/index_analyzer_test.go Show resolved Hide resolved
@Hydrocharged Hydrocharged merged commit 1e491e6 into main Nov 30, 2021
@Hydrocharged Hydrocharged deleted the daylon/index-tests branch November 30, 2021 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants