Create interface for indexable tables in `IndexedTableAccess` #1938

nicktobey · 2023-08-12T00:25:34Z

Currently, only ResolvedTables are allowed to have indexes. There exists an interface, sql.IndexAddressable, which any node or table can implement in order to be a candidate for index-based optimization. But in practice, implementing that interface won't actually do anything because the IndexedTableAccess struct explicitly requires a ResolvedTable.

This PR replaces the ResolvedTable field in IndexedTableAccess with a new interface tentatively called TableNode, although a more specific name would probably be better.

In order for a node to be used for index-based optimization, it must implement this interface, and the table returned by its UnderlyingTable method must implement sql.IndexAddressable.
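To make the two-part requirement concrete, here is a minimal sketch of the check. It does not use the real go-mysql-server types: Table, IndexAddressable, and TableNode below are pared-down stand-ins (IndexNames in particular is invented for the example), and indexCandidate, plainTable, indexedTable, and tableFunctionNode are hypothetical names, not GMS code.

```go
package main

// Pared-down stand-ins for sql.Table, sql.IndexAddressable and
// sql.TableNode. The real interfaces have more methods; IndexNames is
// invented for this sketch.
type Table interface {
	Name() string
}

type IndexAddressable interface {
	Table
	IndexNames() []string
}

// TableNode: a plan node that can report the table it wraps.
type TableNode interface {
	UnderlyingTable() Table
}

// indexCandidate applies the two-part requirement: the node must be a
// TableNode, and its underlying table must be IndexAddressable.
func indexCandidate(node any) (IndexAddressable, bool) {
	tn, ok := node.(TableNode)
	if !ok {
		return nil, false
	}
	ia, ok := tn.UnderlyingTable().(IndexAddressable)
	return ia, ok
}

// plainTable has no indexes.
type plainTable struct{ name string }

func (t plainTable) Name() string { return t.name }

// indexedTable adds an index on top of plainTable.
type indexedTable struct{ plainTable }

func (t indexedTable) IndexNames() []string { return []string{"primary"} }

// tableFunctionNode is a hypothetical plan node (not a ResolvedTable)
// that wraps a table, e.g. a table function.
type tableFunctionNode struct{ table Table }

func (n tableFunctionNode) UnderlyingTable() Table { return n.table }
```

The check asks for behavior (a node that can hand over its table) rather than a concrete type, so in this sketch a table-function node qualifies exactly as a ResolvedTable would, provided the table it exposes is index-addressable.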


max-hoffman

I like the generalization. I am not sure how UnresolvedTable fits in, but those only exist on non-execution paths now. I think it would be a mistake not to add a small test indexable table function in GMS as part of these changes, but I think it is equally good to get the first set of changes in and then follow up with the test. TestTableFunctions and memory/sequence_table.go are a reference for how to do this.

sql/plan/ddl.go

zachmu

New interface seems fine to me; TableNode is a fine name.

sql/analyzer/tables.go

sql/plan/procedure_resolved_table.go

sql/plan/resolved_table.go


nicktobey · 2023-08-21T19:25:58Z

Prior to this PR, it was not possible for system table functions to have indexes. (You could make the table function implement sql.IndexAddressable, but it wouldn’t actually do anything.)

It’s now possible to write a system table function that can be optimized. The workflow looks like this:

Create a sql.Partition implementation that encodes all information necessary for the table function logic to only generate rows that match the filter. For example, include a field in the partition object for each field being filtered on.

Partitions were a feature inherited from GMS, designed to facilitate parallelism: a table is divided into multiple partitions that can each be iterated over separately. Dolt doesn’t support parallelism, but we still use partitions because indexes are implemented via partitions: every indexable node converts index lookups into a partition iterator (sql.PartitionIter).
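As a sketch of this first step, here is a partition for a hypothetical sequence-style table function that produces the integers 0..n-1 and can be filtered by a range on its one column. Partition mirrors the shape of sql.Partition (a single Key method) so the example compiles on its own; rangePartition, fullPartition, and filteredPartition are illustrative names, not go-mysql-server code.

```go
package main

import "strconv"

// Partition mirrors the shape of sql.Partition, whose only method is Key.
type Partition interface{ Key() []byte }

// rangePartition records the bounds of a range filter, so row generation
// can skip every value outside [lo, hi).
type rangePartition struct {
	lo, hi int64 // half-open range of values this partition produces
}

var _ Partition = rangePartition{}

// Key encodes the bounds, so different ranges get different keys.
func (p rangePartition) Key() []byte {
	return []byte(strconv.FormatInt(p.lo, 10) + ":" + strconv.FormatInt(p.hi, 10))
}

// fullPartition covers every row of a sequence of length n.
func fullPartition(n int64) rangePartition { return rangePartition{lo: 0, hi: n} }

// filteredPartition covers only values v with lo <= v < hi, clamped to the
// table's [0, n) domain; an empty result has lo == hi.
func filteredPartition(n, lo, hi int64) rangePartition {
	if lo < 0 {
		lo = 0
	}
	if hi > n {
		hi = n
	}
	if hi < lo {
		hi = lo
	}
	return rangePartition{lo: lo, hi: hi}
}
```

Everything the row generator needs (here, just two bounds) lives in the partition itself, so generating rows for a partition never has to consult the original filter expression.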

Implement a Partitions method, which returns a partition iterator that covers every row in the table exactly once. This method is called when the caller wants every row in the table.

Implement a PartitionRows method, which takes a partition and returns a RowIter. Most likely, either this method or the existing RowIter method will end up calling the other.

Implement GetIndexes, which returns the list of indexes that the table supports.

The indexes returned by GetIndexes don’t need to be full implementations of the sql.Index interface: most index functionality won’t actually be used, and in the case of table functions, a query can’t even inspect these indexes. The indexes also don’t need to be backed by an underlying table.
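A minimal GetIndexes for a hypothetical sequence table function might look like this. Index is a pared-down stand-in for sql.Index (the real interface has more methods, and the real GetIndexes takes a *sql.Context); sequenceIndex, the column name x, and the ID "primary" are assumptions made for illustration.

```go
package main

// Index is a pared-down stand-in for sql.Index; only the methods this
// sketch needs are modeled.
type Index interface {
	ID() string
	Expressions() []string
	IsUnique() bool
}

// sequenceIndex is a hypothetical index over the single column of a
// sequence table function. It stores no data: lookups against it are
// answered by generating rows on demand.
type sequenceIndex struct {
	tableName, column string
}

func (i sequenceIndex) ID() string { return "primary" }

// Expressions names the indexed column as "table.column".
func (i sequenceIndex) Expressions() []string {
	return []string{i.tableName + "." + i.column}
}

// IsUnique is true because each sequence value appears exactly once.
func (i sequenceIndex) IsUnique() bool { return true }

type sequenceTable struct{ name string }

// GetIndexes advertises the single index the table function supports.
func (t sequenceTable) GetIndexes() ([]Index, error) {
	return []Index{sequenceIndex{tableName: t.name, column: "x"}}, nil
}
```

The index is little more than a description of which column can be looked up; the work of answering a lookup stays in the table's partition logic.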

Implement LookupPartitions, which takes a sql.IndexLookup and returns a sql.PartitionIter.

The process of implementing these methods is cumbersome, but straightforward, so long as you’re only exposing a single index on a single column. Optimizing on multiple columns becomes more complicated.

In addition, there may be optimizations that can’t be expressed as an Index. For instance, imagine a hypothetical system table function that has N columns, any combination of which can be filtered on efficiently. To make every combination of filters efficient in the current framework, the node would need to provide one index per column ordering it wants to support, up to N! different indexes.
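The arithmetic behind this argument can be checked by brute force. The sketch below uses plain Go with no GMS types, and all of its names are hypothetical. It counts the 2^N - 1 non-empty filter combinations and the N! column orderings, and checks whether a given set of index orderings lets every combination be matched as an index prefix, the usual condition for using a multi-column index.

```go
package main

// filterCombinations counts the non-empty subsets of n filterable
// columns: 2^n - 1.
func filterCombinations(n int) int { return 1<<n - 1 }

// columnOrderings counts the orderings of n columns: n!.
func columnOrderings(n int) int {
	f := 1
	for i := 2; i <= n; i++ {
		f *= i
	}
	return f
}

// permutations returns every ordering of columns 0..n-1 by inserting
// column n-1 at each position of every ordering of 0..n-2.
func permutations(n int) [][]int {
	if n == 0 {
		return [][]int{{}}
	}
	var out [][]int
	for _, p := range permutations(n - 1) {
		for pos := 0; pos <= len(p); pos++ {
			q := make([]int, 0, n)
			q = append(q, p[:pos]...)
			q = append(q, n-1)
			q = append(q, p[pos:]...)
			out = append(out, q)
		}
	}
	return out
}

// coversAllSubsets reports whether every non-empty subset of the n
// columns equals (as a set) some prefix of at least one index ordering.
func coversAllSubsets(n int, orderings [][]int) bool {
	covered := make(map[int]bool) // column bitmask -> usable as a prefix
	for _, ord := range orderings {
		mask := 0
		for _, col := range ord {
			mask |= 1 << col
			covered[mask] = true
		}
	}
	return len(covered) == filterCombinations(n)
}
```

For example, coversAllSubsets(3, permutations(3)) is true, while a single ordering such as {0, 1, 2} is not enough, because a filter on column 1 alone is not a prefix of it.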

A better solution would be to allow system tables (and system table functions) to be aware of filters when generating their rows. Conceptually, this could be done by having system tables implement an interface that consumes a filter expression and produces a new system table node. In cases where no optimization can be performed, the interface returns the original system table unchanged. We would then add an optimization rule, run after filters are pushed down the tree, that pattern-matches filters whose child nodes implement this interface.

If we decide to optimize more system table functions in the future, we should strongly consider this better solution, since it will result in cleaner, more readable, more maintainable code that’s faster to write.

nicktobey force-pushed the nicktobey/table-functions branch 2 times, most recently from 0e965aa to 2cfc612, August 14, 2023 20:42

nicktobey requested a review from max-hoffman August 14, 2023 21:17

nicktobey force-pushed the nicktobey/table-functions branch from 2cfc612 to 4be5ccc, August 14, 2023 23:34

nicktobey added 9 commits August 16, 2023 17:20

ResolvedTable should implement Databaser.

6d6c42b

IndexedTableAccess now wraps a new interface (TableNode): ResolvedTable implements this interface but other structs can too.

4e8aec9

Rename IndexedTableAccess.ResolvedTable to reflect the fact it is no longer necessarily a resolved table.

2024df0

Pull cast to sql.TableWrapper into ResolvedTable::UnderlyingTable

c29ae92

Add docstrings.

260cc62

Rename function parameters to reflect that they operate on plan.TableNode, not plan.ResolvedTable

ee3d8a2

Update parameter types to accomodate plan.TableNode, not plan.ResolvedTable

d63902c

Populate indexes of TableNodes into the memo when generating plans.

c4e5b85

Allow LookupJoins and indexes from outer scopes to be apply to table functions that support it.

777d1bb

nicktobey force-pushed the nicktobey/table-functions branch from 50e9d91 to 777d1bb, August 17, 2023 00:20

max-hoffman approved these changes Aug 17, 2023


sql/plan/ddl.go (outdated, resolved)

zachmu reviewed Aug 17, 2023


sql/analyzer/tables.go (outdated, resolved)

sql/plan/procedure_resolved_table.go (outdated, resolved)

sql/plan/resolved_table.go (resolved)

nicktobey added 11 commits August 17, 2023 10:17

Fix comment in DropTable

5b5021b

Merge branch 'main' into nicktobey/table-functions

78d0f0a

Add missing parentheses.

dfee15e

Fix accidental find/replace in documentation.

0188724

Merge branch 'main' of github.com:dolthub/go-mysql-server into HEAD

73f89c4

Add example indexable table function SequenceTableFunction.

4c2f4d7

Allow Script Query tests to test for the presence of indexes.

6a581be

Combine sequence_table and sequence_table_function

a9f9ec1

Add tests for indexes on the sequence_table test table function.

512857f

Allow the creation of IndexScans for TableNodes

538823b

Refactor IndexedTableAccess documentation and method names to replace ResolvedTable with TableNode.

999201d

nicktobey force-pushed the nicktobey/table-functions branch from 32a29f0 to 999201d, August 19, 2023 01:05

nicktobey and others added 2 commits August 19, 2023 01:07

[ga-format-pr] Run ./format_repo.sh to fix formatting

1d2920f

Add docstring to TableNode

0971ddf

Move TableNode to sql package.

85b0507

sequence table has unique indexes.

0c1aad2

nicktobey merged commit dcdb9ae into main Aug 21, 2023
6 checks passed

nicktobey deleted the nicktobey/table-functions branch August 21, 2023 22:08
