refactor: add scan_to_stream() to Table trait to postpone the stream generation #1639

Merged 18 commits on May 29, 2023

Conversation

waynexia
Member

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Intention

The scan() method on Table directly returns a physical plan (an execution plan in DataFusion). This approach has several limitations:

  • Our scan options are tightly coupled to DataFusion. Table implementations are usually thin adapters over DataFusion's table source, so it is hard to add new options.
  • The scan options may change during plan optimization, so we may end up generating the execution plan multiple times.
  • Some I/O still happens during scan().

Changes

Add a new interface, scan_to_stream(), to replace the existing scan(). It returns a stream that can be adapted into an execution plan. The middle layer DfTableProviderAdapter becomes the only entry point for all Table implementations. It holds a Table reference and a mutable ScanRequest that can be changed during the optimization phase. We can further postpone the call to scan_to_stream() into the plan returned by DfTableProviderAdapter::scan(), so that no I/O happens during planning.

The main changes are the new scan_to_stream() interface and the deprecation of scan(). InformationSchema and the file table engine have some noticeable changes as a result.
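The design described above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in this PR: the real interface is async and returns a record batch stream that is adapted to a DataFusion execution plan. Here `RowStream`, `MemTable`, `TableProviderAdapter`, `LazyScan`, `set_projection`, `set_limit`, and the two `ScanRequest` fields are all hypothetical stand-ins chosen for the example. The point it tries to show is that the adapter owns a mutable request, and the table is only touched when the returned plan is executed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified stand-in for the real `ScanRequest`.
#[derive(Clone, Debug, Default)]
pub struct ScanRequest {
    pub projection: Option<Vec<usize>>,
    pub limit: Option<usize>,
}

/// Stand-in for a record batch stream: a boxed iterator over rows.
pub type RowStream = Box<dyn Iterator<Item = Vec<i64>>>;

/// Simplified `Table` trait exposing only the new entry point.
pub trait Table {
    fn scan_to_stream(&self, request: ScanRequest) -> RowStream;
}

/// Toy in-memory table. The counter stands in for "I/O happened".
pub struct MemTable {
    rows: Vec<Vec<i64>>,
    scans: AtomicUsize,
}

impl MemTable {
    pub fn new(rows: Vec<Vec<i64>>) -> Self {
        Self { rows, scans: AtomicUsize::new(0) }
    }

    /// Number of times `scan_to_stream` has been called so far.
    pub fn scan_count(&self) -> usize {
        self.scans.load(Ordering::SeqCst)
    }
}

impl Table for MemTable {
    fn scan_to_stream(&self, request: ScanRequest) -> RowStream {
        self.scans.fetch_add(1, Ordering::SeqCst);
        // Apply the projection (if any) to every row.
        let projected: Vec<Vec<i64>> = self
            .rows
            .iter()
            .map(|row| match &request.projection {
                Some(cols) => cols.iter().map(|&c| row[c]).collect::<Vec<i64>>(),
                None => row.clone(),
            })
            .collect();
        // Apply the limit (if any) lazily on the resulting iterator.
        Box::new(projected.into_iter().take(request.limit.unwrap_or(usize::MAX)))
    }
}

/// Hypothetical analogue of the adapter: a table reference plus a
/// request that can still be mutated while the plan is being optimized.
pub struct TableProviderAdapter {
    table: Arc<dyn Table>,
    request: Mutex<ScanRequest>,
}

impl TableProviderAdapter {
    pub fn new(table: Arc<dyn Table>) -> Self {
        Self { table, request: Mutex::new(ScanRequest::default()) }
    }

    /// Stand-ins for what an optimizer rule might push down.
    pub fn set_projection(&self, projection: Vec<usize>) {
        self.request.lock().unwrap().projection = Some(projection);
    }

    pub fn set_limit(&self, limit: usize) {
        self.request.lock().unwrap().limit = Some(limit);
    }

    /// Planning step: snapshots the request, does not touch the table.
    pub fn scan(&self) -> LazyScan {
        LazyScan {
            table: Arc::clone(&self.table),
            request: self.request.lock().unwrap().clone(),
        }
    }
}

/// Plan node that defers `scan_to_stream` until execution.
pub struct LazyScan {
    table: Arc<dyn Table>,
    request: ScanRequest,
}

impl LazyScan {
    pub fn execute(&self) -> RowStream {
        self.table.scan_to_stream(self.request.clone())
    }
}
```

In this sketch, calling `set_projection` or `set_limit` any number of times and then `scan()` leaves `scan_count()` at zero; the counter only increases once `execute()` is called on the returned `LazyScan`. Snapshotting the request inside `scan()` is an assumption of the example, not something the PR description specifies.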

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

waynexia and others added 10 commits May 24, 2023 14:35
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Member Author

This is a temporary regression due to the lack of an optimization rule.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
codecov bot commented May 25, 2023

Codecov Report

Merging #1639 (96236b5) into develop (d072947) will decrease coverage by 0.26%.
The diff coverage is 86.87%.

❗ Current head 96236b5 differs from pull request most recent head 892d9f5. Consider uploading reports for the commit 892d9f5 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1639      +/-   ##
===========================================
- Coverage    85.73%   85.48%   -0.26%     
===========================================
  Files          566      566              
  Lines        90433    90795     +362     
===========================================
+ Hits         77537    77614      +77     
- Misses       12896    13181     +285     

src/catalog/src/information_schema.rs (resolved)
src/catalog/src/information_schema.rs (outdated, resolved)
src/catalog/src/information_schema.rs (resolved)
src/catalog/src/information_schema/columns.rs (outdated, resolved)
src/common/recordbatch/src/adapter.rs (outdated, resolved)
src/datatypes/src/schema.rs (outdated, resolved)
waynexia added 5 commits May 29, 2023 11:00
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
src/table/src/error.rs (outdated, resolved)
src/catalog/src/information_schema/columns.rs (resolved)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Contributor

v0y4g3r commented May 29, 2023

It looks like the failing CI checks checked out an unrelated commit, 9ddb79ae91e23e17b16421b5a34f7d70aa6a1fd8. Let's force-merge this PR and proceed.

v0y4g3r merged commit b27c569 into develop May 29, 2023
v0y4g3r deleted the order-rule branch May 29, 2023 12:03
waynexia mentioned this pull request May 29, 2023
2 tasks
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023
…m generation (GreptimeTeam#1639)

* add scan_to_stream to Table

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl parquet stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* reorganise adapters

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* implement scan_to_stream for mito table

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add location info

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: table scan

* UT pass

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl project record batch

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix information schema

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* resolve CR comments

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove one todo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix errors generated by merge commit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add output_ordering method to record batch stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix rustfmt

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* enhance error types

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
4 participants