[FEATURE] Projection Pushdown (Parquet) #196

Closed
joocer opened this issue Jun 14, 2022 · 6 comments
Labels: Awaiting Closure (Fixed - waiting for merging/releasing), Next Release (Planned for next release), Performance 🏃‍♀️ (Improve performance)

Comments

@joocer
Contributor

joocer commented Jun 14, 2022

Push down the projection to the read step.

This should improve performance by handling less data in the processing steps.

@joocer
Contributor Author

joocer commented Jun 14, 2022

Only implement this for the external readers (blob, NoSQL and SQL readers).

@joocer
Contributor Author

joocer commented Jun 18, 2022

This is probably going to be harder than just collecting all the tokens that are labelled as identifiers, especially when there are joins, subqueries, or aliases.

@joocer
Contributor Author

joocer commented Jun 20, 2022

Implement the NO_PUSH_PROJECTION hint at the same time.

@joocer joocer removed their assignment Jun 25, 2022
joocer added a commit that referenced this issue Aug 13, 2022
joocer added a commit that referenced this issue Aug 13, 2022
@joocer joocer added the Next Release Planned for next release label Aug 15, 2022
joocer added a commit that referenced this issue Aug 16, 2022
@joocer joocer changed the title [FEATURE] Projection Pushdown [FEATURE] Projection Pushdown (Parquet) Aug 19, 2022
@joocer joocer self-assigned this Aug 19, 2022
@joocer
Contributor Author

joocer commented Aug 19, 2022

Do this by reading the first page in full to gather all the fields, then intersect those with the selected fields and use the result on subsequent page reads.

This will mean no benefit on small datasets (single page), but that's not what would benefit from this anyway.

A '*' in the field list should disable the optimization.

This should be reflected in EXPLAIN.

Note that NATURAL JOIN should add a '*' to the field list when implemented.

This should NOT be the same approach taken for other data types.
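The approach described in this comment could be sketched roughly as follows. This is an illustration only, not the project's implementation: `Page`, `plan_projection` and `read_pages` are hypothetical names, and pages are modelled as plain dicts so the projection is simulated by filtering, whereas a real Parquet reader would pass the column list to the read call.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

# Hypothetical stand-in for a page of data: column name -> list of values.
Page = Dict[str, list]


def plan_projection(first_page_fields: Iterable[str],
                    selected: Iterable[str]) -> Optional[Set[str]]:
    """Return the columns to request on later pages, or None to read everything."""
    selected = set(selected)
    if "*" in selected:
        # A wildcard in the field list disables the optimization.
        return None
    # Only request fields that are both selected and present in the data.
    return set(first_page_fields) & selected


def read_pages(pages: Iterable[Page], selected: Iterable[str]) -> Iterator[Page]:
    """Yield pages, applying the projection from the second page onwards."""
    projection: Optional[Set[str]] = None
    planned = False
    for page in pages:
        if not planned:
            # The first page is read in full so its fields can be gathered.
            projection = plan_projection(page.keys(), selected)
            planned = True
            yield page
        elif projection is None:
            yield page
        else:
            # Assumption: a real reader would push this list into the read itself.
            yield {name: values for name, values in page.items()
                   if name in projection}
```

With a single page the loop never reaches the projected branch, which matches the note above that small datasets see no benefit.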

@joocer
Contributor Author

joocer commented Aug 19, 2022

This may conflict with the schema evolution feature: what happens if we select a column that doesn't exist? Maybe we need to wrap the read in a try and do the more expensive work if it fails.
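One way the try-then-fall-back idea might look, as a sketch under stated assumptions: `MissingColumnError` and `read_with_fallback` are hypothetical names, the `read` callable stands in for whatever the reader exposes, and filling absent columns with nulls is an assumed schema evolution behaviour, not something confirmed in this thread.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical stand-in for a table: column name -> list of values.
Table = Dict[str, list]


class MissingColumnError(KeyError):
    """Hypothetical error a reader raises when a requested column is absent."""


def read_with_fallback(read: Callable[[Optional[Sequence[str]]], Table],
                       columns: Sequence[str]) -> Table:
    """Try the cheap projected read first; fall back to a full read on failure."""
    try:
        return read(columns)
    except MissingColumnError:
        # More expensive path: read everything, then shape the result.
        full = read(None)
        row_count = max((len(values) for values in full.values()), default=0)
        # Assumption: columns missing from this file are filled with nulls.
        return {name: full.get(name, [None] * row_count) for name in columns}
```

The fallback only costs extra when a selected column is actually missing, so files that match the expected schema keep the full benefit of the pushdown.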

joocer added a commit that referenced this issue Aug 20, 2022
@joocer joocer added the Awaiting Closure Fixed - waiting for merging/releasing label Aug 20, 2022
@joocer
Contributor Author

joocer commented Aug 20, 2022

Can the field list be converted to a set earlier, and only once?

Can we use the schema to update the field list set?
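A minimal sketch of what these two questions suggest, assuming a hypothetical `FieldList` helper (not an existing class in the project): the set is built once at construction, and each file's schema is then used to narrow it to the columns that can actually be read.

```python
from typing import FrozenSet, Iterable, List


class FieldList:
    """Hypothetical helper: normalise the selected fields to a set exactly once."""

    def __init__(self, fields: Iterable[str]):
        # Built once, up front, rather than on every page read.
        self._fields: FrozenSet[str] = frozenset(fields)
        self.select_all = "*" in self._fields

    def narrow(self, schema_names: Iterable[str]) -> List[str]:
        """Return the columns to read for one file, in schema order."""
        names = list(schema_names)
        if self.select_all:
            return names
        return [name for name in names if name in self._fields]
```

Returning the columns in schema order keeps the result deterministic, which a plain set intersection would not guarantee.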

joocer added a commit that referenced this issue Aug 20, 2022
@joocer joocer closed this as completed Aug 20, 2022
joocer added a commit that referenced this issue Aug 20, 2022
FEATURE/#196 - Initial Projection Pushdown (Parquet only)