Enable reading StringView from Parquet (schema_force_string_view) by default
#11682
Comments
Want to share my numbers here:
take
Update: here are the items I think are blocking us from enabling StringView
I am going to try and make #6906 work now.
Update: I have an implementation of #6906 and, thanks to @goldmedal, we have an implementation of #12788 almost ready to test. The final piece I know of is #12771, and @Rachelint has a good PR (#12809) for that.
Update: we have enough of the pieces implemented, thanks to @Rachelint, @goldmedal and @jayzhan211, so I have hacked it together in a branch and am now running the performance tests to try and get final end-to-end performance numbers. I think we are (very) close. See #12092 (comment) for details.
This is basically blocked on the next arrow-rs release (apache/arrow-rs#6341), which is blocking #12788.
Part of #11752
Is your feature request related to a problem or challenge?
As part of #10918, @XiangpengHao has threaded the use of StringView through parquet, arrow-rs and then into DataFusion.
When the datafusion.execution.parquet.schema_force_string_view option is enabled, the DataFusion Parquet reader will read all Utf8 columns as StringView instead, which results in significantly faster performance (details TBD, but we will write them down in #11603).
However, when initially merged (#11667), this setting will be off by default.
This ticket tracks what it would take to turn the setting on by default
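To make the effect of the flag concrete, here is a minimal sketch of the schema change described above. It is a toy model only: `ColumnType`, `Field` and `apply_string_view` are hypothetical names made up for illustration, not the arrow-rs or DataFusion types, and the real reader may well handle more column types than the Utf8 case mentioned in this ticket.

```rust
/// Toy stand-in for a column's type (hypothetical, NOT arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    Int64,
    Utf8,
    Utf8View,
}

/// Hypothetical field: a column name plus its type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    ty: ColumnType,
}

/// Returns a copy of `schema`. When `force_string_view` is true, every
/// `Utf8` column is reported as `Utf8View`; all other columns, and every
/// column when the flag is false, are left unchanged.
fn apply_string_view(schema: &[Field], force_string_view: bool) -> Vec<Field> {
    schema
        .iter()
        .map(|f| {
            let ty = match (&f.ty, force_string_view) {
                // Only Utf8 is rewritten, and only when the flag is on.
                (ColumnType::Utf8, true) => ColumnType::Utf8View,
                (other, _) => other.clone(),
            };
            Field { name: f.name.clone(), ty }
        })
        .collect()
}
```

With the flag off (today's default) the schema comes back unchanged; with it on, a Utf8 column such as a hypothetical `url` column is reported as Utf8View while an Int64 column stays Int64. Turning the setting on by default would make the second behavior what users get without any configuration.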
Describe the solution you'd like
Change the default value of datafusion.execution.parquet.schema_force_string_view to true
Describe alternatives you've considered
Basically we should enable the flag by default and then run some benchmarks to ensure performance doesn't change by too much
Additional context
No response