Dec 13, 2024: This week(s) in DataFusion #13760
Comments
@Omega359 made some significant progress towards running the sqllogictest suite from sqlite against DataFusion -- see this ticket for more:
I have also started collecting a list of things to improve dev time / experience
Fun project: dbfiddle for DataFusion in browser:
Maybe time to resurrect discussion about stable / long term releases. Is anyone else interested in this? We would likely need help with the additional maintenance burden
@suremarc is cranking along with a materialized view implementation 🚀
Next week: #13970
Introduction
This ticket is a weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please feel free to leave comments on this ticket about things that I may have missed or you think should get wider attention by the community. Follow on to #13630
Loosely inspired by https://this-week-in-rust.org/
Reminder, find new content (and please post some!) to
Community Highlights
Theme: DataFusion Fever is Spreading
I think DataFusion is reaching an inflection point: it is now good enough that more than just early adopters can build, and are building, real production systems using DataFusion. This is a great milestone 🎉 and I think the project is adjusting to this new reality.
One major theme we have been discussing in the last week or two is making upgrades easier. The recent pushes in 43.0.0 and 44.0.0 to clean up / complete projects such as StringView, window function migration, improved APIs, etc. have caused significant downstream complications when upgrading. Going forward as a community, we are discussing ways to improve this process. I hope to write more on this topic.
You can read more about this here
New Blog Website
Releases
0.53.0 / sqlparser_derive 0.3.0 datafusion-sqlparser-rs#1517
54.0.0 (December 2024) arrow-rs#6342
44.0.0 #13334
Performance
The community loves a good benchmark challenge. We are off to a great start making the h2o benchmark even faster, see
corr more than 3x faster 🚀
median doing Improve performance of median function #13550
I also made a change, with an example, to allow array reuse in functions, which adds to the 🚀 🔨 🧰
ScalarFunctionArgs gets owned ColumnReference) #13637
Thanks @dhegberg for a CSV loading benchmark
Also thanks to @richox, @Zhangli20, @tlm365 @jayzhan211 @Weijun-H, @comphead and @Dandandan for improving the speed of other functions
initcap function (~2x faster) #13691
character_length function #13696
Vec with IndexMap for expression mappings in ProjectionMapping and EquivalenceGroup #13675
🐛 fixes, and improvements
@onursatici has been on a tear along with @korowa @haohuaijin
fix union serialisation order in proto #13709
fix: repartitioned reads of CSV with custom line terminator #13677
Fix hash join with sort push down #13560
Thanks to @Eason0729 for Handle alias when parsing sql(parse_sql_expr) #12939
@findepi @jonahgao @comphead and others have been cleaning up the code 🧹
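One of the cleanups listed in this section swaps OnceLock for LazyLock (#13641). For readers who have not used either type, here is a minimal sketch of what that kind of change usually looks like with the standard library. It is not taken from the DataFusion code base: the lookup table, `build_aliases`, `resolve_once` and `resolve_lazy` are all hypothetical names made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, OnceLock};

// Hypothetical lookup table, used only for illustration.
fn build_aliases() -> HashMap<&'static str, &'static str> {
    HashMap::from([("a", "alpha"), ("b", "beta")])
}

// Before: a OnceLock static plus an accessor that calls get_or_init.
static ALIASES_ONCE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();

fn aliases_once() -> &'static HashMap<&'static str, &'static str> {
    ALIASES_ONCE.get_or_init(build_aliases)
}

// After: LazyLock (stable since Rust 1.80) keeps the initializer next to
// the static, so no separate accessor function is needed.
static ALIASES_LAZY: LazyLock<HashMap<&'static str, &'static str>> =
    LazyLock::new(|| build_aliases());

// Look up a name through the OnceLock-based accessor.
pub fn resolve_once(name: &str) -> Option<&'static str> {
    aliases_once().get(name).copied()
}

// Look up a name through the LazyLock static (initialized on first use).
pub fn resolve_lazy(name: &str) -> Option<&'static str> {
    ALIASES_LAZY.get(name).copied()
}
```

Both variants initialize the table once, on first access, and behave the same for callers; the LazyLock form simply has less boilerplate.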
OnceLock with LazyLock #13641
make_udf_function macro #13712
Unparser
We have been cranking away at filling out the plan --> SQL feature, thanks to @goldmedal
UNNEST plan to UNNEST table factor SQL #13660
Hashbrown
@crepererum has been working to migrate our use of hashbrown to higher level APIs
hashbrown::raw::RawTable to hashbrown::hash_table::HashTable #13433
hashbrown RawTable uses to HashTable (round 3) #13658
Looking to get more involved? Try code review!
(can you see what I did there 🎣 )
DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.
We have docs about reviews. TLDR: look for test coverage, check whether the change is understandable and well documented, and consider whether the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.
Help wanted
Please feel free to leave your own comments on this ticket if you are looking for help
Community
Upcoming meetups: