Apache DataFusion 41.0.0 Changelog

This release consists of 245 commits from 69 contributors. See credits at the end of this changelog for more information.

Breaking changes:

  • make unparser Dialect trait Send + Sync #11504 (y-f-u)
  • Implement physical plan serialization for CSV COPY plans, add as_any, Debug to FileFormatFactory #11588 (Lordworms)
  • Consistent API to set parameters of aggregate and window functions (AggregateExt --> ExprFunctionExt) #11550 (timsaucer)
  • Rename ColumnOptions to ParquetColumnOptions #11512 (alamb)
  • Rename input_type --> input_types on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs #11666 (lewiszlw)
  • Rename RepartitionExec metric repart_time to repartition_time #11703 (alamb)
  • Remove AggregateFunctionDefinition #11803 (lewiszlw)
  • Skipping partial aggregation when it is not helping for high cardinality aggregates #11627 (korowa)
  • Optionally create name of aggregate expression from expressions #11776 (lewiszlw)

Performance related:

  • feat: Optimize CASE expression for "column or null" use case #11534 (andygrove)
  • feat: Optimize CASE expression for usage where then and else values are literals #11553 (andygrove)
  • perf: Optimize IsNotNullExpr #11586 (andygrove)

Implemented enhancements:

  • feat: Add fail_on_overflow option to BinaryExpr #11400 (andygrove)
  • feat: add UDF to_local_time() #11347 (appletreeisyellow)
  • feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime #11471 (Blizzara)
  • feat: support UDWFs in Substrait #11489 (Blizzara)
  • feat: support unnest in GROUP BY clause #11469 (JasonLi-cn)
  • feat: support COUNT() #11229 (tshauck)
  • feat: consume and produce Substrait type extensions #11510 (Blizzara)
  • feat: Error when a SHOW command is passed in with an accompanying non-existent variable #11540 (itsjunetime)
  • feat: support Map literals in Substrait consumer and producer #11547 (Blizzara)
  • feat: add bounds for unary math scalar functions #11584 (tshauck)
  • feat: Add support for cardinality function on maps #11801 (Weijun-H)
  • feat: support Utf8View type in starts_with function #11787 (tshauck)
  • feat: Expose public method for optimizing physical plans #11879 (andygrove)

Fixed bugs:

  • fix: Fix eq properties regression from #10434 #11363 (suremarc)
  • fix: make sure JOIN ON expression is boolean type #11423 (jonahgao)
  • fix: regexp_replace fails when pattern or replacement is a scalar NULL #11459 (Weijun-H)
  • fix: unparser generates wrong sql for derived table with columns #11505 (y-f-u)
  • fix: make UnKnownColumns not equal to other physical exprs #11536 (jonahgao)
  • fix: fixes trig function order by #11559 (tshauck)
  • fix: CASE with NULL #11542 (Weijun-H)
  • fix: panic and incorrect results in LogFunc::output_ordering() #11571 (jonahgao)
  • fix: expose the fluent API fn for approx_distinct instead of the module #11644 (Michael-J-Ward)
  • fix: don't try to coerce list for regex match #11646 (tshauck)
  • fix: regr_count now returns UInt64 #11731 (Michael-J-Ward)
  • fix: set null_equals_null to false when convert_cross_join_to_inner_join #11738 (jonahgao)
  • fix: Add additional required expression for natural join #11713 (Lordworms)
  • fix: hash join tests with forced collisions #11806 (korowa)
  • fix: collect_columns quadratic complexity #11843 (crepererum)

Documentation updates:

  • Minor: Add link to blog to main DataFusion website #11356 (alamb)
  • Add to_local_time() in function reference docs #11401 (appletreeisyellow)
  • Minor: Consolidate specification doc sections #11427 (alamb)
  • Combine the Roadmap / Quarterly Roadmap sections #11426 (alamb)
  • Minor: Add an example for backtrace pretty print #11450 (goldmedal)
  • Docs: Document creating new extension APIs #11425 (alamb)
  • Minor: Clarify which parquet options are used for reading/writing #11511 (alamb)
  • Support newlines_in_values CSV option #11533 (connec)
  • chore: Minor cleanup simplify_demo() example #11576 (kavirajk)
  • Move DataFusion Query Optimizer to library user guide #11563 (devesh-2002)
  • Fix typo in doc of Partitioning #11612 (waruto210)
  • Doc: A tiny typo in scalar function's doc #11620 (2010YOUY01)
  • Change default Parquet writer settings to match arrow-rs (except for compression & statistics) #11558 (wiedld)
  • Rename functions-array to functions-nested #11602 (goldmedal)
  • Add parser option enable_options_value_normalization #11330 (xinlifoobar)
  • Add reference to #comet channel in Arrow Rust Discord server #11637 (ajmarcus)
  • Extract catalog API to separate crate, change TableProvider::scan to take a trait rather than SessionState #11516 (findepi)
  • doc: why nullable of list item is set to true #11626 (jcsherin)
  • Docs: adding explicit mention of test_utils to docs #11670 (edmondop)
  • Ensure statistic defaults in parquet writers are in sync #11656 (wiedld)
  • Merge string-view2 branch: reading from parquet up to 2x faster for some ClickBench queries (not on by default) #11667 (alamb)
  • Doc: Add Sail to known users list #11791 (shehabgamin)
  • Move min and max to user defined aggregate function, remove AggregateFunction / AggregateFunctionDefinition::BuiltIn #11013 (edmondop)
  • Change name of MAX/MIN udaf to lowercase max/min #11795 (edmondop)
  • doc: Add support for map and make_map functions #11799 (Weijun-H)
  • Improve readme page in crates.io #11809 (lewiszlw)
  • refactor: remove unneeded mut for session context #11864 (sunng87)

Other:

  • Prepare 40.0.0 Release #11343 (andygrove)
  • Support NULL literals in where clause #11266 (xinlifoobar)
  • Implement TPCH substrait integration test, support tpch_6, tpch_10, t… #11349 (Lordworms)
  • Fix bug when pushing projection under joins #11333 (jonahgao)
  • Minor: some cosmetics in filter.rs, fix clippy due to logical conflict #11368 (comphead)
  • Update prost-derive requirement from 0.12 to 0.13 #11355 (dependabot[bot])
  • Minor: update dashmap 6.0.1 #11335 (alamb)
  • Improve and test dataframe API examples in docs #11290 (alamb)
  • Remove redundant unalias_nested calls for creating Filter's #11340 (alamb)
  • Enable clone_on_ref_ptr clippy lint on optimizer #11346 (lewiszlw)
  • Update termtree requirement from 0.4.1 to 0.5.0 #11383 (dependabot[bot])
  • Introduce resources_err! error macro #11374 (comphead)
  • Enable clone_on_ref_ptr clippy lint on common #11384 (lewiszlw)
  • Track parquet writer encoding memory usage on MemoryPool #11345 (wiedld)
  • Minor: remove clones and unnecessary Arcs in from_substrait_rex #11337 (alamb)
  • Minor: Change no-statement error message to be clearer #11394 (itsjunetime)
  • Change array_agg to return null on no input rather than empty list #11299 (jayzhan211)
  • Minor: return "not supported" for COUNT DISTINCT with multiple arguments #11391 (jonahgao)
  • Enable clone_on_ref_ptr clippy lint on sql #11380 (lewiszlw)
  • Move configuration information out of example usage page #11300 (alamb)
  • chore: reuse a single function to create the Substrait TPCH consumer test contexts #11396 (Blizzara)
  • refactor: change error type for "no statement" #11411 (crepererum)
  • Implement prettier SQL unparsing (more human readable) #11186 (MohamedAbdeen21)
  • Move overlay planning to ExprPlanner #11398 (dharanad)
  • Coerce types for all union children plans when eliminating nesting #11386 (gruuya)
  • Add customizable equality and hash functions to UDFs #11392 (joroKr21)
  • Implement ScalarFunction MAKE_MAP and MAP #11361 (goldmedal)
  • Improve CommonSubexprEliminate rule with surely and conditionally evaluated stats #11357 (peter-toth)
  • fix(11397): surface proper errors in ParquetSink #11399 (wiedld)
  • Minor: Add note about SQLLancer fuzz testing to docs #11430 (alamb)
  • Trivial: use arrow csv writer's timestamp_tz_format #11407 (tmi)
  • Improved unparser documentation #11395 (alamb)
  • Avoid calling shutdown after failed write of AsyncWrite #11415 (joroKr21)
  • Short term way to make AggregateStatistics still work when min/max is converted to udaf #11261 (Rachelint)
  • Implement TPCH substrait integration test, support tpch_13, tpch_14,16 #11405 (Lordworms)
  • Minor: fix github action labeler rules #11428 (alamb)
  • Minor: change internal error to not supported error for nested field … #11446 (alamb)
  • Minor: change Datafusion --> DataFusion in docs #11439 (alamb)
  • Support serialization/deserialization for custom physical exprs in proto #11387 (lewiszlw)
  • remove termtree dependency #11416 (Kev1n8)
  • Add SessionStateBuilder and extract out the registration of defaults #11403 (Omega359)
  • integrate consumer tests, implement tpch query 18 to 22 #11462 (Lordworms)
  • Docs: Explain the usage of logical expressions for create_aggregate_expr #11458 (jayzhan211)
  • Return scalar result when all inputs are constants in map and make_map #11461 (Rachelint)
  • Enable clone_on_ref_ptr clippy lint on functions* #11468 (lewiszlw)
  • minor: non-overlapping repart_time and send_time metrics #11440 (korowa)
  • Minor: rename row_groups.rs to row_group_filter.rs #11481 (alamb)
  • Support alternate formats for unparsing datetime to timestamp and interval #11466 (y-f-u)
  • chore: Add criterion benchmark for CaseExpr #11482 (andygrove)
  • Initial support for StringView, merge changes from string-view development branch #11402 (alamb)
  • Replace to_lowercase with to_string in sql example #11486 (lewiszlw)
  • Minor: Make execute_input_stream Accessible for Any Sinking Operators #11449 (berkaysynnada)
  • Enable clone_on_ref_ptr clippy lints on proto #11465 (lewiszlw)
  • upgrade sqlparser 0.47 -> 0.48 #11453 (MohamedAbdeen21)
  • Add extension hooks for encoding and decoding UDAFs and UDWFs #11417 (joroKr21)
  • Remove element's nullability of array_agg function #11447 (jayzhan211)
  • Get expr planners when creating new planner #11485 (jayzhan211)
  • Support alternate format for Utf8 unparsing (CHAR) #11494 (sgrebnov)
  • implement retract_batch for xor accumulator #11500 (drewhayward)
  • Refactor: more clearly delineate between TableParquetOptions and ParquetWriterOptions #11444 (wiedld)
  • chore: fix typos of common and core packages #11520 (JasonLi-cn)
  • Move spill related functions to spill.rs #11509 (findepi)
  • Add tests that show the different defaults for ArrowWriter and TableParquetOptions #11524 (wiedld)
  • Create datafusion-physical-optimizer crate #11507 (lewiszlw)
  • Minor: Assert test_enabled_backtrace requirements to run #11525 (comphead)
  • Move handling of NULL literals in where clause to type coercion pass #11491 (xinlifoobar)
  • Update parquet page pruning code to use the StatisticsExtractor #11483 (alamb)
  • Enable SortMergeJoin LeftAnti filtered fuzz tests #11535 (comphead)
  • chore: fix typos of expr, functions, optimizer, physical-expr-common,… #11538 (JasonLi-cn)
  • Minor: Remove clone in PushDownFilter #11532 (jayzhan211)
  • Minor: avoid a clone in type coercion #11530 (alamb)
  • Move array ArrayAgg to a UserDefinedAggregate #11448 (jayzhan211)
  • Move MAKE_MAP to ExprPlanner #11452 (goldmedal)
  • chore: fix typos of sql, sqllogictest and substrait packages #11548 (JasonLi-cn)
  • Prevent bigger files from being checked in #11508 (findepi)
  • Add dialect param to use double precision for float64 in Postgres #11495 (Sevenannn)
  • Minor: move SessionStateDefaults into its own module #11566 (alamb)
  • refactor: rewrite mega type to an enum containing both cases #11539 (LorrensP-2158466)
  • Move sql_compound_identifier_to_expr to ExprPlanner #11487 (dharanad)
  • Support SortMergeJoin spilling #11218 (comphead)
  • Fix unparser invalid sql for query with order #11527 (y-f-u)
  • Provide DataFrame API for map and move map to functions-array #11560 (goldmedal)
  • Move OutputRequirements to datafusion-physical-optimizer crate #11579 (xinlifoobar)
  • Minor: move Column related tests and rename column.rs #11573 (jonahgao)
  • Fix SortMergeJoin antijoin flaky condition #11604 (comphead)
  • Improve Union Equivalence Propagation #11506 (mustafasrepo)
  • Migrate OrderSensitiveArrayAgg to be a user defined aggregate #11564 (jayzhan211)
  • Minor: Disable flaky SMJ antijoin filtered test until the fix #11608 (comphead)
  • support Decimal256 type in datafusion-proto #11606 (leoyvens)
  • Chore/fifo tests cleanup #11616 (ozankabak)
  • Fix Internal Error for an INNER JOIN query #11578 (xinlifoobar)
  • test: get file size by func metadata #11575 (zhuliquan)
  • Improve unparser MySQL compatibility #11589 (sgrebnov)
  • Push scalar functions into cross join #11528 (lewiszlw)
  • Remove ArrayAgg Builtin in favor of UDF #11611 (jayzhan211)
  • refactor: simplify DFSchema::field_with_unqualified_name #11619 (jonahgao)
  • Minor: Use upstream concat_batches from arrow-rs #11615 (alamb)
  • Fix: signum function bug when input is 0.0 #11580 (getChan)
  • Enforce uniqueness of named_struct field names #11614 (dharanad)
  • Minor: unnecessary row_count calculation in CrossJoinExec and NestedLoopsJoinExec #11632 (alamb)
  • ExprBuilder for Physical Aggregate Expr #11617 (jayzhan211)
  • Minor: avoid copying order by exprs in planner #11634 (alamb)
  • Unify CI and pre-commit hook settings for clippy #11640 (findepi)
  • Parsing SQL strings to Exprs with the qualified schema #11562 (Lordworms)
  • Add some zero column tests covering LIMIT, GROUP BY, WHERE, JOIN, and WINDOW #11624 (Kev1n8)
  • Refactor/simplify window frame utils #11648 (ozankabak)
  • Minor: use ready! macro to simplify FilterExec #11649 (alamb)
  • Temporarily pin toolchain version to avoid clippy errors #11655 (findepi)
  • Fix clippy errors for Rust 1.80 #11654 (findepi)
  • Add CsvExecBuilder for creating CsvExec #11633 (connec)
  • chore(deps): update sqlparser requirement from 0.48 to 0.49 #11630 (dependabot[bot])
  • Add support for USING to SQL unparser #11636 (wackywendell)
  • Run CI with latest (Rust 1.80), add ticket references to commented out tests #11661 (alamb)
  • Use AccumulatorArgs::is_reversed in NthValueAgg #11669 (jcsherin)
  • Implement physical plan serialization for json Copy plans #11645 (Lordworms)
  • Minor: improve documentation on SessionState #11642 (alamb)
  • Add LimitPushdown optimization rule and CoalesceBatchesExec fetch #11652 (alihandroid)
  • Update to arrow/parquet 52.2.0 #11691 (alamb)
  • Minor: Rename RepartitionMetrics::repartition_time to RepartitionMetrics::repart_time to match metric #11478 (alamb)
  • Update cache key used in rust CI script #11641 (findepi)
  • Fix bug in remove_join_expressions #11693 (jonahgao)
  • Initial changes to support using udaf min/max for statistics and opti… #11696 (edmondop)
  • Handle nulls in approx_percentile_cont #11721 (Dandandan)
  • Reduce repetition in try_process_group_by_unnest and try_process_unnest #11714 (JasonLi-cn)
  • Minor: Add example for ScalarUDF::call #11727 (alamb)
  • Use cargo release in bench.sh #11722 (alamb)
  • expose some fields on session state #11716 (waynexia)
  • Make DefaultSchemaAdapterFactory public #11709 (adriangb)
  • Check hashes first during probing the aggr hash table #11718 (Rachelint)
  • Implement physical plan serialization for parquet Copy plans #11735 (Lordworms)
  • Support cross-timezone timestamp comparison via coercion #11711 (jeffreyssmith2nd)
  • Minor: Improve documentation for AggregateUDFImpl::state_fields #11740 (lewiszlw)
  • Do not push down Sorts if it violates the sort requirements #11678 (alamb)
  • Use upstream StatisticsConverter from arrow-rs in DataFusion #11479 (alamb)
  • Fix plan_to_sql: Add wildcard projection to SELECT statement if no projection was set #11744 (LatrecheYasser)
  • Use upstream DataType::from_str in arrow-cast #11254 (alamb)
  • Fix documentation warnings, make CsvExecBuilder and Unparsed pub #11729 (alamb)
  • [Minor] Add test for only nulls (empty) as input in APPROX_PERCENTILE_CONT #11760 (Dandandan)
  • Add TrackedMemoryPool with better error messages on exhaustion #11665 (wiedld)
  • Derive Debug for logical plan nodes #11757 (lewiszlw)
  • Minor: add "clickbench extended" queries to slt tests #11763 (alamb)
  • Minor: Add comment explaining rationale for hash check #11750 (alamb)
  • Fix bug that COUNT(DISTINCT) on StringView panics #11768 (XiangpengHao)
  • [Minor] Refactor approx_percentile #11769 (Dandandan)
  • minor: always time batch_filter even when the result is an empty batch #11775 (andygrove)
  • Improve OOM message when a single reservation request fails to get more bytes. #11771 (wiedld)
  • [Minor] Short circuit ApplyFunctionRewrites if there are no function rewrites #11765 (gruuya)
  • Fix #11692: Improve doc comments within macros #11694 (Rafferty97)
  • Extract CoalesceBatchesStream to a struct #11610 (alamb)
  • refactor: move ExecutionPlan and related structs into dedicated mod #11759 (waynexia)
  • Minor: Add references to github issue in comments #11784 (findepi)
  • Add docs and rename param for Signature::numeric #11778 (matthewmturner)
  • Support planning Map literal #11780 (goldmedal)
  • Support LogicalPlan Debug differently than Display #11774 (lewiszlw)
  • Remove redundant Aggregate when DISTINCT & GROUP BY are in the same query #11781 (mertak-synnada)
  • Minor: add ticket reference and fmt #11805 (alamb)
  • Improve MSRV CI check to print out problems to log #11789 (alamb)
  • Improve log func tests stability #11808 (lewiszlw)
  • Add valid Distinct case for aggregation #11814 (mertak-synnada)
  • Don't implement create_sliding_accumulator repeatedly #11813 (lewiszlw)
  • chore(deps): update rstest requirement from 0.21.0 to 0.22.0 #11811 (dependabot[bot])
  • Minor: Update expected output due to logical conflict #11824 (alamb)
  • Pass scalar to eq inside nullif #11697 (simonvandel)
  • refactor: move aggregate_statistics to datafusion-physical-optimizer #11798 (Weijun-H)
  • Minor: refactor probe check into function should_skip_aggregation #11821 (alamb)
  • Minor: consolidate path_partition test into core_integration #11831 (alamb)
  • Move optimizer integration tests to core_integration #11830 (alamb)
  • Bump deprecated version of SessionState::new_with_config_rt to 41.0.0 #11839 (kezhuw)
  • Fix partial aggregation skipping with Decimal aggregators #11833 (alamb)
  • Fix bug with zero-sized buffer for StringViewArray #11841 (XiangpengHao)
  • Reduce clone of Statistics in ListingTable and PartitionedFile #11802 (Rachelint)
  • Add LogicalPlan::CreateIndex #11817 (lewiszlw)
  • Update object_store to 0.10.2 #11860 (danlgrca)
  • Add skipped_aggregation_rows metric to aggregate operator #11706 (alamb)
  • Cast Utf8View to Utf8 to support || from StringViewArray #11796 (dharanad)
  • Improve nested loop join code #11863 (lewiszlw)
  • [Minor]: Refactor to use Result.transpose() #11882 (djanderson)
  • support ANY() op #11849 (samuelcolvin)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    48	Andrew Lamb
    20	张林伟
     9	Jay Zhan
     9	Jonah Gao
     8	Andy Grove
     8	Lordworms
     8	Piotr Findeisen
     8	wiedld
     7	Oleks V
     6	Jax Liu
     5	Alex Huang
     5	Arttu
     5	JasonLi
     5	Trent Hauck
     5	Xin Li
     4	Dharan Aditya
     4	Edmondo Porcu
     4	dependabot[bot]
     4	kamille
     4	yfu
     3	Daniël Heres
     3	Eduard Karacharov
     3	Georgi Krastev
     2	Chris Connelly
     2	Chunchun Ye
     2	June
     2	Marco Neumann
     2	Marko Grujic
     2	Mehmet Ozan Kabak
     2	Michael J Ward
     2	Mohamed Abdeen
     2	Ruihang Xia
     2	Sergei Grebnov
     2	Xiangpeng Hao
     2	jcsherin
     2	kf zheng
     2	mertak-synnada
     1	Adrian Garcia Badaracco
     1	Alexander Rafferty
     1	Alihan Çelikcan
     1	Ariel Marcus
     1	Berkay Şahin
     1	Bruce Ritchie
     1	Devesh Rahatekar
     1	Douglas Anderson
     1	Drew Hayward
     1	Jeffrey Smith II
     1	Kaviraj Kanagaraj
     1	Kezhu Wang
     1	Leonardo Yvens
     1	Lorrens Pantelis
     1	Matthew Cramerus
     1	Matthew Turner
     1	Mustafa Akur
     1	Namgung Chan
     1	Ning Sun
     1	Peter Toth
     1	Qianqian
     1	Samuel Colvin
     1	Shehab Amin
     1	Simon Vandel Sillesen
     1	Tim Saucer
     1	Wendell Smith
     1	Yasser Latreche
     1	Yongting You
     1	danlgrca
     1	tmi
     1	waruto
     1	zhuliquan

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.